\( \newcommand{\Real}{\ensuremath{\mathbb{R}}} \newcommand{\Natural}{\ensuremath{\mathbb{N}}} \newenvironment{rcases} {\left.\begin{aligned}} {\end{aligned}\right\rbrace} \newcommand{\cG}{\ensuremath{\mathcal{G}}} \newcommand{\cE}{\ensuremath{\mathcal{E}}} \newcommand{\cR}{\ensuremath{\mathcal{R}}} \newcommand{\cT}{\ensuremath{\mathcal{T}}} \newcommand{\cS}{\ensuremath{\mathcal{S}}} \newcommand{\cF}{\ensuremath{\mathcal{F}}} \newcommand{\cL}{\ensuremath{\mathcal{L}}} \newcommand{\cX}{\ensuremath{\mathcal{X}}} \newcommand{\cY}{\ensuremath{\mathcal{Y}}} \newcommand{\cQ}{\ensuremath{\mathcal{Q}}} \newcommand{\cD}{\ensuremath{\mathcal{D}}} \newcommand{\cG}{\ensuremath{\mathcal{G}}} \newcommand{\cC}{\ensuremath{\mathcal{C}}} \newcommand{\vnorm}[1]{\ensuremath{\|#1\|}} \newcommand{\emb}[1]{\ensuremath{\mathbf{e}_{#1}}} \newcommand{\rs}{\ensuremath{s}} \newcommand{\rp}{\ensuremath{p}} \newcommand{\ro}{\ensuremath{o}} \newcommand{\vx}{\ensuremath{\mathbf{x}}} \newcommand{\vy}{\ensuremath{\mathbf{y}}} \newcommand{\mW}{\ensuremath{\mathbf{W}}} \newcommand{\RDFTriple}{\ensuremath{\langle \rs, \rp, \ro \rangle}} \newcommand{\BigO}{\ensuremath{\mathcal{O}}} \DeclareMathOperator*{\argmin}{argmin} \newcommand{\term}[1]{\ensuremath{\text{#1}}} \newcommand{\xterm}[1]{\ensuremath{\texttt{#1}}} \newcommand{\card}[1]{\ensuremath{\left\vert{#1}\right\vert}} \newcommand{\rc}{\ensuremath{c}} \newcommand{\domain}{\ensuremath{\term{domain}}} \newcommand{\range}{\ensuremath{\term{range}}} \newcommand{\model}[1]{\textsf{#1}} \newcommand{\scell}[2][c]{% \begin{tabular}[#1]{@{}c@{}}#2\end{tabular}} \newcommand{\comm}[1]{} %\newcommand{\corr}[1]{\ensuremath{\cC(#1)}} \newcommand{\corr}[1]{\ensuremath{\text{corr}(#1)}} \newcommand{\simil}{\ensuremath{\mathrm{sim}}} \newcommand{\spo}{\ensuremath{\rs, \rp, \ro}} \newcommand{\ff}{\ensuremath{f}} \newcommand{\fg}{\ensuremath{g}} \newcommand{\fh}{\ensuremath{h}} \newcommand{\score}{\ensuremath{\text{score}}} 
\newcommand{\ffrdfs}{\ensuremath{f_{{\text{S}}}}} \newcommand{\Cdot}{\mathrel{\;\cdot\;}} \newcommand{\lambdas}{\ensuremath{\boldsymbol{\lambda}}} \)

Leveraging the Schema
in Latent Factor Models
for Knowledge Graph Completion

Pasquale Minervini$^{1, 2}$, Claudia d'Amato$^{1}$, Nicola Fanizzi$^{1}$, Floriana Esposito$^{1}$
${}^{1}$ Università degli Studi di Bari Aldo Moro, Italy
${}^{2}$ INSIGHT Centre for Data Analytics - NUI Galway (DERI), Ireland
ACM Symposium on Applied Computing - Semantic Web Track (ACM SAC 2016)
http://slides.neuralnoise.com/SAC2016

The Linked Open Data Cloud

  • Over $1091$ interlinked Knowledge Graphs
  • $188\times10^6$ triples (facts), describing $8\times10^6$ resources
  • Inherently incomplete (Open World Assumption)
  • Can be used for reasoning (Query Answering)

Link Prediction

We assume our Knowledge Graph is an RDF Knowledge Base,
which can be seen as a Labeled Directed Multigraph:
An RDF Knowledge Graph is a tuple $(\cE, \cR, \cG)$:
  • $\cE$: set of nodes (entities in the domain)
  • $\cR$: set of edge labels (predicates, relation types)
  • $\cG \subseteq \cE \times \cR \times \cE$: set of $\langle \text{subject}, \text{predicate}, \text{object} \rangle$ triples representing relationships between entities.


Link Prediction: learn a scoring function $\ff(\cdot)$ of the form: \[ \ff : \cE \times \cR \times \cE \to \Real \] The higher the score, the more likely the triple holds.

Link Prediction - Example

[Figure: an example Knowledge Graph, with edges labeled authorOf, genre, and influencedBy]
  • $\cE = \{ \term{Shakespeare}, \term{Hamlet}, \term{Tragedy}, \term{Othello}, \term{Chaucer} \}$
  • $\cR = \{ \term{authorOf}, \term{genre}, \term{influencedBy} \}$
  • $\cG = \{ \langle \term{Shakespeare}, \term{authorOf}, \term{Hamlet} \rangle, \langle \term{Hamlet}, \term{genre}, \term{Tragedy} \rangle, \ldots \}$
  • $\forall t \in \left(\cE \times \cR \times \cE \setminus \cG\right) : \ff(\langle \term{Othello}, \term{genre}, \term{Tragedy} \rangle) \geq \ff(t)$

The Translating Embeddings Model

Translating Embeddings is a state-of-the-art Link Prediction method, suited for large and Web-scale Knowledge Graphs.

Key concepts:
  • Each entity $x$ is associated with an embedding vector $\emb{x} \in \Real^{k}$
  • Each predicate $p$ is associated with a translation vector $\emb{p} \in \Real^{k}$
  • The score of a triple $\RDFTriple$ is given by: \[ \ff(\RDFTriple ; \Theta) = - \vnorm{\left(\emb{\rs} + \emb{\rp}\right) - \emb{\ro}}, \] with $\Theta = \{ \emb{x} \in \Real^{k} \mid x \in \cE \} \cup \{ \emb{p} \in \Real^{k} \mid p \in \cR \}$.
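In code, the TransE scoring function is only a few lines. A minimal sketch with NumPy, using the entities and predicates from the earlier example; the dimension $k$ and the random (untrained) initialization are illustrative choices, not the settings from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
k = 50  # embedding dimension (illustrative value)

entities = ["Shakespeare", "Hamlet", "Tragedy", "Othello", "Chaucer"]
predicates = ["authorOf", "genre", "influencedBy"]

# Theta: one k-dimensional vector per entity and per predicate
emb_e = {e: rng.normal(size=k) for e in entities}
emb_p = {p: rng.normal(size=k) for p in predicates}

def score(s, p, o):
    """TransE score: negative norm of (e_s + e_p) - e_o (higher = more likely)."""
    return -np.linalg.norm(emb_e[s] + emb_p[p] - emb_e[o])

# e.g. score("Shakespeare", "authorOf", "Hamlet") -> a non-positive float
```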

The Translating Embeddings Model

\[ \begin{aligned} & \ff(\langle \text{Shakespeare}, \text{authorOf}, \text{Hamlet} \rangle ; \Theta) \\ & = - \vnorm{\emb{\text{Shakespeare}} + \emb{\text{authorOf}} - \emb{\text{Hamlet}}} \end{aligned} \]

The Learning Process

Entity and predicate embeddings are learned
by incrementally increasing the score of true facts $t$,
while decreasing the score of unknown facts $u$:

\[ \begin{aligned} &\underset{\Theta}{\text{minimize}} &&\displaystyle{\sum_{t \in \cG} \sum_{u \in \corr{t}} \max\{0, \gamma - \ff(t ; \Theta) + \ff(u ; \Theta) \}}\\ &\text{subject to} &&\displaystyle{\forall x \in \cE: \; \vnorm{\emb{x}} = 1} \end{aligned} \]
The function $\corr{\cdot}$ is a triple corruption process:
\[ \corr{\RDFTriple} = \{ \langle \tilde{\rs}, \rp, \ro \rangle \mid \tilde{\rs} \in \cE \} \cup \{ \langle \rs, \rp, \tilde{\ro} \rangle \mid \tilde{\ro} \in \cE \}. \]
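A sketch of the corruption set $\corr{\cdot}$ and of the pairwise hinge loss above; here the corruption excludes the original entity, a common implementation choice:

```python
def corrupt(triple, entities):
    """All corruptions of <s,p,o>: replace the subject, or the object,
    with every other entity (the corr(.) set from the slides)."""
    s, p, o = triple
    return ([(e, p, o) for e in entities if e != s]
            + [(s, p, e) for e in entities if e != o])

def hinge_loss(score_t, score_u, gamma=1.0):
    """Pairwise margin loss for one (true, corrupted) pair:
    max{0, gamma - f(t) + f(u)}."""
    return max(0.0, gamma - score_t + score_u)
```

The loss is zero once the true triple outscores the corrupted one by at least the margin $\gamma$, so already well-separated pairs contribute no gradient.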

Schema Knowledge

KGs can be endowed with Schema Knowledge.
We consider RDF Schema, which provides the constructs:
  • $\xterm{rdf:type}$ and $\xterm{rdfs:Class}$ - for typing resources.
  • $\xterm{rdfs:domain}$ - for defining the domain of a predicate.
  • $\xterm{rdfs:range}$ - for defining the range of a predicate.

RDF Schema (RDFS) also allows for some reasoning, e.g.: \[ \begin{rcases} \langle p, \xterm{rdfs:domain}, c \rangle \\ \langle s, p, o \rangle \end{rcases} \Rightarrow \langle s, \xterm{rdf:type}, c \rangle \\ \begin{rcases} \langle p, \xterm{rdfs:range}, c \rangle \ \\ \langle s, p, o \rangle \end{rcases} \Rightarrow \langle o, \xterm{rdf:type}, c \rangle \]
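These two entailment rules can be forward-applied mechanically. A small sketch, where `domain` and `rng` are lookup tables built from the schema triples:

```python
def rdfs_type_entailments(triples, domain, rng):
    """Forward-apply the two RDFS rules above: <s,p,o> entails
    <s, rdf:type, domain(p)> and <o, rdf:type, range(p)>."""
    entailed = set()
    for s, p, o in triples:
        if p in domain:
            entailed.add((s, "rdf:type", domain[p]))
        if p in rng:
            entailed.add((o, "rdf:type", rng[p]))
    return entailed

# The example from the next slide: <genre, rdfs:domain, LiteraryWork>
domain = {"genre": "LiteraryWork"}
entailed = rdfs_type_entailments([("England", "genre", "Tragedy")], domain, {})
# entailed contains ("England", "rdf:type", "LiteraryWork")
```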

Leveraging the Schema

In the presence of RDF Schema knowledge, some triples imply new, possibly flawed, type information. For instance:
\[ \begin{aligned} \langle \term{Othello},&&\term{rdf:type},&&\term{LiteraryWork} \rangle\\ \langle \term{England},&&\term{rdf:type},&&\term{Location} \rangle\\ \langle \term{genre},&&\term{rdfs:domain},&&\term{LiteraryWork} \rangle \end{aligned} \]
Consider the problem of adding either $f_{1}$ or $f_{2}$ to the KB: \[ \begin{aligned} f_{1}:&&\langle \term{Othello},&&\term{genre},&&\term{Tragedy} \rangle\\ f_{2}:&&\langle \term{England},&&\term{genre},&&\term{Tragedy} \rangle \end{aligned} \] We have that $f_{2} \Rightarrow \langle \term{England}, \term{rdf:type}, \term{LiteraryWork} \rangle$,
due to RDFS entailment rules (potential modeling flaw).

Redefining the Scoring Function

We introduce two novel regularizers (penalty terms): \[ \begin{aligned} \fg(\rp, \rs) & = \begin{cases} 1 & \text{if } \rs \not\in \domain(\rp), \\ 0 & \text{otherwise,} \end{cases} \\ \fh(\rp, \ro) & = \begin{cases} 1 & \text{if } \ro \not\in \range(\rp), \\ 0 & \text{otherwise.} \end{cases} \end{aligned} \]
\[ \begin{aligned} \ffrdfs(\RDFTriple ; \Theta, \lambdas) = \ff(\RDFTriple ; \Theta) - \lambda^{\fg}_{\rp} \fg(\rp, \rs) - \lambda^{\fh}_{\rp} \fh(\rp, \ro), \end{aligned} \]
where $\lambdas = \{ \lambda^{\fg}_{\rp}, \lambda^{\fh}_{\rp} \in \Real^{+} \mid \rp \in \cR \}$ are additional
schema-related parameters adaptively decreasing the score.
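The indicator regularizers $\fg$, $\fh$ and the adjusted score $\ffrdfs$ translate directly into code. A sketch, where `domain_of` and `range_of` are hypothetical lookup tables mapping each predicate to its set of admissible subjects and objects:

```python
def g(p, s, domain_of):
    """Domain penalty: 1 if the subject s is not in domain(p), else 0."""
    return 0.0 if s in domain_of.get(p, set()) else 1.0

def h(p, o, range_of):
    """Range penalty: 1 if the object o is not in range(p), else 0."""
    return 0.0 if o in range_of.get(p, set()) else 1.0

def score_rdfs(f_score, s, p, o, lam_g, lam_h, domain_of, range_of):
    """f_S(<s,p,o>) = f(<s,p,o>) - lambda_g[p]*g(p,s) - lambda_h[p]*h(p,o)."""
    return (f_score
            - lam_g[p] * g(p, s, domain_of)
            - lam_h[p] * h(p, o, range_of))

# Toy schema data (illustrative, from the running example):
domain_of = {"genre": {"Othello", "Hamlet"}}
range_of = {"genre": {"Tragedy"}}
lam_g, lam_h = {"genre": 2.0}, {"genre": 2.0}
```

With these toy values, a triple whose subject violates the domain of `genre` (e.g. England) has its score lowered by $\lambda^{\fg}_{\rp}$, while a schema-consistent triple keeps the original score.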

Learning Schema-Related Parameters

The schema-related parameters $\lambdas$ can also be learned:

\[ \begin{aligned} & \underset{\lambdas}{\text{minimize}} & & \displaystyle{\sum_{t \in \cG} \sum_{u \in \corr{t}} \max \big\{0, \gamma - \ffrdfs(t ; \Theta, \lambdas) + \ffrdfs(u ; \Theta, \lambdas) \big\}} \\ & \text{subject to} & & \displaystyle{\forall \rp \in \cR: \; \lambda^{\fg}_{\rp}, \lambda^{\fh}_{\rp} \geq 0} \end{aligned} \]

We use Stochastic Gradient Descent, with learning rates adaptively selected by AdaGrad, for learning $\Theta$ and $\lambdas$.
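A sketch of a single AdaGrad step, plus the projection that keeps the $\lambdas$ non-negative after each update; the base learning rate is an illustrative value, not the one used in the experiments:

```python
import numpy as np

def adagrad_step(param, grad, hist, lr=0.1, eps=1e-8):
    """One AdaGrad update: per-coordinate learning rate
    lr / sqrt(accumulated squared gradients)."""
    hist = hist + grad ** 2
    param = param - lr * grad / (np.sqrt(hist) + eps)
    return param, hist

def project_nonneg(lam):
    """Projection enforcing the constraint lambda >= 0."""
    return np.maximum(lam, 0.0)
```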

Experiments

Experiments: Considered Models

Let $\delta(\cdot)$ be a similarity function (neg. distance, dot product):

  • Translating Embeddings: (Bordes et al. 2013) \[ \ff(\RDFTriple) = \delta(\emb{\rs} + \emb{\rp}, \emb{\ro}) \]
  • Scaling Embeddings: (Yang et al. 2015) \[ \ff(\RDFTriple) = \delta(\emb{\rs} \circ \emb{\rp}, \emb{\ro}) \] where $\circ$ is the Hadamard (entry-wise) product.
  • Translating Embeddings$^{+}$: (this paper) \[ \ff(\RDFTriple) = \delta(\emb{\rs} + \emb{\rp_{1}}, \emb{\ro} + \emb{\rp_{2}}) \]
  • Scaling Embeddings$^{+}$: (this paper) \[ \ff(\RDFTriple) = \delta(\emb{\rs} \circ \emb{\rp_{1}}, \emb{\ro} \circ \emb{\rp_{2}}) \]
  • Unstructured: baseline (Bordes et al. 2013) \[ \ff(\RDFTriple) = \delta(\emb{\rs}, \emb{\ro}) \]
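The five scoring functions differ only in how the subject and object representations are combined before applying $\delta$. A sketch using the negative L2 distance as the similarity (the dot product being the other option mentioned above); in the $^{+}$ variants each predicate has two vectors:

```python
import numpy as np

def delta(a, b):
    """Similarity as negative L2 distance between two vectors."""
    return -np.linalg.norm(a - b)

def trans_e(es, ep, eo):             return delta(es + ep, eo)
def scal_e(es, ep, eo):              return delta(es * ep, eo)  # Hadamard product
def trans_e_plus(es, ep1, ep2, eo):  return delta(es + ep1, eo + ep2)
def scal_e_plus(es, ep1, ep2, eo):   return delta(es * ep1, eo * ep2)
def unstructured(es, eo):            return delta(es, eo)       # ignores the predicate
```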

Experiments: Evaluation Metrics

  • Following Bordes et al. (2013), for each test triple $\RDFTriple$ we replace the subject $\rs$ with each entity $\rs' \in \cE$, compute the score for $\langle \rs', \rp, \ro \rangle$ and rank all instances by their score in decreasing order. Then we do the same for the object $\ro$.

  • We report the Mean Rank (the lower the better) and Hits@10 (the higher the better) metrics.
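The ranking protocol and the two metrics can be sketched as follows; `score_fn` stands for any trained scoring function $\ff$:

```python
def rank_of(triple, score_fn, entities, position="subject"):
    """Rank (1 = best) of the true triple among all replacements of the
    subject (or the object), with scores sorted in decreasing order."""
    s, p, o = triple
    true = score_fn(s, p, o)
    if position == "subject":
        others = [score_fn(e, p, o) for e in entities]
    else:
        others = [score_fn(s, p, e) for e in entities]
    return 1 + sum(1 for sc in others if sc > true)

def mean_rank_and_hits10(ranks):
    """Mean Rank (lower = better) and Hits@10 in percent (higher = better)."""
    mean_rank = sum(ranks) / len(ranks)
    hits10 = 100.0 * sum(1 for r in ranks if r <= 10) / len(ranks)
    return mean_rank, hits10
```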

  • Knowledge Bases:
    • Freebase (FB15k): $\approx 483k$ training triples.
    • YAGO3: $\approx 1,082k$ training triples.
    • DBpedia 2014: $\approx 256k$ training triples.
  • Hyperparameter selection: grid search on a validation set.

Experiments - Freebase (FB15k)


Freebase fragment (FB15k) provided by Bordes et al. (2013). Freebase is at the core of the Google Knowledge Vault project (KDD 2014).

| Model | Mean R. ($\ff$) | Mean R. ($\ffrdfs$) | Hits@10 ($\ff$) | Hits@10 ($\ffrdfs$) |
|---|---|---|---|---|
| $\model{Unstr.}$ | 488 | 89 | 13.7 | 55.0 |
| $\model{TransE}$ | 86 | 70 | 62.3 | 64.1 |
| $\model{ScalE}$ | 84 | 55 | 65.7 | 68.2 |
| $\model{TransE}^{+}$ | 91 | 75 | 57.8 | 59.4 |
| $\model{ScalE}^{+}$ | 82 | 61 | 69.2 | 71.1 |

Experiments - YAGO3


YAGO3 Knowledge Base - extends YAGO combining knowledge from multiple language Wikipedias

| Model | Mean R. ($\ff$) | Mean R. ($\ffrdfs$) | Hits@10 ($\ff$) | Hits@10 ($\ffrdfs$) |
|---|---|---|---|---|
| $\model{Unstr.}$ | 4,792 | 863 | 10.3 | 23.6 |
| $\model{TransE}$ | 1,438 | 1,127 | 42.1 | 42.6 |
| $\model{ScalE}$ | 2,447 | 886 | 45.3 | 45.7 |
| $\model{TransE}^{+}$ | 1,446 | 1,381 | 39.1 | 40.0 |
| $\model{ScalE}^{+}$ | 1,716 | 869 | 41.0 | 41.0 |

Experiments - DBpedia 2014


DBpedia 2014 fragment extracted as in (Krompaß et al. 2014)

| Model | Mean R. ($\ff$) | Mean R. ($\ffrdfs$) | Hits@10 ($\ff$) | Hits@10 ($\ffrdfs$) |
|---|---|---|---|---|
| $\model{Unstr.}$ | 1,331 | 745 | 32.9 | 43.0 |
| $\model{TransE}$ | 994 | 994 | 50.5 | 52.1 |
| $\model{ScalE}$ | 1,149 | 962 | 57.4 | 58.5 |
| $\model{TransE}^{+}$ | 1,095 | 1,095 | 50.1 | 51.3 |
| $\model{ScalE}^{+}$ | 1,012 | 973 | 55.7 | 56.0 |

Conclusions & Future Work


  • We leverage RDF Schema structure knowledge in latent factor models for link prediction in Knowledge Graphs

  • We adaptively decrease the prediction score of new triples, depending on whether they imply previously unknown and possibly conflicting type information

  • Source code and datasets are available online: https://github.com/knowledgegraph/schema

  • Ongoing research - injecting rules in LFMs.
    Interested? pasquale.minervini@insight-centre.org