Leveraging the Schema
in Latent Factor Models
for Knowledge Graph Completion
Pasquale Minervini$^{1, 2}$, Claudia d'Amato$^{1}$, Nicola Fanizzi$^{1}$, Floriana Esposito$^{1}$
${}^{1}$ Università degli Studi di Bari Aldo Moro, Italy
${}^{2}$ INSIGHT Centre for Data Analytics - NUI Galway (DERI), Ireland
ACM Symposium on Applied Computing - Semantic Web Track (ACM SAC 2016) http://slides.neuralnoise.com/SAC2016
We assume our Knowledge Graph is an RDF Knowledge Base, which can be seen as a labeled directed multigraph:
An RDF Knowledge Graph is a tuple $(\cE, \cR, \cG)$:
$\cE$: set of nodes (entities in the domain)
$\cR$: set of edge labels (predicates, relation types)
$\cG \subseteq \cE \times \cR \times \cE$: set of $\langle \text{subject}, \text{predicate}, \text{object} \rangle$ triples representing relationships between entities.
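As a minimal sketch (an illustration, not the authors' code), the tuple $(\cE, \cR, \cG)$ can be held as three sets, with $\cG$ a set of $\langle \text{subject}, \text{predicate}, \text{object} \rangle$ tuples. The class and function names, and the `author`/`Shakespeare` triple, are hypothetical.

```python
from typing import Iterable, NamedTuple

Triple = tuple[str, str, str]  # <subject, predicate, object>


class KnowledgeGraph(NamedTuple):
    entities: frozenset   # E: nodes
    relations: frozenset  # R: edge labels (predicates)
    triples: frozenset    # G, a subset of E x R x E

    def is_well_formed(self) -> bool:
        # G may only use entities from E and predicates from R.
        return all(
            s in self.entities and p in self.relations and o in self.entities
            for s, p, o in self.triples
        )


def from_triples(triples: Iterable[Triple]) -> KnowledgeGraph:
    """Build (E, R, G), taking E and R to be exactly what occurs in G."""
    g = frozenset(triples)
    entities = frozenset(x for s, _, o in g for x in (s, o))
    relations = frozenset(p for _, p, _ in g)
    return KnowledgeGraph(entities, relations, g)


# Hypothetical example triples, for illustration only.
kg = from_triples({("Othello", "genre", "Tragedy"), ("Othello", "author", "Shakespeare")})
print(sorted(kg.entities))   # ['Othello', 'Shakespeare', 'Tragedy']
print(sorted(kg.relations))  # ['author', 'genre']
print(kg.is_well_formed())   # True
```

Because $\cG$ is a set of labeled triples, several predicates may connect the same pair of entities, which is what makes the graph a multigraph.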
Link Prediction: learn a scoring function $\ff(\cdot)$ in the form:
\[ \ff : \cE \times \cR \times \cE \to \Real \]
The higher the score, the more likely the triple.
KGs can be endowed with Schema Knowledge.
We consider RDF Schema, which provides the following constructs:
$\xterm{rdf:type}$ and $\xterm{rdfs:Class}$ - for typing resources.
$\xterm{rdfs:domain}$ - for defining the domain of a predicate.
$\xterm{rdfs:range}$ - for defining the range of a predicate.
RDF Schema (RDFS) also allows for some reasoning, e.g.:
\[
\begin{gathered}
\begin{rcases}
\langle p, \xterm{rdfs:domain}, c \rangle \\
\langle s, p, o \rangle
\end{rcases} \Rightarrow \langle s, \xterm{rdf:type}, c \rangle \\
\begin{rcases}
\langle p, \xterm{rdfs:range}, c \rangle \\
\langle s, p, o \rangle
\end{rcases} \Rightarrow \langle o, \xterm{rdf:type}, c \rangle
\end{gathered}
\]
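A minimal sketch, assuming the KB is a plain Python set of string triples, of closing a KB under the two typing rules above (it is not a full RDFS reasoner). The function name and the `rdfs:range` declaration for `genre` are hypothetical, added only to exercise both rules.

```python
Triple = tuple[str, str, str]  # (subject, predicate, object)

RDF_TYPE, RDFS_DOMAIN, RDFS_RANGE = "rdf:type", "rdfs:domain", "rdfs:range"


def rdfs_type_closure(kb: set[Triple]) -> set[Triple]:
    """Close kb under the rdfs:domain and rdfs:range typing rules only."""
    # Collect the declared domain and range classes of each predicate.
    # These maps never change: the rules only ever produce rdf:type triples.
    domains: dict[str, set[str]] = {}
    ranges: dict[str, set[str]] = {}
    for s, p, o in kb:
        if p == RDFS_DOMAIN:
            domains.setdefault(s, set()).add(o)
        elif p == RDFS_RANGE:
            ranges.setdefault(s, set()).add(o)

    closure = set(kb)
    while True:
        # <p, rdfs:domain, c>, <s, p, o>  =>  <s, rdf:type, c>
        new = {(s, RDF_TYPE, c) for s, p, _ in closure for c in domains.get(p, ())}
        # <p, rdfs:range, c>, <s, p, o>  =>  <o, rdf:type, c>
        new |= {(o, RDF_TYPE, c) for _, p, o in closure for c in ranges.get(p, ())}
        new -= closure
        if not new:  # fixpoint: no rule yields anything new
            return closure
        closure |= new


kb = {
    ("genre", RDFS_DOMAIN, "LiteraryWork"),
    ("genre", RDFS_RANGE, "Genre"),  # hypothetical range declaration
    ("Othello", "genre", "Tragedy"),
}
print(sorted(rdfs_type_closure(kb) - kb))
# [('Othello', 'rdf:type', 'LiteraryWork'), ('Tragedy', 'rdf:type', 'Genre')]
```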
Leveraging the Schema
In the presence of an RDF Schema, some
triples imply new, possibly erroneous, type information.
For instance:
\[
\begin{aligned}
\langle \term{Othello},&&\term{rdf:type},&&\term{LiteraryWork} \rangle\\
\langle \term{England},&&\term{rdf:type},&&\term{Location} \rangle\\
\langle \term{genre},&&\term{rdfs:domain},&&\term{LiteraryWork} \rangle
\end{aligned}
\]
Consider the problem of adding either $f_{1}$ or $f_{2}$ to the KB:
\[
\begin{aligned}
f_{1}:&&\langle \term{Othello},&&\term{genre},&&\term{Tragedy} \rangle\\
f_{2}:&&\langle \term{England},&&\term{genre},&&\term{Tragedy} \rangle
\end{aligned}
\]
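Adding $f_{2}$ would entail $\langle \term{England}, \term{rdf:type}, \term{LiteraryWork} \rangle$ through the $\term{rdfs:domain}$ rule: new, and likely wrong, type information (a potential modeling flaw).
Adding $f_{1}$ only entails a type that $\term{Othello}$ already has.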
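The distinction between $f_{1}$ and $f_{2}$ can be checked mechanically. The sketch below is an assumption-laden illustration: it compares implied types only against types *asserted* in the KB, does not decide whether a new type actually conflicts (RDFS alone has no class disjointness), and does not reproduce how the paper adjusts prediction scores. Function names are hypothetical.

```python
Triple = tuple[str, str, str]  # (subject, predicate, object)


def implied_types(candidate: Triple, kb: set[Triple]) -> set[Triple]:
    """rdf:type triples entailed by adding `candidate`, via its predicate's domain/range."""
    s, p, o = candidate
    implied = set()
    for subj, pred, cls in kb:
        if subj == p and pred == "rdfs:domain":
            implied.add((s, "rdf:type", cls))  # the subject gets the domain class
        elif subj == p and pred == "rdfs:range":
            implied.add((o, "rdf:type", cls))  # the object gets the range class
    return implied


def new_type_information(candidate: Triple, kb: set[Triple]) -> set[Triple]:
    """Implied type triples that are not already asserted in kb."""
    return implied_types(candidate, kb) - kb


# The KB and candidate facts from the example above.
kb = {
    ("Othello", "rdf:type", "LiteraryWork"),
    ("England", "rdf:type", "Location"),
    ("genre", "rdfs:domain", "LiteraryWork"),
}
f1 = ("Othello", "genre", "Tragedy")
f2 = ("England", "genre", "Tragedy")
print(new_type_information(f1, kb))  # set()
print(new_type_information(f2, kb))  # {('England', 'rdf:type', 'LiteraryWork')}
```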
Scaling Embeddings (ScalE; Yang et al. 2015):
\[ \ff(\RDFTriple) = \delta(\emb{\rs} \circ \emb{\rp}, \emb{\ro}) \]
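where $\circ$ is the Hadamard (entry-wise) product and $\delta$ compares the two resulting vectors (with the dot product, this is the diagonal bilinear model of Yang et al.).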
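A minimal NumPy sketch of the ScalE score, assuming $\delta$ is the dot product; the function name and the embedding values are hypothetical.

```python
import numpy as np


def scale_score(e_s: np.ndarray, r_p: np.ndarray, e_o: np.ndarray) -> float:
    """f(<s, p, o>) = delta(e_s o r_p, e_o), assuming delta is the dot product."""
    return float(np.dot(e_s * r_p, e_o))  # `*` on arrays is the Hadamard product


e_s = np.array([1.0, 2.0, 3.0])
r_p = np.array([1.0, 0.0, -1.0])
e_o = np.array([2.0, 2.0, 2.0])
print(scale_score(e_s, r_p, e_o))  # -4.0
print(scale_score(e_o, r_p, e_s))  # -4.0
```

Under the dot-product assumption, swapping subject and object leaves the score unchanged, so this form of the model scores $\langle s, p, o \rangle$ and $\langle o, p, s \rangle$ identically.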
Following Bordes et al. (2013),
for each test triple $\RDFTriple$ we replace the subject
$\rs$ with each entity $\rs' \in \cE$, compute the score
of $\langle \rs', \rp, \ro \rangle$, rank all resulting
triples by decreasing score, and record the rank of the
original test triple. We then do the same for the object $\ro$.
We report Mean Rank (the average rank of the test triples; the lower the better)
and Hits@10 (the percentage of test triples ranked in the top 10; the higher the better).
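The ranking protocol and both metrics can be sketched as follows. This is a minimal sketch under stated assumptions: entities and predicates are integer ids, ranks are "raw" (other true triples are not filtered out of the corrupted candidates; the slide does not say which setting is used), and ties are resolved in favour of the test triple. All names, including the toy scorer, are hypothetical.

```python
import numpy as np


def corruption_ranks(score_fn, test_triples, num_entities):
    """Raw ranks of each test triple among its subject- and object-corrupted versions."""
    ranks = []
    for s, p, o in test_triples:
        # Replace the subject with every entity s' and score <s', p, o>.
        subj_scores = np.array([score_fn(e, p, o) for e in range(num_entities)])
        # Rank = 1 + number of candidates scoring strictly higher than the test triple.
        ranks.append(1 + int(np.sum(subj_scores > subj_scores[s])))
        # Same for the object: score <s, p, o'> for every entity o'.
        obj_scores = np.array([score_fn(s, p, e) for e in range(num_entities)])
        ranks.append(1 + int(np.sum(obj_scores > obj_scores[o])))
    return np.array(ranks)


def mean_rank_and_hits(ranks, k=10):
    """Mean Rank (lower is better) and Hits@k as a percentage (higher is better)."""
    return float(np.mean(ranks)), 100.0 * float(np.mean(ranks <= k))


# Hypothetical toy scorer over integer ids, for illustration only.
def toy_score(s, p, o):
    return -abs(s + p - o)


ranks = corruption_ranks(toy_score, [(0, 1, 1), (2, 1, 5)], num_entities=20)
print(ranks.tolist())                  # [1, 1, 4, 4]
print(mean_rank_and_hits(ranks))       # (2.5, 100.0)
print(mean_rank_and_hits(ranks, k=3))  # (2.5, 50.0)
```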
Knowledge Bases:
Freebase (FB15k): $\approx 483k$ training triples.
YAGO3: $\approx 1,082k$ training triples.
DBpedia 2014: $\approx 256k$ training triples.
Hyperparameter selection: grid search on a validation set.
Experiments - Freebase (FB15k)
FB15k: the Freebase fragment released by Bordes et al. (2013). Freebase is at the core of Google's Knowledge Vault project (KDD 2014).
\begin{tabular}{lcccc}
Model & Mean Rank ($\ff$) & Mean Rank ($\ffrdfs$) & Hits@10 ($\ff$, \%) & Hits@10 ($\ffrdfs$, \%) \\
\hline
$\model{Unstr.}$ & 488 & 89 & 13.7 & 55.0 \\
$\model{TransE}$ & 86 & 70 & 62.3 & 64.1 \\
$\model{ScalE}$ & 84 & 55 & 65.7 & 68.2 \\
$\model{TransE}^{+}$ & 91 & 75 & 57.8 & 59.4 \\
$\model{ScalE}^{+}$ & 82 & 61 & 69.2 & 71.1 \\
\end{tabular}
Experiments - YAGO3
YAGO3 Knowledge Base: extends YAGO by combining knowledge from Wikipedias in multiple languages.
\begin{tabular}{lcccc}
Model & Mean Rank ($\ff$) & Mean Rank ($\ffrdfs$) & Hits@10 ($\ff$, \%) & Hits@10 ($\ffrdfs$, \%) \\
\hline
$\model{Unstr.}$ & 4,792 & 863 & 10.3 & 23.6 \\
$\model{TransE}$ & 1,438 & 1,127 & 42.1 & 42.6 \\
$\model{ScalE}$ & 2,447 & 886 & 45.3 & 45.7 \\
$\model{TransE}^{+}$ & 1,446 & 1,381 & 39.1 & 40.0 \\
$\model{ScalE}^{+}$ & 1,716 & 869 & 41.0 & 41.0 \\
\end{tabular}
Experiments - DBpedia 2014
DBpedia 2014 fragment, extracted as in Krompaß et al. (2014).
\begin{tabular}{lcccc}
Model & Mean Rank ($\ff$) & Mean Rank ($\ffrdfs$) & Hits@10 ($\ff$, \%) & Hits@10 ($\ffrdfs$, \%) \\
\hline
$\model{Unstr.}$ & 1,331 & 745 & 32.9 & 43.0 \\
$\model{TransE}$ & 994 & 994 & 50.5 & 52.1 \\
$\model{ScalE}$ & 1,149 & 962 & 57.4 & 58.5 \\
$\model{TransE}^{+}$ & 1,095 & 1,095 & 50.1 & 51.3 \\
$\model{ScalE}^{+}$ & 1,012 & 973 & 55.7 & 56.0 \\
\end{tabular}
Conclusions & Future Work
We leverage RDF Schema knowledge in latent factor models
for link prediction in Knowledge Graphs.
We adaptively decrease the prediction score of
new triples, depending on whether they imply previously unknown
and possibly conflicting type information.