\( \newcommand{\Real}{\ensuremath{\mathbb{R}}} \newcommand{\Natural}{\ensuremath{\mathbb{N}}} \newenvironment{rcases} {\left.\begin{aligned}} {\end{aligned}\right\rbrace} \newcommand{\cG}{\ensuremath{\mathcal{G}}} \newcommand{\cE}{\ensuremath{\mathcal{E}}} \newcommand{\cR}{\ensuremath{\mathcal{R}}} \newcommand{\cT}{\ensuremath{\mathcal{T}}} \newcommand{\cS}{\ensuremath{\mathcal{S}}} \newcommand{\cF}{\ensuremath{\mathcal{F}}} \newcommand{\cL}{\ensuremath{\mathcal{L}}} \newcommand{\cX}{\ensuremath{\mathcal{X}}} \newcommand{\cY}{\ensuremath{\mathcal{Y}}} \newcommand{\cQ}{\ensuremath{\mathcal{Q}}} \newcommand{\cD}{\ensuremath{\mathcal{D}}} \newcommand{\cG}{\ensuremath{\mathcal{G}}} \newcommand{\cC}{\ensuremath{\mathcal{C}}} \newcommand{\vnorm}[1]{\ensuremath{\|#1\|}} \newcommand{\emb}[1]{\ensuremath{\mathbf{e}_{#1}}} \newcommand{\rs}{\ensuremath{s}} \newcommand{\rp}{\ensuremath{p}} \newcommand{\ro}{\ensuremath{o}} \newcommand{\vx}{\ensuremath{\mathbf{x}}} \newcommand{\vy}{\ensuremath{\mathbf{y}}} \newcommand{\mW}{\ensuremath{\mathbf{W}}} \newcommand{\RDFTriple}{\ensuremath{\langle \rs, \rp, \ro \rangle}} \newcommand{\BigO}{\ensuremath{\mathcal{O}}} \DeclareMathOperator*{\argmin}{argmin} \newcommand{\term}[1]{\ensuremath{\text{#1}}} \newcommand{\xterm}[1]{\ensuremath{\texttt{#1}}} \newcommand{\card}[1]{\ensuremath{\left\vert{#1}\right\vert}} \newcommand{\rc}{\ensuremath{c}} \newcommand{\domain}{\ensuremath{\term{domain}}} \newcommand{\range}{\ensuremath{\term{range}}} \newcommand{\model}[1]{\textsf{#1}} \newcommand{\scell}[2][c]{% \begin{tabular}[#1]{@{}c@{}}#2\end{tabular}} \newcommand{\comm}[1]{} %\newcommand{\corr}[1]{\ensuremath{\cC(#1)}} \newcommand{\corr}[1]{\ensuremath{\text{corr}(#1)}} \newcommand{\simil}{\ensuremath{\mathrm{sim}}} \newcommand{\spo}{\ensuremath{\rs, \rp, \ro}} \newcommand{\ff}{\ensuremath{f}} \newcommand{\fg}{\ensuremath{g}} \newcommand{\fh}{\ensuremath{h}} \newcommand{\score}{\ensuremath{\text{score}}} 
\newcommand{\ffrdfs}{\ensuremath{f_{{\text{S}}}}} \newcommand{\Cdot}{\mathrel{\;\cdot\;}} \newcommand{\lambdas}{\ensuremath{\boldsymbol{\lambda}}} \)

Leveraging the Schema
in Latent Factor Models
for Knowledge Graph Completion

Pasquale Minervini$^{1, 2}$, Claudia d'Amato$^{1}$, Nicola Fanizzi$^{1}$, Floriana Esposito$^{1}$
${}^{1}$ Università degli Studi di Bari Aldo Moro, Italy
${}^{2}$ INSIGHT Centre for Data Analytics - NUI Galway (DERI), Ireland
ACM Symposium on Applied Computing - Semantic Web Track (ACM SAC 2016)
http://slides.neuralnoise.com/SAC2016

The Linked Open Data Cloud

  • Over $1091$ interlinked Knowledge Graphs
  • $188\times10^6$ triples (facts), describing $8\times10^6$ resources
  • Inherently incomplete (Open World Assumption)
  • Can be used for reasoning (Query Answering)

Link Prediction

We assume our Knowledge Graph is an RDF Knowledge Base,
which can be seen as a Labeled Directed Multigraph:
An RDF Knowledge Graph is a tuple $(\cE, \cR, \cG)$:
  • $\cE$: set of nodes (entities in the domain)
  • $\cR$: set of edge labels (predicates, relation types)
  • $\cG \subseteq \cE \times \cR \times \cE$: set of $\langle \text{subject}, \text{predicate}, \text{object} \rangle$ triples representing relationships between entities.


Link Prediction: learn a scoring function $\ff(\cdot)$ of the form: \[ \ff : \cE \times \cR \times \cE \to \Real \] The higher the score, the more likely the triple holds.

Link Prediction - Example

[Figure: an example Knowledge Graph, with edges labeled authorOf, genre, and influencedBy]
  • $\cE = \{ \term{Shakespeare}, \term{Hamlet}, \term{Tragedy}, \term{Othello}, \term{Chaucer} \}$
  • $\cR = \{ \term{authorOf}, \term{genre}, \term{influencedBy} \}$
  • $\cG = \{ \langle \term{Shakespeare}, \term{authorOf}, \term{Hamlet} \rangle, \langle \term{Hamlet}, \term{genre}, \term{Tragedy} \rangle, \ldots \}$
  • $\forall t \in \left(\cE \times \cR \times \cE \setminus \cG\right) : \ff(\langle \term{Othello}, \term{genre}, \term{Tragedy} \rangle) \geq \ff(t)$

The Translating Embeddings Model

Translating Embeddings is a state-of-the-art Link Prediction method, suited for large and Web-scale Knowledge Graphs.

Key concepts:
  • Each entity $x$ is associated with an embedding vector $\emb{x} \in \Real^{k}$
  • Each predicate $p$ is associated with a translation vector $\emb{p} \in \Real^{k}$
  • The score of a triple $\RDFTriple$ is given by: \[ \ff(\RDFTriple ; \Theta) = - \vnorm{\left(\emb{\rs} + \emb{\rp}\right) - \emb{\ro}}, \] with $\Theta = \{ \emb{x} \in \Real^{k} \mid x \in \cE \} \cup \{ \emb{p} \in \Real^{k} \mid p \in \cR \}$.
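In code, the TransE scoring function is only a few lines. A minimal sketch with NumPy, using the entities and predicates from the earlier example; the dimension $k$ and the random (untrained) initialization are illustrative choices, not the settings from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
k = 50  # embedding dimension (illustrative value)

entities = ["Shakespeare", "Hamlet", "Tragedy", "Othello", "Chaucer"]
predicates = ["authorOf", "genre", "influencedBy"]

# Theta: one k-dimensional vector per entity and per predicate
emb_e = {e: rng.normal(size=k) for e in entities}
emb_p = {p: rng.normal(size=k) for p in predicates}

def score(s, p, o):
    """TransE score: negative norm of (e_s + e_p) - e_o (higher = more likely)."""
    return -np.linalg.norm(emb_e[s] + emb_p[p] - emb_e[o])

# e.g. score("Shakespeare", "authorOf", "Hamlet") -> a non-positive float
```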

The Translating Embeddings Model

\[ \begin{aligned} & \ff(\langle \text{Shakespeare}, \text{authorOf}, \text{Hamlet} \rangle ; \Theta) \\ & = - \vnorm{\emb{\text{Shakespeare}} + \emb{\text{authorOf}} - \emb{\text{Hamlet}}} \end{aligned} \]

The Learning Process

Entity and predicate embeddings are learned
by incrementally increasing the score of true facts $t$,
while decreasing the score of unknown facts $u$:

\[ \begin{aligned} &\underset{\Theta}{\text{minimize}} &&\displaystyle{\sum_{t \in \cG} \sum_{u \in \corr{t}} \max\{0, \gamma - \ff(t ; \Theta) + \ff(u ; \Theta) \}}\\ &\text{subject to} &&\displaystyle{\forall x \in \cE: \; \vnorm{\emb{x}} = 1} \end{aligned} \]
The function $\corr{\cdot}$ is a triple corruption process:
\[ \corr{\RDFTriple} = \{ \langle \tilde{\rs}, \rp, \ro \rangle \mid \tilde{\rs} \in \cE \} \cup \{ \langle \rs, \rp, \tilde{\ro} \rangle \mid \tilde{\ro} \in \cE \}. \]
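A sketch of the corruption set $\corr{\cdot}$ and of the pairwise hinge loss above; here the corruption excludes the original entity, a common implementation choice:

```python
def corrupt(triple, entities):
    """All corruptions of <s,p,o>: replace the subject, or the object,
    with every other entity (the corr(.) set from the slides)."""
    s, p, o = triple
    return ([(e, p, o) for e in entities if e != s]
            + [(s, p, e) for e in entities if e != o])

def hinge_loss(score_t, score_u, gamma=1.0):
    """Pairwise margin loss for one (true, corrupted) pair:
    max{0, gamma - f(t) + f(u)}."""
    return max(0.0, gamma - score_t + score_u)
```

The loss is zero once the true triple outscores the corrupted one by at least the margin $\gamma$, so already well-separated pairs contribute no gradient.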

Schema Knowledge

KGs can be endowed with Schema Knowledge.
We consider RDF Schema, which provides the constructs:
  • $\xterm{rdf:type}$ and $\xterm{rdfs:Class}$ - for typing resources.
  • $\xterm{rdfs:domain}$ - for defining the domain of a predicate.
  • $\xterm{rdfs:range}$ - for defining the range of a predicate.

RDF Schema (RDFS) also allows for some reasoning, e.g.: \[ \begin{rcases} \langle p, \xterm{rdfs:domain}, c \rangle \\ \langle s, p, o \rangle \end{rcases} \Rightarrow \langle s, \xterm{rdf:type}, c \rangle \\ \begin{rcases} \langle p, \xterm{rdfs:range}, c \rangle \ \\ \langle s, p, o \rangle \end{rcases} \Rightarrow \langle o, \xterm{rdf:type}, c \rangle \]
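These two entailment rules can be forward-applied mechanically. A small sketch, where `domain` and `rng` are lookup tables built from the schema triples:

```python
def rdfs_type_entailments(triples, domain, rng):
    """Forward-apply the two RDFS rules above: <s,p,o> entails
    <s, rdf:type, domain(p)> and <o, rdf:type, range(p)>."""
    entailed = set()
    for s, p, o in triples:
        if p in domain:
            entailed.add((s, "rdf:type", domain[p]))
        if p in rng:
            entailed.add((o, "rdf:type", rng[p]))
    return entailed

# The example from the next slide: <genre, rdfs:domain, LiteraryWork>
domain = {"genre": "LiteraryWork"}
entailed = rdfs_type_entailments([("England", "genre", "Tragedy")], domain, {})
# entailed contains ("England", "rdf:type", "LiteraryWork")
```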

Leveraging the Schema

In the presence of RDF Schema knowledge, some triples imply new, possibly flawed, type information. For instance:
\[ \begin{aligned} \langle \term{Othello},&&\term{rdf:type},&&\term{LiteraryWork} \rangle\\ \langle \term{England},&&\term{rdf:type},&&\term{Location} \rangle\\ \langle \term{genre},&&\term{rdfs:domain},&&\term{LiteraryWork} \rangle \end{aligned} \]
Consider the problem of adding either $f_{1}$ or $f_{2}$ to the KB: \[ \begin{aligned} f_{1}:&&\langle \term{Othello},&&\term{genre},&&\term{Tragedy} \rangle\\ f_{2}:&&\langle \term{England},&&\term{genre},&&\term{Tragedy} \rangle \end{aligned} \] We have that $f_{2} \Rightarrow \langle \term{England}, \term{rdf:type}, \term{LiteraryWork} \rangle$,
due to RDFS entailment rules (potential modeling flaw).

Redefining the Scoring Function

We introduce two novel regularizers (penalty terms): \[ \begin{aligned} \fg(\rp, \rs) & = \begin{cases} 1 & \text{if } \rs \not\in \domain(\rp), \\ 0 & \text{otherwise,} \end{cases} \\ \fh(\rp, \ro) & = \begin{cases} 1 & \text{if } \ro \not\in \range(\rp), \\ 0 & \text{otherwise.} \end{cases} \end{aligned} \]
\[ \begin{aligned} \ffrdfs(\RDFTriple ; \Theta, \lambdas) = \ff(\RDFTriple ; \Theta) - \lambda^{\fg}_{\rp} \fg(\rp, \rs) - \lambda^{\fh}_{\rp} \fh(\rp, \ro), \end{aligned} \]
where $\lambdas = \{ \lambda^{\fg}_{\rp}, \lambda^{\fh}_{\rp} \in \Real^{+} \mid \rp \in \cR \}$ are additional
schema-related parameters adaptively decreasing the score.
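The indicator regularizers $\fg$, $\fh$ and the adjusted score $\ffrdfs$ translate directly into code. A sketch, where `domain_of` and `range_of` are hypothetical lookup tables mapping each predicate to its set of admissible subjects and objects:

```python
def g(p, s, domain_of):
    """Domain penalty: 1 if the subject s is not in domain(p), else 0."""
    return 0.0 if s in domain_of.get(p, set()) else 1.0

def h(p, o, range_of):
    """Range penalty: 1 if the object o is not in range(p), else 0."""
    return 0.0 if o in range_of.get(p, set()) else 1.0

def score_rdfs(f_score, s, p, o, lam_g, lam_h, domain_of, range_of):
    """f_S(<s,p,o>) = f(<s,p,o>) - lambda_g[p]*g(p,s) - lambda_h[p]*h(p,o)."""
    return (f_score
            - lam_g[p] * g(p, s, domain_of)
            - lam_h[p] * h(p, o, range_of))

# Toy schema data (illustrative, from the running example):
domain_of = {"genre": {"Othello", "Hamlet"}}
range_of = {"genre": {"Tragedy"}}
lam_g, lam_h = {"genre": 2.0}, {"genre": 2.0}
```

With these toy values, a triple whose subject violates the domain of `genre` (e.g. England) has its score lowered by $\lambda^{\fg}_{\rp}$, while a schema-consistent triple keeps the original score.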

Learning Schema-Related Parameters

The schema-related parameters $\lambdas$ can also be learned:

\[ \begin{aligned} & \underset{\lambdas}{\text{minimize}} & & \displaystyle{\sum_{t \in \cG} \sum_{u \in \corr{t}} \max \big\{0, \gamma - \ffrdfs(t ; \Theta, \lambdas) + \ffrdfs(u ; \Theta, \lambdas) \big\}} \\ & \text{subject to} & & \displaystyle{\forall \rp \in \cR: \; \lambda^{\fg}_{\rp}, \lambda^{\fh}_{\rp} \geq 0} \end{aligned} \]

We use Stochastic Gradient Descent, with learning rates adaptively selected by AdaGrad, for learning $\Theta$ and $\lambdas$.
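A sketch of a single AdaGrad step, plus the projection that keeps the $\lambdas$ non-negative after each update; the base learning rate is an illustrative value, not the one used in the experiments:

```python
import numpy as np

def adagrad_step(param, grad, hist, lr=0.1, eps=1e-8):
    """One AdaGrad update: per-coordinate learning rate
    lr / sqrt(accumulated squared gradients)."""
    hist = hist + grad ** 2
    param = param - lr * grad / (np.sqrt(hist) + eps)
    return param, hist

def project_nonneg(lam):
    """Projection enforcing the constraint lambda >= 0."""
    return np.maximum(lam, 0.0)
```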

Experiments

Experiments: Considered Models

Let $\delta(\cdot)$ be a similarity function (neg. distance, dot product):

  • Translating Embeddings: (Bordes et al. 2013) \[ \ff(\RDFTriple) = \delta(\emb{\rs} + \emb{\rp}, \emb{\ro}) \]
  • Scaling Embeddings: (Yang et al. 2015) \[ \ff(\RDFTriple) = \delta(\emb{\rs} \circ \emb{\rp}, \emb{\ro}) \] where $\circ$ is the Hadamard (entry-wise) product.
  • Translating Embeddings$^{+}$: (this paper) \[ \ff(\RDFTriple) = \delta(\emb{\rs} + \emb{\rp_{1}}, \emb{\ro} + \emb{\rp_{2}}) \]
  • Scaling Embeddings$^{+}$: (this paper) \[ \ff(\RDFTriple) = \delta(\emb{\rs} \circ \emb{\rp_{1}}, \emb{\ro} \circ \emb{\rp_{2}}) \]
  • Unstructured: baseline (Bordes et al. 2013) \[ \ff(\RDFTriple) = \delta(\emb{\rs}, \emb{\ro}) \]
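The five scoring functions differ only in how the subject and object representations are combined before applying $\delta$. A sketch using the negative L2 distance as the similarity (the dot product being the other option mentioned above); in the $^{+}$ variants each predicate has two vectors:

```python
import numpy as np

def delta(a, b):
    """Similarity as negative L2 distance between two vectors."""
    return -np.linalg.norm(a - b)

def trans_e(es, ep, eo):             return delta(es + ep, eo)
def scal_e(es, ep, eo):              return delta(es * ep, eo)  # Hadamard product
def trans_e_plus(es, ep1, ep2, eo):  return delta(es + ep1, eo + ep2)
def scal_e_plus(es, ep1, ep2, eo):   return delta(es * ep1, eo * ep2)
def unstructured(es, eo):            return delta(es, eo)       # ignores the predicate
```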

Experiments: Evaluation Metrics

  • Following Bordes et al. (2013), for each test triple $\RDFTriple$ we replace the subject $\rs$ with each entity $\rs' \in \cE$, compute the score for $\langle \rs', \rp, \ro \rangle$ and rank all instances by their score in decreasing order. Then we do the same for the object $\ro$.

  • We report the Mean Rank (the lower the better) and Hits@10 (the higher the better) metrics.
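The ranking protocol and the two metrics can be sketched as follows; `score_fn` stands for any trained scoring function $\ff$:

```python
def rank_of(triple, score_fn, entities, position="subject"):
    """Rank (1 = best) of the true triple among all replacements of the
    subject (or the object), with scores sorted in decreasing order."""
    s, p, o = triple
    true = score_fn(s, p, o)
    if position == "subject":
        others = [score_fn(e, p, o) for e in entities]
    else:
        others = [score_fn(s, p, e) for e in entities]
    return 1 + sum(1 for sc in others if sc > true)

def mean_rank_and_hits10(ranks):
    """Mean Rank (lower = better) and Hits@10 in percent (higher = better)."""
    mean_rank = sum(ranks) / len(ranks)
    hits10 = 100.0 * sum(1 for r in ranks if r <= 10) / len(ranks)
    return mean_rank, hits10
```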

  • Knowledge Bases:
    • Freebase (FB15k): $\approx 483k$ training triples.
    • YAGO3: $\approx 1,082k$ training triples.
    • DBpedia 2014: $\approx 256k$ training triples.
  • Hyperparameter selection: grid search on a validation set.

Experiments - Freebase (FB15k)


Freebase fragment (FB15k) provided by Bordes et al. (2013). Freebase is at the core of the Google Knowledge Vault project (KDD 2014).

| Model | Mean R. ($\ff$) | Mean R. ($\ffrdfs$) | Hits@10 ($\ff$) | Hits@10 ($\ffrdfs$) |
|---|---|---|---|---|
| $\model{Unstr.}$ | 488 | 89 | 13.7 | 55.0 |
| $\model{TransE}$ | 86 | 70 | 62.3 | 64.1 |
| $\model{ScalE}$ | 84 | 55 | 65.7 | 68.2 |
| $\model{TransE}^{+}$ | 91 | 75 | 57.8 | 59.4 |
| $\model{ScalE}^{+}$ | 82 | 61 | 69.2 | 71.1 |

Experiments - YAGO3


YAGO3 Knowledge Base - extends YAGO combining knowledge from multiple language Wikipedias

| Model | Mean R. ($\ff$) | Mean R. ($\ffrdfs$) | Hits@10 ($\ff$) | Hits@10 ($\ffrdfs$) |
|---|---|---|---|---|
| $\model{Unstr.}$ | 4,792 | 863 | 10.3 | 23.6 |
| $\model{TransE}$ | 1,438 | 1,127 | 42.1 | 42.6 |
| $\model{ScalE}$ | 2,447 | 886 | 45.3 | 45.7 |
| $\model{TransE}^{+}$ | 1,446 | 1,381 | 39.1 | 40.0 |
| $\model{ScalE}^{+}$ | 1,716 | 869 | 41.0 | 41.0 |

Experiments - DBpedia 2014


DBpedia 2014 fragment extracted as in (Krompaß et al. 2014)

| Model | Mean R. ($\ff$) | Mean R. ($\ffrdfs$) | Hits@10 ($\ff$) | Hits@10 ($\ffrdfs$) |
|---|---|---|---|---|
| $\model{Unstr.}$ | 1,331 | 745 | 32.9 | 43.0 |
| $\model{TransE}$ | 994 | 994 | 50.5 | 52.1 |
| $\model{ScalE}$ | 1,149 | 962 | 57.4 | 58.5 |
| $\model{TransE}^{+}$ | 1,095 | 1,095 | 50.1 | 51.3 |
| $\model{ScalE}^{+}$ | 1,012 | 973 | 55.7 | 56.0 |

Conclusions & Future Work


  • We leverage RDF Schema structure knowledge in latent factor models for link prediction in Knowledge Graphs

  • We adaptively decrease the prediction score of new triples, depending on whether they imply previously unknown and possibly conflicting type information

  • Source code and datasets are available online: https://github.com/knowledgegraph/schema

  • Ongoing research - injecting rules in LFMs.
    Interested? pasquale.minervini@insight-centre.org