Adversarial Sets for Regularising Neural Link Predictors
Pasquale Minervini$^{\alpha}$ Thomas Demeester$^{\beta}$ Tim Rocktäschel$^{\gamma}$ Sebastian Riedel$^{\alpha}$
${}^{\alpha}$ University College London ${}^{\beta}$ Ghent University - iMinds ${}^{\gamma}$ University of Oxford
p.minervini@ucl.ac.uk
http://slides.neuralnoise.com/CPH2017
| subject | predicate | object |
|---|---|---|
| Barack Obama | married to | Michelle Obama |
| Barack Obama | politician of | United States |
| Barack Obama | born in | Honolulu |
| Honolulu | capital of | Hawaii |
Many large knowledge graphs exist, including DBpedia, Freebase, YAGO, the Google Knowledge Graph, and the Facebook Knowledge Graph.
However, they suffer from incompleteness:
Link Prediction: problem of finding missing links in Knowledge Graphs.
Link Prediction can be cast as a learning-to-rank problem, where the aim is to learn a scoring function $\ff : \cE \times \cR \times \cE \to \Real$ such that:
\[ \ff(\rs, \rp, \ro) \;\; \text{is} \;\; \left\{ \begin{array}{rl} \text{large} & \text{if the fact } \spo \text{ is true},\\ \text{small} & \text{otherwise.} \end{array} \right. \]
The scoring function $\ff$ can be used for ranking missing triples: the higher the score, the more likely the triple.
The scoring function $\ff$ can be based either on the observable features or the latent features of entities $\rs, \ro \in \cE$.
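As an illustration of a latent-feature scoring function, a DistMult-style model scores a triple as a trilinear product of the subject, predicate, and object embeddings (a minimal NumPy sketch; the embedding values here are toy data, not from the paper):

```python
import numpy as np

def distmult_score(e_s, w_p, e_o):
    """DistMult-style score: element-wise product of subject, predicate,
    and object embeddings, summed into a single plausibility score."""
    return float(np.sum(e_s * w_p * e_o))

# Toy embeddings (illustrative only): higher score = more plausible triple.
rng = np.random.default_rng(0)
e_s, w_p, e_o = rng.normal(size=(3, 4))
score = distmult_score(e_s, w_p, e_o)
```

Other neural link predictors differ mainly in this scoring function, e.g. ComplEx uses complex-valued embeddings.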
Neural Link Predictors are neural network architectures where:
Several Neural Link Predictors have been proposed in the literature:
Training minimises a hinge loss over observed and corrupted triples:
Negative examples ($y = -1$) are generated by a corruption process that corrupts either the subject or the object of a triple:
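The corruption process and the pairwise hinge loss can be sketched as follows (a minimal Python illustration; the function names and the uniform sampling scheme are my assumptions):

```python
import random

def corrupt(triple, entities, rng=random):
    """Generate a negative example by replacing either the subject
    or the object with a uniformly sampled entity."""
    s, p, o = triple
    if rng.random() < 0.5:
        return (rng.choice(entities), p, o)
    return (s, p, rng.choice(entities))

def hinge_loss(pos_score, neg_score, margin=1.0):
    """Pairwise hinge loss: penalise the model when a corrupted triple
    is not scored at least `margin` below the observed triple."""
    return max(0.0, margin - pos_score + neg_score)
```

In practice the scores come from the neural link predictor, and the loss is summed over a batch of observed triples with one or more corruptions each.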
Consider the following rule:
\[ \boxed{\sc{fatherOf}(\mathcal{X}, \mathcal{Y}) \land \sc{parentOf}(\mathcal{Y}, \mathcal{Z}) \Rightarrow \sc{grandFatherOf}(\mathcal{X}, \mathcal{Z})} \]
We want to enforce that, for all entities $x, y, z \in \cE$,
if $\langle x, \sc{fatherOf}, y \rangle$ and $\langle y, \sc{parentOf}, z \rangle$ hold,
$\langle x, \sc{grandFatherOf}, z \rangle$ also holds:
A naïve solution consists of adding, for each rule, $\boxed{\card{\cE}^{3}}$ constraints -- one per grounding of its three universally quantified variables.
However, this is unlikely to scale to large knowledge graphs -- e.g. on Freebase (40M entities), we would need to introduce $6.4 \times 10^{22}$ constraints.
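The constraint count follows directly from grounding the rule's three universally quantified variables over the entity set:

```python
# Freebase has roughly 40 million entities; a rule with three
# universally quantified variables has |E|^3 groundings.
num_entities = 40_000_000
num_constraints = num_entities ** 3
print(f"{num_constraints:.1e}")  # 6.4e+22
```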
Idea - Adversarial Training setting where, iteratively:
This can be formalised in the following optimisation problem:
\[
\min_{\params} \mathcal{J}(\params) + \lambda \boxed{ \max_{\mathcal{S}} \mathcal{J}_{I}(\mathcal{S} ; \mathcal{A}, \params) }
\]
where $\mathcal{J}_{I}({}\cdot{})$ measures the degree to which the entities in $\mathcal{S}$
violate the rule in $\mathcal{A}$, given the model parameters $\params$.
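The inner maximisation can be sketched as follows. This is a hypothetical instance of $\mathcal{J}_{I}$ for the grandFatherOf rule, with the search over adversarial sets $\mathcal{S}$ simplified to random search (the paper optimises over entity embeddings instead; the toy scoring function and all names below are illustrative):

```python
import numpy as np

def inconsistency_loss(x, y, z, score):
    """A hypothetical J_I for the grandFatherOf rule: positive when the
    rule body holds (high scores) but the head is scored lower."""
    body = min(score(x, "fatherOf", y), score(y, "parentOf", z))
    head = score(x, "grandFatherOf", z)
    return max(0.0, body - head)

def adversarial_step(entities, score, k=10, rng=None):
    """Inner maximisation, sketched as random search over candidate
    entity sets S; in practice this is gradient-based."""
    rng = rng or np.random.default_rng(0)
    candidates = [tuple(rng.choice(entities, size=3)) for _ in range(k)]
    return max(candidates, key=lambda s: inconsistency_loss(*s, score))

# Toy scoring function over a tiny entity set (illustrative values only).
entities = ["anna", "ben", "carl"]
def toy_score(s, p, o):
    facts = {("anna", "fatherOf", "ben"), ("ben", "parentOf", "carl")}
    return 1.0 if (s, p, o) in facts else 0.0

adversarial_set = adversarial_step(entities, toy_score)
```

The outer minimisation then updates $\params$ on the usual link-prediction loss plus $\lambda$ times the inconsistency loss on the adversarial set found.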
Rank of a triple: given a test triple $\spo$, corrupt its subject
$\rs$ in every possible way, and sort the triples by their score in descending order.
The position of $\spo$ in the ranking is its rank.
Then do the same for its object $\ro$.
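The subject-side rank computation can be sketched as follows (a minimal Python illustration with a toy scoring function; object-side ranks are computed symmetrically):

```python
def subject_rank(test_triple, entities, score):
    """Rank under subject corruption: replace the subject with every
    entity, score all resulting triples, and report the (1-based)
    position of the true triple when sorted by score, descending."""
    s, p, o = test_triple
    true_score = score(s, p, o)
    corrupted_scores = [score(e, p, o) for e in entities]
    return 1 + sum(1 for v in corrupted_scores if v > true_score)

# Toy example: scores depend only on the subject here.
entities = ["a", "b", "c"]
score = lambda s, p, o: {"a": 3.0, "b": 2.0, "c": 1.0}[s]
rank = subject_rank(("b", "r", "x"), entities, score)  # "a" outranks "b"
reciprocal_rank = 1.0 / rank
```

Aggregating reciprocal ranks over all test triples gives the Mean Reciprocal Rank (MRR); counting ranks $\leq k$ gives Hits@$k$.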
We evaluate by:
Datasets: Freebase (FB122), WordNet (WN18), and others.
A small sample of the rules used for Freebase (FB122) and WordNet (WN18).
We compared our method (ASR)
and its closed-form solutions (cASR)
with $\sc{KALE}$, a neural link predictor leveraging FOL rules,
and the unregularized versions of $\sc{DistMult}$ and $\sc{ComplEx}$.
Adversarial Set Regularisation is especially useful in a small-data regime,
where other methods struggle to generalise correctly.
We proposed Adversarial Set Regularisation, a method for
injecting prior knowledge in Neural Link Predictors where:
Beyond Neural Link Prediction - Look at other domains where it can be beneficial to impose constraints on multiple data points:
Interested in applying this idea to other domains? Contact me!
$\sc{interestedIn}(\mathcal{X}, \;$ ASR $) \Rightarrow \sc{mailTo}(\mathcal{X}, \;$ p.minervini@ucl.ac.uk$)$