1. Statistical Reasoning
Statistical reasoning tries to find suitable statistical models to fit the samples and predicts the expected probabilities of the inferred knowledge, i.e., the probability that a not-yet-observed fact holds.
Approaches:
- Knowledge graph embedding based reasoning
- Inductive rule learning based reasoning
- Multi-hop reasoning
Typical tasks:
- Predicting the missing link (relation prediction).
  - Given e1 and e2, predict the relation r.
- Predicting the missing entity (entity prediction).
  - Given e1 (e2) and relation r, predict the missing entity e2 (e1).
- Fact prediction.
  - Given a triple, predict whether it is true or false.
2. Embedding: Meaning of a Word
- What is the meaning of a word?
- By ontologies? By knowledge graphs?
- But ontologies and KGs are hard to construct and often incomplete (knowledge cannot be exhaustively enumerated).
- How to encode the meaning of a word?
3. One-hot Representation
- Vocabulary: (cat, mat, on, sat, the)
- cat: 10000 mat: 01000 on: 00100 sat: 00010 the: 00001
- “The cat sat on the mat”
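- Disadvantage: very sparse (each vector is as long as the vocabulary and contains a single 1)
- One-hot representation:
- Foundation of the bag-of-words model
- Cannot measure semantic relatedness: any two distinct one-hot vectors are orthogonal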
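A minimal Python sketch of the one-hot encoding above, using the same five-word vocabulary. The helper names (`one_hot`, `bag_of_words`, `dot`) are illustrative, not from the original. It shows that summing one-hot vectors yields a bag-of-words count vector, and that the dot product of two different words is always 0, so one-hot vectors carry no notion of similarity.

```python
# Toy vocabulary from the example above.
VOCAB = ["cat", "mat", "on", "sat", "the"]
INDEX = {w: i for i, w in enumerate(VOCAB)}

def one_hot(word):
    """Return a list with 1 at the word's vocabulary index and 0 elsewhere."""
    vec = [0] * len(VOCAB)
    vec[INDEX[word]] = 1
    return vec

def bag_of_words(sentence):
    """Sum the one-hot vectors of all tokens (word order is lost)."""
    counts = [0] * len(VOCAB)
    for token in sentence.lower().split():
        counts[INDEX[token]] += 1
    return counts

def dot(u, v):
    """Plain dot product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))

print(one_hot("cat"))                          # [1, 0, 0, 0, 0]
print(bag_of_words("The cat sat on the mat"))  # [1, 1, 1, 1, 2]
print(dot(one_hot("cat"), one_hot("mat")))     # 0: distinct words are orthogonal
```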
4. Distributional Representation
When a word w appears in text, its context is the set of words that appear nearby (within a fixed-size window), i.e., a word is represented by the words around it.
Use many contexts of w to build up a representation of w.
- Build a dense vector for w.
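To make the window idea concrete, here is a small count-based sketch, assuming a window of one word on each side and a made-up toy corpus; the function names are hypothetical. Words that occur in the same contexts ("cat" and "dog" below) get similar context vectors. These vectors are still sparse counts: learned word embeddings compress the same kind of context information into short dense vectors.

```python
import math
from collections import Counter, defaultdict

def context_counts(tokens, window=1):
    """Count, for each word, the words within `window` positions on either side."""
    counts = defaultdict(Counter)
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                counts[center][tokens[j]] += 1
    return counts

def cosine(c1, c2):
    """Cosine similarity between two sparse count vectors stored as Counters."""
    num = sum(c1[w] * c2[w] for w in c1.keys() & c2.keys())
    norm1 = math.sqrt(sum(v * v for v in c1.values()))
    norm2 = math.sqrt(sum(v * v for v in c2.values()))
    return num / (norm1 * norm2) if norm1 and norm2 else 0.0

# Hypothetical toy corpus.
tokens = "the cat sat on the mat the dog sat on the rug".split()
counts = context_counts(tokens, window=1)
print(counts["cat"])                         # cat's contexts: 'the' and 'sat'
print(cosine(counts["cat"], counts["dog"]))  # close to 1.0: same contexts
```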
5. Word Vectors
- We will build a dense vector for each word, chosen so that it is similar to vectors of words that appear in similar contexts.
- Note: word vectors are sometimes called word embeddings. They are a distributed representation.
6. Advantages of Distributed Representation
- Deal with data sparsity problem in NLP
- Realize knowledge transfer across domains and across objects
- Provide a unified representation for multi-task learning
6.1 Representation Learning
- What is representation learning?
- Objects are represented as dense, real-valued, low-dimensional vectors
6.2 Different ways of KG Representation
- Tensor: more degrees of freedom and captures implicit knowledge, but hard to scale and hard to interpret
6.3 Knowledge Graph Embedding: Application
- Entity Prediction
- 卧虎藏龙 (Crouching Tiger, Hidden Dragon) Has-director ?
- 卧虎藏龙 Has-director: Ang Lee
- Relation Prediction
- Recommendation System
7. TransE: Take Relation as Translation
- For a fact (head, relation, tail), take the relation as a translation operator from the head to the tail.
- The head entity, translated by the relation, lands on the tail entity.
- For each triple (h, r, t), h is translated to t by r: $h + r \approx t$.
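- TransE energy function used for training:
If the triple is true, the distance between the translated head (h + r) and t is small.
- L1 (Manhattan) distance: $f(h, r, t) = \|h + r - t\|_1 = \sum_i |h_i + r_i - t_i|$
- L2 (Euclidean) distance: $f(h, r, t) = \|h + r - t\|_2 = \sqrt{\sum_i (h_i + r_i - t_i)^2}$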
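A small sketch of both energy functions in plain Python. The 2-dimensional embeddings are hypothetical, chosen so that h + r lands exactly on the true tail; the function names are illustrative.

```python
import math

def transe_l1(h, r, t):
    """L1 energy: sum_i |h_i + r_i - t_i| (lower = more plausible)."""
    return sum(abs(hi + ri - ti) for hi, ri, ti in zip(h, r, t))

def transe_l2(h, r, t):
    """L2 energy: sqrt(sum_i (h_i + r_i - t_i)^2) (lower = more plausible)."""
    return math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))

# Hypothetical toy embeddings.
h = [1.0, 0.0]
r = [0.0, 1.0]
t_true = [1.0, 1.0]    # h + r == t_true
t_false = [-1.0, 0.0]  # far from h + r

print(transe_l1(h, r, t_true), transe_l1(h, r, t_false))  # 0.0 3.0
print(transe_l2(h, r, t_false))                           # sqrt(5), about 2.236
```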
- True triple examples: Triple1, Triple2, Triple3, …
- False triple examples: …
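How do we distinguish true triples from false ones?
- Minimize the distance between (h + r) and t for a true triple.
- Maximize the distance for a corrupted triple (negative example), in which the head h or the tail t is replaced by a randomly sampled entity h' or t'.
- In short: make the distance small for positive triples and large for negative ones.
- T_batch is the set of (positive triple, negative triple) pairs used in one training step.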
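Written out, these goals are combined into TransE's margin-based ranking loss. The formula below uses the margin $\lambda$, training set $S$ and entity set $E$ from the algorithm input, writes the relation as $r$ (the algorithm writes $\ell$), lets $d$ be the L1 or L2 distance, and lets $S'_{(h,r,t)}$ be the set of corrupted triples:

$$
\mathcal{L} = \sum_{(h,r,t)\in S} \; \sum_{(h',r,t')\in S'_{(h,r,t)}} \big[\lambda + d(h + r, t) - d(h' + r, t')\big]_+
$$

$$
S'_{(h,r,t)} = \{(h', r, t) \mid h' \in E\} \cup \{(h, r, t') \mid t' \in E\}, \qquad [x]_+ = \max(0, x)
$$

A pair contributes nothing once the negative triple is at least $\lambda$ farther away than the positive one.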
- Input: training set $S=\{(h, \ell, t)\}$, entity set $E$, relation set $L$, margin $\lambda$, embedding dimension $k$ (here $\ell$ denotes the relation $r$).
- Initialize the entity and relation embeddings;
- Normalize the entity and relation embeddings;
- For each entity e (suppose there are M entities in the entity set E)
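The training steps can be sketched as below. This is a minimal plain-Python illustration, not a reference implementation: it assumes L1 distance, uniform head-or-tail corruption for negative sampling, and one shuffled pass over the whole training set per epoch instead of random minibatches; all function names and hyperparameter defaults are hypothetical. Embeddings are initialized uniformly in $[-6/\sqrt{k}, 6/\sqrt{k}]$, relation vectors are normalized once, and entity vectors are re-normalized at the start of every epoch.

```python
import math
import random

def sign(x):
    """Subgradient of |x|: +1, -1, or 0."""
    return (x > 0) - (x < 0)

def l1(h, r, t):
    """TransE L1 energy ||h + r - t||_1."""
    return sum(abs(a + b - c) for a, b, c in zip(h, r, t))

def normalize(v):
    """Scale v to unit L2 norm (the zero vector is left unchanged)."""
    n = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / n for x in v]

def corrupt(triple, entities, rng):
    """Negative sampling: replace the head or the tail (50/50) with a random entity."""
    h, r, t = triple
    if rng.random() < 0.5:
        return (rng.choice(entities), r, t)
    return (h, r, rng.choice(entities))

def train_transe(triples, entities, relations, k=20, margin=1.0, lr=0.01, epochs=100, seed=0):
    rng = random.Random(seed)
    bound = 6 / math.sqrt(k)
    E = {e: [rng.uniform(-bound, bound) for _ in range(k)] for e in entities}
    R = {r: normalize([rng.uniform(-bound, bound) for _ in range(k)]) for r in relations}
    for _ in range(epochs):
        for e in E:  # entity normalization at the start of every epoch
            E[e] = normalize(E[e])
        batch = list(triples)
        rng.shuffle(batch)
        for pos in batch:
            h, r, t = pos
            h2, _, t2 = corrupt(pos, entities, rng)
            # Hinge loss: skip the pair if the margin is already satisfied.
            if margin + l1(E[h], R[r], E[t]) - l1(E[h2], R[r], E[t2]) <= 0:
                continue
            for i in range(k):
                gp = sign(E[h][i] + R[r][i] - E[t][i])    # subgradient of d(pos)
                gn = sign(E[h2][i] + R[r][i] - E[t2][i])  # subgradient of d(neg)
                E[h][i] -= lr * gp   # pull the positive pair together
                E[t][i] += lr * gp
                E[h2][i] += lr * gn  # push the negative pair apart
                E[t2][i] -= lr * gn
                R[r][i] -= lr * (gp - gn)
    return E, R

# Hypothetical toy knowledge graph.
triples = [("a", "r", "b"), ("b", "r", "c")]
E, R = train_transe(triples, ["a", "b", "c", "d"], ["r"], k=10)
```

Updating coordinate by coordinate keeps the sketch dependency-free; a practical implementation would use vectorized arrays and minibatches.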
4. Negative Sampling
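- Evaluation protocol:
Metrics: for each test triple, iterate over all candidate entities, compute the distance (energy) of each, and sort the candidates by distance.
Link Prediction, e.g., (WALL-E, _has_genre, ?)
- Mean Rank (MR): the mean of the predicted ranks of the correct entities.
e.g. Entity 1: rank -> 50; Entity 2: rank -> 100; MR = (50+100)/2 = 75
- Hits@10: the proportion of correct entities ranked in the top 10.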
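The two metrics can be computed as follows. This sketch uses hypothetical energies for the (WALL-E, _has_genre, ?) query and ranks candidates in the "raw" setting, i.e., without filtering out other known true triples; the function names are illustrative. It reproduces the MR example above.

```python
def rank_of(true_entity, energies):
    """1-based rank of the true entity after sorting candidates by ascending energy."""
    ordered = sorted(energies, key=energies.get)
    return ordered.index(true_entity) + 1

def mean_rank(ranks):
    """MR: average rank of the correct entities (lower is better)."""
    return sum(ranks) / len(ranks)

def hits_at(ranks, n=10):
    """Hits@n: fraction of correct entities ranked within the top n (higher is better)."""
    return sum(1 for r in ranks if r <= n) / len(ranks)

# Hypothetical energies for candidate tails of (WALL-E, _has_genre, ?).
energies = {"Animation": 0.4, "Drama": 1.7, "Horror": 2.9}
print(rank_of("Animation", energies))  # 1
print(mean_rank([50, 100]))            # 75.0
print(hits_at([50, 100]))              # 0.0
```

In the "filtered" setting, other candidates that also form true triples would be removed before ranking, so they cannot push the correct entity down.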
Consider two types of relations in a KG, for example:
Symmetric Relation:
- e.g., (stu1, classmate, stu2), (stu2, classmate, stu1)
Composition Relation:
- e.g., (B, husband_of, A), (A, mother_of, C), (B, father_of, C)
Which relations can TransE model? Why?
- TransE cannot model symmetric relations
- TransE can model composition relations when $r_3 = r_1 + r_2$
- Can TransE model 1-to-N relations?
- e.g., (qiguilin, teacher_of, stu1), (qiguilin, teacher_of, stu2),
(qiguilin, teacher_of, stu3), (qiguilin, teacher_of, stu4)…
- No; otherwise all stu_i would get the same embedding, since each of them must equal h + r.
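These answers follow from treating each true triple as an exact equation $h + r = t$ (an idealization, since training only makes the distance small). For the composition example, take $r_1$ = husband_of, $r_2$ = mother_of, $r_3$ = father_of:

$$
\begin{aligned}
&\text{Symmetric: } h + r = t,\; t + r = h \;\Rightarrow\; 2r = 0 \;\Rightarrow\; r = 0,\ h = t \\
&\text{Composition: } B + r_1 = A,\; A + r_2 = C \;\Rightarrow\; B + (r_1 + r_2) = C,\ \text{so } r_3 = r_1 + r_2 \\
&\text{1-to-N: } h + r = t_i \ \ (i = 1, \dots, n) \;\Rightarrow\; t_1 = t_2 = \dots = t_n
\end{aligned}
$$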
Issue of TransE
- TransE is too simple to handle complex relations
- 1-to-N, N-to-1 and N-to-N relations cannot be modeled properly
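9. Variants of TransE: TransH
For each relation r, define a hyperplane $W_r$ (given by its unit normal vector $w_r$) and a relation (translation) vector $d_r$. Then project the head entity vector $h$ and the tail entity vector $t$ onto the hyperplane $W_r$ and perform the translation on the hyperplane.
For example:
when two different heads h and h'' are linked by the same relation to the same tail (an N-to-1 case), TransE forces h and h'' to overlap, whereas TransH only forces their projections to overlap ($h_\perp = h''_\perp$), so h and h'' themselves can remain distinct.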
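A minimal sketch of the TransH projection $e_\perp = e - (w_r^\top e)\, w_r$ and the score $\|h_\perp + d_r - t_\perp\|_2^2$, omitting the training-time constraints of the original model. The toy vectors are hypothetical: the hyperplane is the xy-plane (normal along z), so two heads that differ only along $w_r$ have the same projection and both fit the same tail exactly, which a single TransE translation cannot achieve.

```python
def dot(u, v):
    """Plain dot product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))

def normalize(v):
    """Scale v to unit L2 norm."""
    n = dot(v, v) ** 0.5
    return [x / n for x in v]

def project(e, w):
    """Project e onto the hyperplane with unit normal w: e - (w . e) w."""
    s = dot(w, e)
    return [ei - s * wi for ei, wi in zip(e, w)]

def transh_energy(h, t, w_r, d_r):
    """||h_perp + d_r - t_perp||_2^2; lower means more plausible."""
    w = normalize(w_r)  # keep ||w_r|| = 1
    h_perp, t_perp = project(h, w), project(t, w)
    return sum((a + b - c) ** 2 for a, b, c in zip(h_perp, d_r, t_perp))

w_r = [0.0, 0.0, 1.0]  # hypothetical normal vector: hyperplane = xy-plane
d_r = [1.0, 0.0, 0.0]  # hypothetical translation on the hyperplane
t = [1.0, 0.0, 2.0]
h1, h2 = [0.0, 0.0, 5.0], [0.0, 0.0, -3.0]  # distinct heads, same projection
print(transh_energy(h1, t, w_r, d_r), transh_energy(h2, t, w_r, d_r))  # 0.0 0.0
```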
10. Variants of TransE: TransR
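- Both TransE and TransH assume that entities and relations are vectors in the same semantic space.
- TransR instead assumes that each relation has its own vector space.
- Motivation: Chairman Mao and Obama are close in the "president" space but not in the "poet" space.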
TransR proposes to:
Build entity and relation embeddings in a separate entity space and relation spaces;
Then project entities from the entity space into the corresponding relation space and build translations between the projected entities.
- Mapping entity embeddings into different semantic spaces: $h_r = h M_r$, $t_r = t M_r$, where $M_r$ is the projection matrix of relation r.
- The score (energy) function is correspondingly defined as (same form as TransE): $f_r(h, t) = \|h_r + r - t_r\|_2^2$
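A minimal sketch of the TransR projection $e_r = e M_r$ and score, with a hypothetical 3-dimensional entity space and two hypothetical 2-dimensional relation spaces. The matrices are chosen by hand here (in TransR they are learned) to mirror the motivation above: two entities coincide in one relation space but are far apart in another. A zero translation vector is used so the energy reduces to the squared distance between the projections.

```python
def project(e, M):
    """e_r = e M: row vector e (length k) times a k x d matrix M."""
    return [sum(e[i] * M[i][j] for i in range(len(e))) for j in range(len(M[0]))]

def transr_energy(h, r, t, M_r):
    """f_r(h, t) = ||h M_r + r - t M_r||_2^2; lower means more plausible."""
    h_r, t_r = project(h, M_r), project(t, M_r)
    return sum((a + b - c) ** 2 for a, b, c in zip(h_r, r, t_r))

# Hypothetical entities in a 3-d entity space.
e_a = [1.0, 0.0, 0.0]
e_b = [1.0, 3.0, 0.0]
# Hypothetical 3 x 2 projection matrices for two relation spaces.
M_1 = [[1, 0], [0, 0], [0, 1]]  # ignores entity dimension 1
M_2 = [[1, 0], [0, 1], [0, 0]]  # keeps entity dimension 1
r0 = [0.0, 0.0]                 # zero translation

print(project(e_a, M_1), project(e_b, M_1))  # [1.0, 0.0] [1.0, 0.0]
print(transr_energy(e_a, r0, e_b, M_1))      # 0.0: close in relation space 1
print(transr_energy(e_a, r0, e_b, M_2))      # 9.0: far apart in relation space 2
```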
11. Summary
Statistical reasoning uses statistical models to fit the samples and predicts the expected probabilities of the inferred knowledge.
Knowledge graph embedding based reasoning performs entity prediction and relation prediction through vector computations.
Translation-based models are widely used KG embedding models for KG completion and other applications because of their good performance and simplicity.