Statistical Reasoning
1. Definition
Statistical reasoning tries to find suitable statistical models to fit the samples and predict the expected probabilities of the inferred knowledge, i.e., the probability that yet-unseen knowledge holds.
- knowledge graph embedding based reasoning
- inductive rule learning based reasoning
- multi-hop reasoning
Tasks:
- Predicting the missing link: given e1 and e2, predict the relation r.
- Predicting the missing entity: given e1 (or e2) and relation r, predict the missing entity e2 (or e1).
- Fact Prediction: given a triple, predict whether it is true or false.
2. Embedding: Meaning of a Word
- What is the meaning of a word?
- By ontologies? By knowledge graphs?
- But ontologies and KGs are hard to construct and often incomplete; they cannot enumerate every meaning exhaustively.
- How to encode the meaning of a word?
3. One-hot Representation
- Vocabulary: (cat, mat, on, sat, the)
- cat: 10000 mat: 01000 on: 00100 sat: 00010 the: 00001
- “The cat sat on the mat”
- Disadvantage: too sparse.
- One-hot representation:
  - Foundation of the Bag-of-Words model.
  - Cannot measure semantic relatedness between words.
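A minimal numpy sketch (using the toy vocabulary above) of why one-hot vectors cannot measure semantic relatedness: any two distinct words get orthogonal vectors, so every pairwise similarity is zero.

```python
import numpy as np

# Toy vocabulary from the example above.
vocab = ["cat", "mat", "on", "sat", "the"]
one_hot = {w: np.eye(len(vocab))[i] for i, w in enumerate(vocab)}

print(one_hot["cat"])                    # [1. 0. 0. 0. 0.]
# Distinct one-hot vectors are orthogonal: their dot product is always 0,
# so "cat" is no more similar to "mat" than to "the".
print(one_hot["cat"] @ one_hot["mat"])   # 0.0
print(one_hot["cat"] @ one_hot["the"])   # 0.0
```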
4. Distributional Representation
When a word w appears in a text, its context is the set of words that appear nearby (within a fixed-size window); i.e., a word is represented by the words around it.
Use the many contexts of w to build up a representation of w
- i.e., build a dense vector for w.
5. Word Vectors
- We will build a dense vector for each word, chosen so that it is similar to vectors of words that appear in similar contexts.
- Note: word vectors are sometimes called word embeddings. They are a distributed representation.
Similarity: the semantic similarity of two words is measured by the similarity of their vectors (e.g., cosine similarity).
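A small sketch of the similarity computation, with made-up 4-dimensional vectors (the values are hypothetical, just to illustrate that words occurring in similar contexts end up with a high cosine):

```python
import numpy as np

def cosine_similarity(u: np.ndarray, v: np.ndarray) -> float:
    """Cosine of the angle between two word vectors (1.0 = same direction)."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Hypothetical dense vectors; in practice they are learned from contexts.
cat = np.array([0.8, 0.1, 0.6, 0.2])
dog = np.array([0.7, 0.2, 0.5, 0.3])   # occurs in contexts similar to "cat"
the = np.array([0.0, 0.9, 0.1, 0.8])   # a function word, different contexts

print(cosine_similarity(cat, dog))   # ~0.98, high
print(cosine_similarity(cat, the))   # ~0.25, low
```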
6. Advantage of Distributed Representation
- Deal with the data sparsity problem in NLP
- Realize knowledge transfer across domains and across objects
- Provide a unified representation for multi-task learning
6.1 Representation Learning
- What is representation learning?
- Objects are represented as dense, real-valued, low-dimensional vectors
6.2 Different ways of KG Representation
- Tensor: more degrees of freedom and can capture implicit knowledge, but hard to scale and hard to interpret (see the sketch below).
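As a concrete illustration of the tensor view (sizes and facts below are hypothetical): a KG can be stored as a binary 3-way adjacency tensor indexed by (head, relation, tail).

```python
import numpy as np

M, N = 4, 2                              # M entities, N relations (toy sizes)
X = np.zeros((M, N, M), dtype=np.int8)   # X[h, r, t] = 1 iff triple (h, r, t) holds
X[0, 0, 1] = 1                           # e.g., assert the fact (entity 0, relation 0, entity 1)
print(X[0, 0, 1], X[0, 0, 2])            # 1 0  (known fact vs. unknown/false)
```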
6.3 Knowledge Graph Embedding: Application
- Entity Prediction
- (卧虎藏龙, Has-director, ?)
- (卧虎藏龙, Has-director, Ang Lee)
- Relation Prediction
- Recommendation System
7. TransE: Take Relation as Translation
- For a fact (head, relation, tail), take the relation as a translation operator from the head to the tail; i.e., the head entity is translated to the tail entity by the relation.
TransE
- For each triple $(h, r, t)$, $h$ is translated to $t$ by $r$.
- TransE energy function: if the triple is true, the translated distance between $(h + r)$ and $t$ is short.
  - L1 (Manhattan) distance: $\|h + r - t\|_1 = \sum_i |h_i + r_i - t_i|$
  - L2 (Euclidean) distance: $\|h + r - t\|_2 = \sqrt{\sum_i (h_i + r_i - t_i)^2}$
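A one-function numpy sketch of this energy function (the norm choice selects L1 or L2 as above):

```python
import numpy as np

def transe_score(h: np.ndarray, r: np.ndarray, t: np.ndarray, norm: int = 1) -> float:
    """Translated distance ||h + r - t||; small for true triples, large for false ones."""
    return float(np.linalg.norm(h + r - t, ord=norm))  # ord=1 -> L1, ord=2 -> L2
```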
TransE
- True triple examples: …
- False triple examples: …
How to distinguish true triples from false ones?
- Minimize the distance between $(h + r)$ and $t$ for a true triple.
- Maximize the distance between $(h' + r)$ and a randomly sampled tail $t'$ (a negative example).
- I.e., minimize the translated distance for positive triples and maximize it for negative ones, as formalized by the loss below.
- $T_{batch}$ is a set of (positive triple, negative triple) pairs used in each update.
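These two goals are combined in TransE's margin-based ranking loss (the loss from the original TransE paper, restated in this section's notation), where $\lambda$ is the margin, $S'_{(h,r,t)}$ the corrupted triples built from $(h, r, t)$, and $[x]_+ = \max(0, x)$:

$$\mathcal{L} = \sum_{(h, r, t) \in S} \; \sum_{(h', r, t') \in S'_{(h, r, t)}} \big[\lambda + d(h + r,\, t) - d(h' + r,\, t')\big]_+$$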
Training algorithm:
1. Input: training set $S = \{(h, r, t)\}$, entity set $E$ and relation set $R$, margin $\lambda$, embedding dimension $k$.
2. Initialize the entity and relation embeddings.
3. Normalize the entity and relation embeddings (looping over each entity $e$; suppose the entity set $E$ contains $M$ entities).
4. Negative sampling: for each positive triple, corrupt its head or tail to build a negative triple, then update the embeddings by gradient descent on the margin loss, as sketched below.
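A minimal numpy sketch of these four steps on a toy, made-up triple set (real implementations sample minibatches and filter out corrupted triples that happen to be true facts):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 5 entities, 2 relations, triples as (head, relation, tail) indices.
n_ent, n_rel = 5, 2
S = [(0, 0, 1), (1, 0, 2), (0, 1, 3)]
k, margin, lr = 8, 1.0, 0.01

# Steps 1-2: initialize embeddings uniformly in [-6/sqrt(k), 6/sqrt(k)].
b = 6 / np.sqrt(k)
E = rng.uniform(-b, b, (n_ent, k))
R = rng.uniform(-b, b, (n_rel, k))
R /= np.linalg.norm(R, axis=1, keepdims=True)   # normalize relation embeddings once

def d(h, r, t):
    """Squared L2 translated distance of a triple."""
    return np.sum((E[h] + R[r] - E[t]) ** 2)

for epoch in range(200):
    E /= np.linalg.norm(E, axis=1, keepdims=True)   # step 3: normalize entity embeddings
    for (h, r, t) in S:
        # Step 4: negative sampling -- corrupt the head or the tail at random.
        if rng.random() < 0.5:
            h2, t2 = int(rng.integers(n_ent)), t
        else:
            h2, t2 = h, int(rng.integers(n_ent))
        if margin + d(h, r, t) - d(h2, r, t2) > 0:   # hinge loss is active
            g_pos = 2 * (E[h] + R[r] - E[t])         # gradient of the positive distance
            g_neg = 2 * (E[h2] + R[r] - E[t2])       # gradient of the negative distance
            E[h] -= lr * g_pos
            E[t] += lr * g_pos
            R[r] -= lr * (g_pos - g_neg)
            E[h2] += lr * g_neg
            E[t2] -= lr * g_neg
```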
- Evaluation protocol: for every test triple, replace the head (or tail) with each entity in the KG, compute the distance for every candidate, and sort the candidates by distance to obtain the rank of the correct entity.
- Link Prediction example: (WALL-E, _has_genre, ?)
- Metrics:
  - Mean Rank (MR): the mean of the predicted ranks of the correct entities. E.g., Entity 1: rank 50; Entity 2: rank 100; MR = (50 + 100)/2 = 75.
  - Hits@10: the proportion of correct entities ranked in the top 10.
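A sketch of the ranking computation and the two metrics (reusing the illustrative `E` and `R` embeddings from the training sketch above):

```python
import numpy as np

def rank_of_tail(E: np.ndarray, R: np.ndarray, h: int, r: int, t: int) -> int:
    """Rank of the true tail t when scoring (h, r, ?) against every entity."""
    scores = np.linalg.norm(E[h] + R[r] - E, ord=1, axis=1)   # distance to all candidates
    return int(np.sum(scores < scores[t])) + 1                # 1-based rank

# Metrics over a set of test ranks, e.g. the two ranks from the example above.
ranks = np.array([50, 100])
print(ranks.mean())          # Mean Rank = 75.0
print(np.mean(ranks <= 10))  # Hits@10 = 0.0
```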
8. Question
We have two types of relations in a KG, for example:
Symmetric Relation:
- e.g., (stu1, classmate, stu2), (stu2, classmate, stu1)
Composition Relation:
- e.g., (B, husband_of, A), (A, mother_of, C), (B, father_of, C)
Which relations can be modeled by TransE? Why?
- TransE cannot model symmetric relations: a non-trivial symmetric relation would force $r = 0$ and $h = t$ (see the worked equations below).
- TransE can model composition relations, when $r_3 = r_1 + r_2$.
- Can TransE model 1-to-N relations?
  - e.g., (qiguilin, teacher_of, stu1), (qiguilin, teacher_of, stu2), (qiguilin, teacher_of, stu3), (qiguilin, teacher_of, stu4)…
  - No: otherwise all the stu_i would be forced to have identical embeddings.
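The vector arithmetic behind these answers:

- Symmetric: $h + r = t$ and $t + r = h$ together imply $2r = 0$, i.e. $r = 0$ and $h = t$, so a non-trivial symmetric relation collapses.
- Composition: $h + r_1 = m$ and $m + r_2 = t$ give $h + (r_1 + r_2) = t$, which is exactly $r_3 = r_1 + r_2$.
- 1-to-N: $h + r = t_1$ and $h + r = t_2$ force $t_1 = t_2$, so all tail entities get the same embedding.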
Issue of TransE
- TransE is too simple to handle complex relations
- 1-to-N, N-to-1, and N-to-N relations cannot be modeled properly
9. Variants of TransE: TransH
For each relation, define a hyperplane $W_r$ and a relation vector $d_r$. Then project the head entity vector $h$ and the tail entity vector $t$ onto the hyperplane $W_r$, i.e., map the vectors onto the hyperplane and perform the translation there.
For example:
in TransE, $h$ and $h''$ would overlap (be forced to the same point), while in TransH only their projections $h_\perp$ and $h''_\perp$ onto the hyperplane overlap, so $h$ and $h''$ themselves can remain distinct.
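Concretely, following the TransH formulation: with the hyperplane's unit normal vector $w_r$ ($\|w_r\|_2 = 1$), entities are projected onto the hyperplane and the translation $d_r$ is applied between the projections:

$$h_\perp = h - (w_r^\top h)\, w_r, \qquad t_\perp = t - (w_r^\top t)\, w_r, \qquad f_r(h, t) = \|h_\perp + d_r - t_\perp\|_2^2$$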
10. Variants of TransE: TransR
- Both the TransE and TransH models assume that entities and relations are vectors in the same semantic space.
- TransR instead assumes that each relation has its own semantic space.
- E.g., Chairman Mao and Obama are close in the "president" space but far apart in the "poet" space.
TransR proposes:
Build entity and relation embeddings in separate entity and relation spaces;
Then project entities from the entity space into the corresponding relation space and build translations between the projected entities.
TransR:
- Maps entity embeddings into different (relation-specific) semantic spaces: $h_r = h M_r$, $t_r = t M_r$.
- The score (energy) function then has the same form as TransE: $f_r(h, t) = \|h_r + r - t_r\|_2^2$.
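A side-by-side numpy sketch of the three score functions (written with column vectors, so the TransR projection appears as `M_r @ h`):

```python
import numpy as np

def score_transe(h, r, t):
    """TransE: the relation is a translation in one shared space."""
    return np.linalg.norm(h + r - t, ord=1)

def score_transh(h, w_r, d_r, t):
    """TransH: translate between projections onto the relation hyperplane (||w_r|| = 1)."""
    h_p = h - (w_r @ h) * w_r
    t_p = t - (w_r @ t) * w_r
    return np.linalg.norm(h_p + d_r - t_p) ** 2

def score_transr(h, M_r, r, t):
    """TransR: project entities into the relation's own space via the matrix M_r."""
    return np.linalg.norm(M_r @ h + r - M_r @ t) ** 2
```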
11. Summary
Statistical reasoning uses statistical models to fit the samples and predicts the expected probabilities of the inferred knowledge.
Knowledge graph embedding based reasoning actually performs entity prediction and relation prediction with vector calculations.
Translation-based models are now widely used KG embedding models for KG completion and other applications due to their good performance and succinctness.