Statistical Reasoning


  • Statistical reasoning fits suitable statistical models to the observed samples and predicts the expected probabilities of the inferred knowledge, that is, how likely it is that a not-yet-observed fact holds.

  • knowledge graph embedding based reasoning

  • inductive rule learning based reasoning
  • multi-hop reasoning


  • Predicting the missing link.
  • Given e1 and e2, predict the relation r.
  • Predicting the missing entity.
  • Given e1 (or e2) and relation r, predict the missing entity e2 (or e1).
  • Fact Prediction.
  • Given a triple, predict whether it is true or false.

2. Embedding: Meaning of a Word

  • What is the meaning of a word?
  • By ontologies? By Knowledge Graph?
  • But ontologies and KGs are hard to construct and often incomplete: their contents cannot be enumerated exhaustively.
  • How to encode the meaning of a word?

3. One-hot Representation

  • Vocabulary: (cat, mat, on, sat, the)
    • cat: 10000 mat: 01000 on: 00100 sat: 00010 the: 00001
  • “The cat sat on the mat”


  • Disadvantage: too sparse


  • One-hot representation:
    • Foundation of Bag-of-words Model
  • Cannot measure semantic relatedness between words
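
As a minimal sketch of the points above, the following builds one-hot vectors for the five-word vocabulary and sums them into a bag-of-words vector. The helper names (`one_hot`, `dot`, `bag_of_words`) are illustrative and not part of the original notes.

```python
# Vocabulary taken from the example above; helper names are illustrative.
vocab = ["cat", "mat", "on", "sat", "the"]
index = {word: i for i, word in enumerate(vocab)}

def one_hot(word):
    """Return the one-hot vector of `word` as a list of 0/1 values."""
    vec = [0] * len(vocab)
    vec[index[word]] = 1
    return vec

def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def bag_of_words(sentence):
    """Sum the one-hot vectors of all tokens in a lowercased sentence."""
    counts = [0] * len(vocab)
    for token in sentence.lower().split():
        counts[index[token]] += 1
    return counts

print(one_hot("cat"))                          # [1, 0, 0, 0, 0]
print(dot(one_hot("cat"), one_hot("mat")))     # 0
print(bag_of_words("The cat sat on the mat"))  # [1, 1, 1, 1, 2]
```

The dot product of any two distinct one-hot vectors is 0, so every pair of different words looks equally unrelated. This is why the representation cannot express semantic relatedness.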


4. Distributional Representation

  • When a word w appears in text, its context is the set of words that appear nearby (within a fixed-size window). In other words, a center word is represented by the words around it.

  • Use many contexts of w to build up a representation of w


  • Build a dense vector for each word
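
To make the idea of "context within a fixed-size window" concrete, here is a small sketch that counts the context words of each word. The function name `context_counts` and the window size are assumptions chosen for illustration. The result is a sparse count vector per word. A dense vector would typically be obtained by compressing or learning from such statistics, which this sketch does not do.

```python
from collections import Counter, defaultdict

def context_counts(tokens, window=2):
    """Count, for each word, the words seen within `window` positions of it."""
    counts = defaultdict(Counter)
    for i, word in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # skip the center word itself
                counts[word][tokens[j]] += 1
    return counts

tokens = "the cat sat on the mat".split()
counts = context_counts(tokens, window=1)
print(dict(counts["cat"]))  # {'the': 1, 'sat': 1}
print(dict(counts["the"]))  # {'cat': 1, 'on': 1, 'mat': 1}
```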

5. Word Vectors

  • We will build a dense vector for each word, chosen so that it is similar to vectors of words that appear in similar contexts.


  • Note: word vectors are sometimes called word embeddings. They are a distributed representation.

6. Advantage of Distributed Representation

  1. Deal with the data sparsity problem in NLP
  2. Realize knowledge transfer across domains and across objects
  3. Provide a unified representation for multi-task learning


6.1 Representation Learning

  • What is representation learning?
    • Objects are represented as dense, real-valued, low-dimensional vectors


6.2 Different ways of KG Representation


  • Tensor: more degrees of freedom and captures implicit knowledge, but is hard to scale and hard to interpret

6.3 Knowledge Graph Embedding: Application

  • Entity Prediction
    • Crouching Tiger, Hidden Dragon (卧虎藏龙) Has-director ?
    • Crouching Tiger, Hidden Dragon (卧虎藏龙) Has-director: Ang Lee


  • Relation Prediction


  • Recommendation System


7. TransE: Take Relation as Translation

  • For a fact (head, relation, tail), take the relation as a translation operator from the head to the tail.


  • That is, one entity is carried to another entity by the translation that the relation defines.


  • For each triple $(h, r, t)$, h is translated to t by r, i.e. $h + r \approx t$.


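  • TransE energy function used for training: $f_r(h, t) = \|h + r - t\|$
  • If the triple is true, the distance between the translated head (h + r) and t is small. For a false triple it should be large.

  • L1 (Manhattan) distance: $\|h + r - t\|_1 = \sum_i |h_i + r_i - t_i|$

  • L2 (Euclidean) distance: $\|h + r - t\|_2 = \sqrt{\sum_i (h_i + r_i - t_i)^2}$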
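
A minimal sketch of the energy function under both distances, using plain Python lists as embeddings. The function name `energy` and the toy vectors are illustrative assumptions, not values from the notes.

```python
import math

def energy(h, r, t, norm="l1"):
    """Distance between the translated head (h + r) and the tail t."""
    diff = [hi + ri - ti for hi, ri, ti in zip(h, r, t)]
    if norm == "l1":
        return sum(abs(d) for d in diff)
    if norm == "l2":
        return math.sqrt(sum(d * d for d in diff))
    raise ValueError("norm must be 'l1' or 'l2'")

# Toy 2-dimensional embeddings (made up for illustration).
h = [1.0, 0.0]
r = [0.0, 1.0]
t_true = [1.0, 1.0]   # exactly h + r
t_false = [4.0, 5.0]  # far from h + r

print(energy(h, r, t_true))          # 0.0
print(energy(h, r, t_false, "l1"))   # 7.0
print(energy(h, r, t_false, "l2"))   # 5.0
```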


  • Triple1:
  • Triple2:
  • Triple3:
  • false triple examples:
How do we distinguish true triples from false ones?


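  • Minimize the distance between (h + l) and t for a true triple (here l denotes the relation, written r above).
  • Maximize the distance between (h’ + l) and a randomly sampled tail t’ (negative example).
    • In short: minimize the distance for positive examples and maximize it for negative examples.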
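
One common way to combine the two goals above is a margin-based hinge loss, $\max(0, \lambda + d(h + l, t) - d(h' + l, t'))$. The sketch below assumes this form and an L1 distance. The notes mention a margin $\lambda$ but do not spell out the loss, so treat the exact formula and the names `l1_distance` and `margin_loss` as assumptions.

```python
def l1_distance(h, l, t):
    """L1 distance between the translated head (h + l) and the tail t."""
    return sum(abs(hi + li - ti) for hi, li, ti in zip(h, l, t))

def margin_loss(pos, neg, margin=1.0):
    """Hinge loss: zero once the negative triple is at least `margin`
    farther away than the positive triple."""
    return max(0.0, margin + l1_distance(*pos) - l1_distance(*neg))

# Toy embeddings (made up): the positive triple satisfies h + l = t exactly.
h, l, t = [0.0, 0.0], [1.0, 0.0], [1.0, 0.0]
t_far = [3.0, 0.0]    # negative tail, distance 2.0 from h + l
t_near = [1.5, 0.0]   # negative tail, distance 0.5 from h + l

print(margin_loss((h, l, t), (h, l, t_far)))   # 0.0
print(margin_loss((h, l, t), (h, l, t_near)))  # 0.5
```

A negative example that is already far enough away contributes no loss. Only negatives that fall inside the margin push the embeddings to change.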


  • T_batch is a set of pairs, each consisting of a positive triple and a corresponding negative triple.
  1. Input: training set $S=\{(h, \ell, t)\}$, entity set $E$ and relation set $L$, margin $\lambda$, embedding dimension $k$.
  2. Initialize the entity and relation embeddings.
  3. Normalize the entity and relation embeddings.
    For each entity e (suppose there are M elements in the entity set E)

  4. Negative sampling
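
The negative sampling step can be sketched as follows: each positive triple is paired with a corrupted copy in which either the head or the tail is replaced by a different, randomly chosen entity. The function name `corrupt`, the 50/50 choice between head and tail, and the toy entity set are assumptions for illustration. This simple version does not check whether the corrupted triple happens to be a true fact.

```python
import random

def corrupt(triple, entities, rng):
    """Replace the head or the tail (chosen at random) with a different entity."""
    h, l, t = triple
    if rng.random() < 0.5:
        candidates = [e for e in entities if e != h]
        return (rng.choice(candidates), l, t)
    candidates = [e for e in entities if e != t]
    return (h, l, rng.choice(candidates))

entities = ["A", "B", "C", "D"]   # toy entity set
rng = random.Random(0)            # fixed seed for reproducibility
positive = ("A", "mother_of", "C")

# T_batch: pairs of (positive triple, negative triple)
t_batch = [(positive, corrupt(positive, entities, rng)) for _ in range(3)]
for pos, neg in t_batch:
    print(pos, neg)
```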


  • Evaluation protocol:

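Metrics: for each test triple, iterate over all candidate entities, compute the distance for each, and sort the candidates by distance.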

  • Link Prediction
    ( WALL-E , _has_genre , ? )

  • Mean Rank (MR): the mean of the ranks assigned to the correct entities.
    e.g. Entity 1: rank -> 50; Entity 2: rank -> 100; MR = (50+100)/2 = 75

  • Hits@10: the proportion of correct entities ranked in the top 10.
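
The two metrics can be computed from a list of ranks as sketched below. The function names and the candidate distances are hypothetical. `rank_of` assumes that a smaller distance means a better candidate.

```python
def mean_rank(ranks):
    """Average rank of the correct entities (lower is better)."""
    return sum(ranks) / len(ranks)

def hits_at_k(ranks, k=10):
    """Fraction of correct entities ranked within the top k."""
    return sum(1 for r in ranks if r <= k) / len(ranks)

def rank_of(correct, distances):
    """1-based rank of `correct` when candidates are sorted by ascending distance."""
    ordered = sorted(distances, key=distances.get)
    return ordered.index(correct) + 1

ranks = [50, 100]          # the example from the notes
print(mean_rank(ranks))    # 75.0
print(hits_at_k(ranks))    # 0.0

# Made-up distances for three candidate tail entities.
distances = {"e1": 0.2, "e2": 1.7, "e3": 0.9}
print(rank_of("e3", distances))  # 2
```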


We have two types of relations in KG, for example:

  • Symmetric Relation:

    • e.g., (stu1, classmate, stu2), (stu2, classmate, stu1)
  • Composition Relation:

    • e.g., (B, husband_of, A),(A, mother_of, C),(B, father_of, C)

Which of these relations can be modeled by TransE? Why?

  • TransE cannot model symmetric relations


  • TransE can model composition relations, when $r_3=r_1+r_2$
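
Both answers follow from the translation constraint $h + r = t$ when it is assumed to hold exactly. The short derivation below uses the entity names from the examples above as their embedding vectors, with $r_1$ = husband_of, $r_2$ = mother_of and $r_3$ = father_of.

$$
\begin{aligned}
\text{Symmetric: } & h + r = t,\quad t + r = h \;\Rightarrow\; 2r = 0 \;\Rightarrow\; r = 0,\ h = t \\
\text{Composition: } & B + r_1 = A,\quad A + r_2 = C \;\Rightarrow\; B + (r_1 + r_2) = C \;\Rightarrow\; r_3 = r_1 + r_2
\end{aligned}
$$

A symmetric relation therefore forces the relation vector to zero and the two entities to coincide, which is a degenerate solution. Composition is consistent with the constraint without any such collapse.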


  • Can TransE model 1-to-N relations?
    • e.g., (qiguilin, teacher_of, stu1), (qiguilin, teacher_of, stu2),
      (qiguilin, teacher_of, stu3), (qiguilin, teacher_of, stu4)…
    • No: otherwise all the stu_i would have to be equal, since each must equal qiguilin + teacher_of.

Issue of TransE

  • TransE is too simple to handle complex relations
    • 1-to-N, N-to-1, and N-to-N relations cannot be represented


9. Variants of TransE: TransH

For each relation, define a hyperplane $W_r$ and a relation vector $d_r$. Then project the head entity vector $h$ and the tail entity vector $t$ onto the hyperplane $W_r$, and perform the translation on the hyperplane.



For example:

In TransE, h and h’’ would be forced to the same vector. In TransH, only their projections onto the hyperplane coincide (both equal h⊥), so the entity vectors h and h’’ themselves can remain distinct.
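
The projection idea can be sketched as follows, assuming the hyperplane is described by a unit normal vector (called `w_r` here, a name not used in the notes) and that the energy is the squared L2 distance between the projected, translated head and the projected tail. The toy vectors are made up to show two different heads that share one projection.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def project(v, w):
    """Project v onto the hyperplane with unit normal w: v - (w . v) w."""
    scale = dot(w, v)
    return [vi - scale * wi for vi, wi in zip(v, w)]

def transh_energy(h, t, w_r, d_r):
    """Squared L2 distance between (h_perp + d_r) and t_perp."""
    h_p, t_p = project(h, w_r), project(t, w_r)
    return sum((a + d - b) ** 2 for a, d, b in zip(h_p, d_r, t_p))

w_r = [0.0, 0.0, 1.0]   # unit normal: the hyperplane is the x-y plane
d_r = [1.0, 0.0, 0.0]   # translation vector lying in the hyperplane
h1 = [0.0, 0.0, 2.0]
h2 = [0.0, 0.0, -3.0]   # a different entity vector from h1
t = [1.0, 0.0, 5.0]

print(project(h1, w_r) == project(h2, w_r))  # True
print(transh_energy(h1, t, w_r, d_r))        # 0.0
print(transh_energy(h2, t, w_r, d_r))        # 0.0
```

Both heads fit the triple perfectly even though `h1` and `h2` are different vectors. This is the extra freedom that lets several entities share one side of a relation.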


10. Variants of TransE: TransR

  • Both TransE and TransH models assume that entities and relationships are vectors in the same semantic space.


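  • Assume instead that each relation has its own vector space.
  • For example, Chairman Mao and Obama are close to each other in the "president" space, but not in the "poet" space.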

TransR proposes:

  • Build entity and relation embeddings in separate entity and relation spaces;

  • Then project entities from the entity space into the corresponding relation space and build translations between the projected entities.


  • Mapping entity embeddings into different semantic spaces


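  • The score (energy) function is correspondingly defined in the same form as TransE, but on the projected entities: $f_r(h, t) = \|h_r + r - t_r\|$, where $h_r$ and $t_r$ are the projections of $h$ and $t$ into the space of relation $r$.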
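
A minimal sketch of the projection step, assuming each relation has its own projection matrix (called `M_r` here) that is applied to column vectors, and a squared L2 energy. The matrices and entity vectors are made up to mirror the "close in one space, far in another" example above.

```python
def matvec(M, v):
    """Multiply matrix M (a list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def transr_energy(h, r, t, M_r):
    """Squared L2 distance between (M_r h + r) and M_r t in the relation space."""
    h_r, t_r = matvec(M_r, h), matvec(M_r, t)
    return sum((a + b - c) ** 2 for a, b, c in zip(h_r, r, t_r))

# Entities live in a 3-d entity space; each hypothetical relation space is 2-d.
M_president = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]  # keeps the first two coordinates
M_poet = [[0.0, 0.0, 1.0], [0.0, 0.0, 0.0]]       # keeps only the third coordinate
a = [1.0, 1.0, 5.0]
b = [1.0, 1.0, -5.0]
zero = [0.0, 0.0]  # zero translation, so the energy is the projected distance

print(transr_energy(a, zero, b, M_president))  # 0.0
print(transr_energy(a, zero, b, M_poet))       # 100.0
```

The same two entity vectors are identical after projection into one relation space and far apart in the other, which a single shared space cannot express.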

11. Summary

  • Statistical reasoning uses statistical models to fit the samples and predicts the expected probabilities of the inferred knowledge.

  • Knowledge graph embedding based reasoning actually performs entity prediction and relation prediction with vector calculations.

  • Translation-based models are now widely used KG embedding models for KG completion and other applications, owing to their good performance and simplicity.

