Sequence Labelling and Relation Extraction

Contents

  1. IE
    1.1 Simple Introduction
    1.2 Low-level information extraction
    1.3 Named Entity Recognition (NER)
    1.4 Uses
    1.5 Concretely
    1.6 Sequence Models for Named Entity Recognition
    1.7 Features
      1.7.1 Features for sequence labeling
      1.7.2 Features: Word substrings
      1.7.3 Word Shapes
    1.8 Maximum entropy Markov models (MEMMs) or Conditional Markov models
      1.8.1 Sequence problems
      1.8.2 MEMM Inference in Systems
      1.8.3 Scoring individual labeling decisions is no more complex than standard classification decisions
    1.9 Search
      1.9.1 Greedy Inference
      1.9.2 Beam Search
      1.9.3 Viterbi Inference
    1.10 CRFs
  2. Extracting relations from text
    2.1 Extracting relation triples from text
    2.2 Why Relation Extraction?
    2.3 Automated Content Extraction (ACE)
    2.4 UMLS: Unified Medical Language System
    2.5 Databases of Wikipedia Relations
  3. How to build relation extractors
    3.1 Rules for extracting IS-A relation
      3.1.1 Extracting Richer Relations Using Rules
      3.1.2 Summary: Hand-built patterns for relations
    3.2 Supervised machine learning for relations
    3.3 Gazetteer and trigger word features for relation extraction
    3.4 Classifiers for supervised methods
    3.5 Summary: Supervised Relation Extraction


1. IE

1.1 Simple Introduction

(slide images)

  • From a limited set of texts, find the relevant passages, gather information from them, and finally represent it in structured form.

  • IE systems extract clear, factual information

    • Roughly: who did what to whom, when?
  • E.g.,
    • Gathering earnings, profits, headquarters, etc. from company reports
      • "The headquarters of Alibaba Group, and the global headquarters of the combined Alibaba Group, are located in Hangzhou."
      • headquarters("Alibaba Group", "Hangzhou")
        • i.e., the fact is expressed in structured form
  • Learn drug-gene product interactions from medical research literature

1.2 Low-level information extraction

  • Now available, and popular, in applications like text apps, mail apps, etc.

  • Often seems to be based on regular expressions and name lists

(slide images)

1.3 Named Entity Recognition (NER)

  • A very important sub-task: find and classify names in text.
  • For example:

(slide image)

  • Person names, organization/institution names, geographic locations, time/date expressions, monetary values, and domain-specific entities.

1.4 Uses

  • Named entities can be indexed, linked off, etc.
  • Sentiment can be attributed to companies or products.
    • i.e., attributing expressed sentiment to the product in question
  • A lot of IE relations are associations between named entities.
  • For question answering, answers are often named entities.

1.5 Concretely

  • Many web pages tag various entities, with links to topic pages, etc.

(slide image)

  • Google, Apple, …: smart recognizers for document content
    • NER improves the news-reading experience: recognized entities are linked to their corresponding URLs.

(slide image)

  • Recall and precision are straightforward for tasks like IR and text categorization, where there is only one grain size (documents)

  • The measures behave a bit funnily for IE/NER when there are boundary errors (which are common):

    • 紫金山森林公园位于南京市玄武区 ("Zijin Mountain Forest Park is located in Xuanwu District, Nanjing")
    • First Bank of China
      • The system may recognize only "Bank of China"
      • This counts as both a false positive (fp) and a false negative (fn)
  • Selecting nothing would have been better!

  • Some other metrics (e.g., the MUC scorer) give partial credit (according to complex rules)

1.6 Sequence Models for Named Entity Recognition

(slide image)

  • Training
    • Collect a set of representative training documents
    • Label each token for its entity class or other (O)
    • Design feature extractors appropriate to the text and classes
    • Train a sequence classifier to predict the labels from the data
  • Testing/Classifying
    • Receive a set of testing documents
    • Run sequence model inference to label each token
    • Appropriately output the recognized entities

(slide image)

  • The first encoding (IO) suffers from boundary problems: e.g., the person name "Mengqiu Huang" should be one entity, but recognition may split it in two.
    • With C classes there are C+1 labels, so the label space is smaller than in the second scheme.
    • The cause is that several PER tokens appear in a row, and we cannot tell whether to bundle them into one entity; but adjacent entities are usually not of the same class, so IO is often adequate on large corpora.
  • In the second encoding (BIO), B marks the beginning of an entity and I continues the entity from the previous token.

    • There are 2C+1 labels, so it is less efficient, but it yields higher accuracy.
  • More encoding schemes exist (IOBE, IOBS), but one must weigh training cost against accuracy. A sketch of decoding BIO tags into entity spans follows.
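To make the BIO scheme concrete, here is a minimal sketch (my own illustration, not from the notes) that decodes a BIO tag sequence into labeled entity spans:

```python
def bio_to_spans(tokens, tags):
    """Convert parallel token/BIO-tag lists into (label, start, end) spans."""
    spans, start, label = [], None, None
    for i, tag in enumerate(tags):
        if tag.startswith("B-"):  # a new entity begins here
            if label is not None:
                spans.append((label, start, i))
            start, label = i, tag[2:]
        elif tag.startswith("I-") and label == tag[2:]:
            continue  # the current entity continues
        else:  # "O" or an inconsistent I- tag closes any open entity
            if label is not None:
                spans.append((label, start, i))
            start, label = None, None
    if label is not None:
        spans.append((label, start, len(tags)))
    return spans

tokens = ["Mengqiu", "Huang", "works", "at", "Stanford"]
tags = ["B-PER", "I-PER", "O", "O", "B-ORG"]
print(bio_to_spans(tokens, tags))  # [('PER', 0, 2), ('ORG', 4, 5)]
```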

1.7 Features

1.7.1 Features for sequence labeling

  • Words
    • Current word
    • Previous/next word (context)
  • Other kinds of inferred linguistic classification (semantic-level and syntactic-level features)
    • Part-of-speech tags
  • Label context
    • Previous (and perhaps next) label (see the feature-extractor sketch below)
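For instance, a per-token feature extractor might look like the following minimal sketch (my own illustration; the function and feature names are hypothetical):

```python
def token_features(tokens, i, prev_label):
    """Feature dict for position i, conditioned on the previous label."""
    word = tokens[i]
    return {
        "word": word.lower(),
        "prev_word": tokens[i - 1].lower() if i > 0 else "<S>",
        "next_word": tokens[i + 1].lower() if i < len(tokens) - 1 else "</S>",
        "is_capitalized": word[0].isupper(),
        "suffix3": word[-3:].lower(),
        "prev_label": prev_label,  # label context, as in an MEMM
    }

print(token_features(["Steve", "Jobs", "founded", "Apple"], 1, "B-PER"))
```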

1.7.2 Features: Word substrings

(slide image)

  • Whenever a substring like "xazo" appears, the word is a drug; "field" suggests a place; a colon suggests a movie title.
    • Features at this level of granularity are very effective for the downstream task.

1.7.3 Word Shapes

  • Map words to a simplified representation that encodes attributes such as length, capitalization, numerals, Greek letters, internal punctuation, etc.
  • The shape of a word itself carries information (see the sketch after the image below).

(slide image)
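A minimal word-shape sketch (my own illustration; the exact character classes and the run-collapsing rule are one common choice, not the only one):

```python
import re

def word_shape(word, max_len=4):
    """Map a word to a shape string: X/x/d for upper/lower/digit characters,
    punctuation kept as-is, repeated runs collapsed (e.g. "mRNA" -> "xX")."""
    classes = []
    for ch in word:
        if ch.isupper():
            classes.append("X")
        elif ch.islower():
            classes.append("x")
        elif ch.isdigit():
            classes.append("d")
        else:
            classes.append(ch)
    squeezed = re.sub(r"(.)\1+", r"\1", "".join(classes))  # collapse runs
    return squeezed[:max_len] + ("…" if len(squeezed) > max_len else "")

for w in ["Varicella-zoster", "mRNA", "CPA1", "3.2%"]:
    print(w, "->", word_shape(w))  # Xx-x, xX, Xd, d.d%
```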

1.8 Maximum entropy Markov models (MEMMs) or Conditional Markov models

1.8.1 Sequence problems

  • Many problems in NLP have data which is a sequence of characters, words, phrases, lines, or sentences…
  • We can think of our task as one of labeling each item

(slide image)

  • Given an input sequence, recognize/label each chunk of text.

1.8.2 MEMM Inference in Systems

  • For a Conditional Markov Model (CMM), a.k.a. a Maximum Entropy Markov Model (MEMM), the classifier makes a single decision at a time, conditioned on evidence from observations and previous decisions
  • A larger space of sequences is usually explored via search

1.8.3 Scoring individual labeling decisions is no more complex than standard classification decisions

  • We have some assumed labels to use for prior positions

  • We use features of those and the observed data (which can include current, previous, and next words) to predict the current label (the factorization is written out after the images below)

(slide images)
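Written out (this is the standard MEMM factorization, added here for clarity rather than copied from the slides), the model chains local maximum-entropy decisions:

```latex
P(s_1,\dots,s_n \mid w_{1:n}) = \prod_{i=1}^{n} P(s_i \mid s_{i-1}, w_{1:n}, i),
\qquad
P(s_i \mid s_{i-1}, w_{1:n}, i)
  = \frac{\exp\big(\sum_k \lambda_k f_k(s_i, s_{i-1}, w_{1:n}, i)\big)}
         {\sum_{s'} \exp\big(\sum_k \lambda_k f_k(s', s_{i-1}, w_{1:n}, i)\big)}
```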

1.9 Search

  • We search the space of label sequences; the simplest strategy is greedy: at each step choose the label that maximizes the score for the current word.

1.9.1 Greedy Inference

(slide image)

Greedy inference:
  • We just start at the left and use our classifier at each position to assign a label.
  • The classifier can depend on previous labeling decisions as well as observed data.
Advantages:
  • Fast; no extra memory requirements.
  • Very easy to implement.
  • With rich features including observations to the right, it may perform quite well.
Disadvantage:
  • Greedy: once we commit to a labeling error we cannot recover from it (a decoding sketch follows).
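A minimal greedy-decoding sketch (my own illustration; `score(tokens, i, prev_label, label)` is a hypothetical local scorer, e.g. a log-probability from the per-position classifier above):

```python
def greedy_decode(tokens, labels, score):
    """Left-to-right decoding: commit to the best label at each position."""
    out, prev = [], "<S>"
    for i in range(len(tokens)):
        best = max(labels, key=lambda y: score(tokens, i, prev, y))
        out.append(best)
        prev = best  # committed: a mistake here cannot be undone later
    return out
```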

1.9.2 Beam Search

(slide image)

Beam inference:
  • At each position keep the top k complete sequences.
  • Extend each sequence in each local way.
  • The extensions compete for the k slots at the next position.
Advantages:
  • Fast; beam sizes of 3-5 are almost as good as exact inference in many cases.
  • Easy to implement (no dynamic programming required).
Disadvantage:
  • Inexact: the globally best sequence can fall off the beam (see the sketch below).
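A minimal beam-search sketch (my own illustration, using the same hypothetical `score` function as the greedy sketch):

```python
import heapq

def beam_decode(tokens, labels, score, k=3):
    """Keep the k best partial label sequences at each position."""
    beam = [(0.0, [], "<S>")]  # (total score, labels so far, previous label)
    for i in range(len(tokens)):
        candidates = [
            (total + score(tokens, i, prev, y), seq + [y], y)
            for total, seq, prev in beam
            for y in labels
        ]
        beam = heapq.nlargest(k, candidates, key=lambda c: c[0])  # k slots
    return max(beam, key=lambda c: c[0])[1]
```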

(slide images)

1.9.3 Viterbi Inference

(slide image)

Viterbi inference:
  • Dynamic programming or memoization.
  • Requires a small window of state influence (e.g., only the past two states are relevant).
Advantages:
  • Exact: the globally best sequence is returned (see the sketch below).
Disadvantage:
  • Harder to implement long-distance state-state interactions (but beam inference tends not to allow long-distance resurrection of sequences anyway).
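A minimal Viterbi sketch (my own illustration, same hypothetical `score` function; first-order label dependence):

```python
def viterbi_decode(tokens, labels, score):
    """Exact best sequence under a first-order model, via dynamic programming."""
    n = len(tokens)
    # delta[i][y] = best score of any labeling of tokens[0..i] that ends in y
    delta = [{y: score(tokens, 0, "<S>", y) for y in labels}]
    back = [{}]
    for i in range(1, n):
        delta.append({})
        back.append({})
        for y in labels:
            p = max(labels, key=lambda q: delta[i - 1][q] + score(tokens, i, q, y))
            delta[i][y] = delta[i - 1][p] + score(tokens, i, p, y)
            back[i][y] = p  # remember the best predecessor
    y = max(labels, key=lambda l: delta[n - 1][l])
    seq = [y]
    for i in range(n - 1, 0, -1):  # follow backpointers
        y = back[i][y]
        seq.append(y)
    return list(reversed(seq))
```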

1.10 CRFs

Reference: "CRF条件随机场的原理、例子、公式推导和应用" (the principles, examples, derivation, and applications of CRFs), Zhihu (zhihu.com)

  • Another sequence model: Conditional Random Fields (CRFs)
  • A whole-sequence conditional model rather than a chaining of local models.
  • The space of c’s is now the space of sequences (the model is written out after this list)
    • But if the features f_k remain local, the conditional sequence likelihood can be calculated exactly using dynamic programming
  • Training is slower, but CRFs avoid causal-competition biases
  • These (or a variant using a max-margin criterion) are seen as the state of the art these days… but in practice they usually work much the same as MEMMs.
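Concretely (standard linear-chain CRF form, added for clarity), the whole-sequence conditional model is

```latex
P(\mathbf{y} \mid \mathbf{x})
  = \frac{1}{Z(\mathbf{x})}
    \exp\Big(\sum_{t=1}^{n}\sum_{k} \lambda_k\, f_k(y_{t-1}, y_t, \mathbf{x}, t)\Big),
\qquad
Z(\mathbf{x}) = \sum_{\mathbf{y}'} \exp\Big(\sum_{t=1}^{n}\sum_{k} \lambda_k\, f_k(y'_{t-1}, y'_t, \mathbf{x}, t)\Big)
```

Unlike the MEMM above, the normalizer Z(x) sums over whole label sequences, which is what avoids the per-step competition (label-bias) problem.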

2. Extracting relations from text

  • Company report: "International Business Machines Corporation (IBM or the company) was incorporated in the State of New York on June 16, 1911, as the Computing-Tabulating-Recording Co. (C-T-R)…"
  • Extracted complex relation:
    • Company-Founding
      • Company: IBM
      • Location: New York
      • Date: June 16, 1911
      • Original-Name: Computing-Tabulating-Recording Co.
  • But we will focus on the simpler task of extracting relation triples
    • Founding-year(IBM, 1911)
    • Founding-location(IBM, New York)
    • i.e., extract relations from the text

(slide image)

2.1 Extracting relation triples from text

(slide image)

2.2 Why Relation Extraction?

  • NER: find and classify; relation extraction, too, ultimately becomes a classification problem

  • Create new structured knowledge graphs, useful for any app

  • Augment current knowledge graphs

    • Adding words to the WordNet thesaurus, facts to FreeBase or DBPedia
  • Support question answering

    • The granddaughter of which actor starred in the movie "E.T."?
      (acted-in ?x "E.T.") (is-a ?y actor) (granddaughter-of ?x ?y)
  • But which relations should we extract?

2.3 Automated Content Extraction (ACE)

(slide image)

  • Physical-Located PER-GPE
    • He was in Tennessee
  • Part-Whole-Subsidiary ORG-ORG
    • XYZ, the parent company of ABC
  • Person-Social-Family PER-PER
    • John's wife Yoko
  • Org-AFF-Founder PER-ORG
    • Steve Jobs, co-founder of Apple

2.4 UMLS: Unified Medical Language System

  • 134 entity types, 54 relations

(slide image)

Extracting UMLS relations from a sentence

  • Doppler echocardiography can be used to diagnose left anterior descending artery stenosis in patients with type 2 diabetes
    • Echocardiography, Doppler DIAGNOSES Acquired stenosis

2.5 Databases of Wikipedia Relations

(slide image)

3. How to build relation extractors

  • Hand-written patterns
  • Supervised machine learning
  • Semi-supervised and unsupervised
    • Bootstrapping (using seeds)
    • Distant supervision
    • Unsupervised learning from the web

3.1 Rules for extracting IS-A relation

  • Early intuition from Hearst (1992)
    • "Agar is a substance prepared from a mixture of red algae, such as Gelidium, for laboratory or industrial use"
  • What does Gelidium mean?
  • How do you know?

Hearst’s Patterns for extracting IS-A relations

  • Patterns expressing "X is a Y" (a regex sketch follows this list):
    • "Y such as X ((, X)* (, and|or) X)"
    • "such Y as X"
    • "X or other Y"
    • "X and other Y"
    • "Y including X"
    • "Y, especially X"
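A minimal sketch (my own illustration) of matching one Hearst pattern, "Y such as X", with a naive regex in place of real NP chunking:

```python
import re

# "Y such as X": Y is one or two lowercase words, X a capitalized word.
PATTERN = re.compile(r"(?P<y>\w+(?:\s\w+)?)\ssuch\sas\s(?P<x>[A-Z]\w+)")

sent = "Agar is a substance prepared from a mixture of red algae such as Gelidium."
m = PATTERN.search(sent)
if m:
    print(f"IS-A({m.group('x')}, {m.group('y')})")  # IS-A(Gelidium, red algae)
```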

3.1.1 Extracting Richer Relations Using Rules

  • Intuition: relations often hold between specific entities.
    • located-in(ORGANIZATION,LOCATION)
    • founded (PERSON,ORGANIZATION)
    • cures (DRUG, DISEASE)
  • Start with Named Entity tags to help extract relation!
  • Once the named-entity types are already known, it is much easier to determine the relation between the entities.

image-20211102115429268

  • But this does not always hold:

image-20211102115515916

  • Who holds what office in what organization? (a sketch over NE-tagged text follows this list)
    • PERSON, POSITION of ORG
      • George Marshall, Secretary of State of the United States
    • PERSON (named|appointed|chose|etc.) PERSON Prep? POSITION
      • Truman appointed Marshall Secretary of State
    • PERSON [be]? (named|appointed|etc.) Prep? ORG POSITION
      • George Marshall was named US Secretary of State
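A minimal sketch (my own illustration) of applying the first pattern over text whose named entities have already been tagged inline (the bracket format here is made up for the example):

```python
import re

tagged = "[PER George Marshall] , Secretary of State of [ORG the United States]"
m = re.search(
    r"\[PER (?P<per>[^\]]+)\] , (?P<pos>[\w ]+) of \[ORG (?P<org>[^\]]+)\]",
    tagged,
)
if m:
    print(f"holds-office({m.group('per')}, {m.group('pos')}, {m.group('org')})")
    # holds-office(George Marshall, Secretary of State, the United States)
```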

3.1.2 Summary: Hand-built patterns for relations

Plus:

  • Human patterns tend to be high-precision.
  • Can be tailored to specific domains

Minus:

  • Human patterns are often low-recall
  • A lot of work to think of all possible patterns!
  • We don't want to have to do this for every relation!
  • We'd like better accuracy

3.2 Supervised machine learning for relations

(slide image)

  • Choose a set of relations we’d like to extract
  • Choose a set of relevant named entities
  • Find and label data
    • Choose a representative corpus
    • Label the named entities in the corpus
    • Hand-label the relations between these entities
      • i.e., annotate with an NLP tool and finally export structured data such as CSV
    • Break into training, development, and test sets
Steps:
  • Find all pairs of named entities (usually in the same sentence)

  • Decide if the entities are related

    • First check whether any relation holds at all; if not, filter the pair out immediately
  • If yes, classify the relation

Why the extra step?
  • Whether a relation exists is decided from local features; those features can then be bundled and passed on to the relation classifier (a sketch of the two stages follows)

  • Faster classification training by eliminating most pairs.

  • Can use distinct feature sets appropriate for each task.
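A minimal sketch (my own illustration) of the two-stage scheme; `detector`, `classifier`, and `pair_features` are hypothetical stand-ins for a trained relation detector, a trained relation classifier, and a feature extractor:

```python
def extract_relations(entity_pairs, pair_features, detector, classifier):
    """Stage 1 filters unrelated pairs; stage 2 names the relation."""
    triples = []
    for e1, e2, sentence in entity_pairs:
        feats = pair_features(e1, e2, sentence)
        if detector.predict([feats])[0]:          # stage 1: related at all?
            rel = classifier.predict([feats])[0]  # stage 2: which relation?
            triples.append((rel, e1, e2))
    return triples
```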

(slide image)

  • After running entity recognition on each sentence, it is turned into one row of the dataset, as above.

(slide images)

  • From the sentence's syntactic dependencies we obtain the dependency path between the entities, as above.

3.3 Gazetteer and trigger word features for relation extraction

  • Trigger list for family: kinship terms
    • parent, wife, husband, grandparent, etc. [from WordNet]
  • Gazetteer:

    • Lists of useful geo or geopolitical words
      • Country name list
      • Other sub-entities
  • Example: "American Airlines, a unit of AMR, immediately matched the move, spokesman Tim Wagner said." (a feature sketch follows the slide image below)

(slide image)
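A minimal sketch (my own illustration) of trigger-word and gazetteer features for an entity pair; the tiny word lists stand in for WordNet kinship terms and a country-name gazetteer:

```python
KINSHIP_TRIGGERS = {"parent", "wife", "husband", "grandparent", "son", "daughter"}
COUNTRY_GAZETTEER = {"United States", "China", "France", "Japan"}

def gazetteer_features(e1, e2, words_between):
    return {
        "has_kinship_trigger": any(w.lower() in KINSHIP_TRIGGERS for w in words_between),
        "e1_is_country": e1 in COUNTRY_GAZETTEER,
        "e2_is_country": e2 in COUNTRY_GAZETTEER,
    }

print(gazetteer_features("John", "Yoko", ["'s", "wife"]))
# {'has_kinship_trigger': True, 'e1_is_country': False, 'e2_is_country': False}
```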

3.4 Classifiers for supervised methods

  • Now you can use any classifier you like

    • MaxEnt
    • Naive Bayes
    • SVM
  • Train it on the training set, tune on the dev set, test on the test set

3.5 Summary: Supervised Relation Extraction

Plus:

  • Can get high accuracies with enough hand-labeled training data, if the test data is similar enough to the training data

Minus:

  • Labeling a large training set is expensive

  • Supervised models are brittle and don't generalize well to different genres
