Smurf
文章50
标签0
分类6
Parsing

Parsing

Parsing

Parsing

1.上下无关文法与上下有关文法

  • Starting unit:
  • transition rule:
  • 上下文文无关语法
    • 不考虑前后依赖关系
  • 上下文有关语法
    • 考虑前后依赖关系
  • 上下文无关文法以及上下文有关文法都具有歧义,即不同语料表达的意思不一定相同

2. Parsing

2.1 Def

  • Suntactic Structure: Constitunency = phrase structure grammar = context - free grammars
  • 传统认为,句子构成应该是:从最简单的单词出发,然后不同的词组成短语,再将短语组成大的短语。
    • (the, cuddly, cat, by, the, door)(the cuddly cat|by the door)(the cuddly cat by the door)
  • Gramma:

2.2 PP Attachment Ambiguities Multiply

  • Look in the large crate in the kitchen by the door
    • PP存在修饰歧义
  • 通常我们把一个句子的核心词称为head,修饰部分为modification

image-20211130194120956

  • Prepositional Phrase Attachment Ambiguity
    • San Jose cops kill man with knife

image-20211130194128156

  • 不同的是,中文由于语法习惯不一样不存在相同的歧义。

image-20211130194143892

image-20211130191738431

  • A key parsing decision is how we ‘attach’ various constituents
  • PPs , adverbial or participial phrases, infinitives, coordinations

2.3 Coordination Scope Ambiguity

  • 定语修饰范围出现错误

  • Shuttle veteran and longtime NASA executive Fred Gregory appointed to board

    • [Shuttle veteran] and [longtime NASA executive Fred Gregory ]appointed to board
    • [Shuttle veteran and longtime NASA executive] Fred Gregory appointed to board
  • Doctor:No heart, cognitive issues
    • Doctor:No [heart, cognitive issues]
    • Doctor:[No heart], [cognitive issues]

2.4 Adjectival/Adverbial Modifier Ambiguity

  • Students get first hand job experience
    • Students get [first hand] [job experience]
    • Students get [first [hand job] experience]

2.5 Verb Phrase (VP) attachment ambiguity

  • Multilated body washes up on Rio beach to be used for Olympics beach volleyball
    • Multilated body washes up on Rio beach to be used for Olympics beach volleyball
    • Multilated body washes up on Rio beach to be used for Olympics beach volleyball

3. Dependency Grammar and Dependency Structure

3.1 Def

image-20211130193236745

  • Dependency syntax postulates that syntactic structure consists of relations between lexical items, normally binary asymmetric relations (“arrows”) called dependencies
    • 研究紧挨着的两个词的关系,并且会给边打上相关的标签
  • The arrows arecommonly typed with the name of grammatical relations (subject,prepositional object,apposition, etc.)
    • An arrow connects a head(governor, superior,regent) with a dependent(modifier, inferior,subordinate)
    • Usually, dependencies form a tree (a connected,acyclic, single-root graph)
  • 处于支配地位的成分称为支配者head,governor,regent,而处于被支配地位的成分称为从属者 modifier,subordinate,dependency

  • 值得一提的是,我们都会在句子解析结构加上根节点以保证树状结构

3.2 Dependency Relations

image-20211130194304794

3.3 Dependency Grammar and Dependency Structure

ROOT

What are the sources of information for dependency parsing?

  • Bilexical affinities 同词性之间的依赖关系​

    • The dependency [discussion issues] is plausible
  • Dependency distance 依赖距离

    • Most dependencies are between nearby words
  • Intervening material

    • Dependencies rarely span intervening verbs or punctuation
    • 依赖关系很少跨越中间的动词或标点符号
  • Valency of heads

    • How many dependents on which side are usual for a head?

image-20211130194346325

  • A sentence is parsed by choosing for each word what other word (including ROOT) it is a dependent of
  • Usually some constraints:
    • Only one word is a dependent of ROOT (single headed)
    • Don’t want cycles (acyclic)
  • This makes the dependencies a tree
  • Final issue is whether arrows can cross (be non-projective) or not
    • 是否允许交叉将使得建树的方法和结果有所不同

image-20211130195723374

3.4 Methods of Dependency Parsing

image-20211130195810761

3.4.1 Greedy transition based parsing

  • A simple form of greedy discriminative dependency parser
  • The parser does a sequence of bottom up actions

    • Roughly like “shift” or “reduce” in a shift-reduce parser, but the “reduce” actions are specialized to create dependencies with head on left or right
  • The parser has:

    • a stack , written with top to the right
      • which starts with the ROOT symbol
      • 用一个栈存放代操作的字符,后进先出
    • a buffer , written with top to the left
      • which starts with the input sentence
      • 用一个队列存放剩余需要操作的句子,先进先出
    • a set of dependency
      • which starts off empty
      • 有一个集合存放所有弧的关系
    • a set of actions
      • 用一个集合存放所有操作
Basic Transition based Dependency Parser

image-20211130200334952

Arc standard Transition based Parser
  • (there are other transition schemes …) Analysis of “I ate fish“

image-20211130200621599

image-20211130200634630

Example
  • Left-Arc
    Precondition: ​ ROOT

    • 意思是当所在的stack中相邻的两个symbol的左依赖关系没有在集合里时,可以进行操作
    • 操作时,我们比较栈顶和队首的两个词,如果存在Left Arc,那么我们只pop掉栈顶的词
  • Right-Arc

    • 操作时,我们比较栈顶和队首的两个词,如果存在right Arc,我们就pop掉队首并且push进栈顶
    • 这里我理解是,Right arc的两个词都有可能产生新的依赖关系,最具标志性的就是ROOT连接句子的第一个修饰的词
  • Reduce
    Precondition:

    • 意思是当队列为空,且所在的stack中相邻的两个symbol的左依赖关系有在集合里时,那么直接pop栈顶
  • Shift

    • 当栈顶和队首不存在任何依赖关系时,且队列不为空时,直接将队首的词加入栈顶

image-20211130200729008

MaltParser

image-20211130204330357

  • 可以用SVM来判断应该做什么action

  • 可以使用beam search的方法来优化搜索

3.4.2 Conventional Feature Representation

image-20211130204516143

3.5 Evaluation of Dependency Parsing: (labeled)

dependency accuracy

image-20211130204603550

  • 没有标签:UAS
  • 有标签:LAS

3.6 Indicator Features Revisited

  • Problem #1:
    • sparse
  • Problem#2:
    • incomplete
  • Problem#3:
    • expensive computation
  • More than 95% of parsing time is consumed by feature computation

image-20211130204924771

3.7 Why do we gain from a neural dependency parser?

  • dense vector
  • complete
  • trainable, low computation

image-20211130205015360

4. Further developments in transition based neural dependency parsing

image-20211130205106404

Graph based Dependency Parsers

  • Compute a score for every possible dependency for each word

image-20211130205142883

  • 权重最大的组成依赖关系
本文作者:Smurf
本文链接:http://example.com/2021/08/15/nlp%20learning/Chapter10_Parsing/
版权声明:本文采用 CC BY-NC-SA 3.0 CN 协议进行许可