QA
1. Why is QA back at centre stage in AI?
The way humans and machines exchange information has changed.
The rapid development of mobile and wearable devices calls for effective and accurate information services delivered in natural language.
1. Common question types in QA systems:
- Factoid questions (fact-based)
- Who wrote “the Universal Declaration of Human Rights”?
- How many calories are there in two slices of apple pie?
- What is the average age of the onset of autism?
- Complex (narrative) questions:
- In children with an acute febrile illness, what is the efficacy of acetaminophen in reducing
- What do scholars think about Jefferson’s position on dealing with pirates?
2. Common QA approaches:
- IR-based approaches
- TREC; IBM Watson; Google
- Knowledge-based and Hybrid approaches
- IBM Watson; Apple Siri; Wolfram Alpha; True Knowledge Evi
- Community-based question answering
- Zhihu, Quora
- Crowdsourcing
3. IR-based Factoid QA
- Question Processing
- Detect question type, answer type, focus, relations
- Example: "Yao Ming" and "height" are linked by a dependency
- Formulate queries to send to a search engine
- Passage Retrieval
- Retrieve ranked documents
- Example: query for "Yao Ming" and "height"
- TF-IDF
- Break into suitable passages and rerank
- NER, part-of-speech tagging
- Find all numbers and rank them
- Answer Processing
- Extract candidate answers
- Rank candidates
- using evidence from the text and external sources
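The TF-IDF ranking mentioned under passage retrieval above can be sketched as follows. This is a minimal illustration, not the scoring of any particular search engine: `tokenize` and `tfidf_rank` are hypothetical helper names, the three documents are toy data, and the weighting (raw term frequency times log(N/df)) is only one common TF-IDF variant.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_rank(query, docs):
    """Return document indices ordered by descending TF-IDF score (assumed variant)."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(docs)
    # Document frequency: in how many documents does each term occur?
    df = Counter()
    for toks in tokenized:
        df.update(set(toks))
    scored = []
    for i, toks in enumerate(tokenized):
        tf = Counter(toks)
        score = sum(tf[t] * math.log(n_docs / df[t])
                    for t in tokenize(query) if t in tf)
        scored.append((score, i))
    # Higher score first; ties are broken by original document order.
    return [i for _, i in sorted(scored, key=lambda s: (-s[0], s[1]))]


# Toy documents, invented for illustration only.
docs = [
    "Yao Ming is a retired basketball player.",
    "The height of Yao Ming is often quoted in sports articles.",
    "Basketball is a popular sport.",
]
ranking = tfidf_rank("Yao Ming height", docs)
print(ranking)
```

The document that mentions both the entity and the rarer term "height" ranks first, which is the behaviour the retrieval step relies on before passages are split and reranked.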
3.1 Question Processing
- Answer Type Detection
- Decide the named entity type (person, place) of the answer
- Query Formulation
- Choose query keywords for the IR system
Question Type classification
- Is this a definition question, a math question, a list question?
- That is, determine what kind of answer the question as a whole calls for
Focus Detection
- Find the question words that are replaced by the answer
Relation Extraction
- Find relations between entities in the question
Example
Question: Please return the two states you could be reentering if you’re crossing Florida’s northern border
- Answer Type: US state
- Query: two states, border, Florida, north
- Focus: the two states
- Relations: borders(Florida, ?x, north)
3.2 Answer Type Detection: Named Entities
Who founded Virgin Airlines?
- PERSON
What Canadian city has the largest population?
- CITY
3.2.1 Answer Type Taxonomy
- 6 coarse classes
- ABBREVIATION, ENTITY, DESCRIPTION, HUMAN, LOCATION, NUMERIC
- 50 finer classes
- LOCATION: city, country, mountain…
- HUMAN: group, individual, title, description
- ENTITY: animal, body, color, currency…
3.2.2 Methods
- Hand-written rules
- Machine Learning
- Hybrids
Hand-written rules
- Regular-expression-based rules cover some cases:
- Who { is|was|are|were } PERSON
- PERSON (YEAR - YEAR)
- Other rules use the question headword
- (the headword of the first noun phrase after the wh-word)
- Which city in China has the largest number of foreign financial companies?
- What is the state flower of California?
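A minimal sketch of such hand-written rules is shown below. The rule table, the `answer_type` function name, the LENGTH rule, and the "at most two words before the headword" approximation are all assumptions made for illustration; a real system would find the headword with a parser rather than a regular expression.

```python
import re

# Hypothetical rule table: (pattern, answer type). Rules are tried in order.
RULES = [
    # "Who { is|was|are|were } ..." from the rule listed above.
    (re.compile(r"^who\s+(is|was|are|were)\b", re.I), "PERSON"),
    # Assumed fallback: any other "who" question also asks for a person.
    (re.compile(r"^who\b", re.I), "PERSON"),
    # Crude headword rule: "city" within two words of the wh-word.
    (re.compile(r"^(which|what)\s+(\w+\s+){0,2}?city\b", re.I), "CITY"),
    # Assumed rule for measurement questions.
    (re.compile(r"^how\s+(tall|long|high|far)\b", re.I), "LENGTH"),
]


def answer_type(question):
    """Return the answer type of the first matching rule, else 'UNKNOWN'."""
    for pattern, label in RULES:
        if pattern.search(question):
            return label
    return "UNKNOWN"


for q in [
    "Who founded Virgin Airlines?",
    "What Canadian city has the largest population?",
    "What is the state flower of California?",
]:
    print(q, "->", answer_type(q))
```

The last question falls through to UNKNOWN because no rule covers the headword "flower", which is the kind of gap that motivates the machine-learning and hybrid methods.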
Machine Learning
- Define a taxonomy of question types
- Annotate training data for each question type
- Train classifiers for each question class using a rich set of features.
- Features can include the hand-written rules above!
Features for Answer Type Detection
- Question words and phrases
- Part-of-speech tags
- Parse features (headwords)
- Named entities
- Semantically related words
3.3 Query Formulation
- Select all non-stop words in quotations
- Select all NNP words in recognized named entities
- Select all complex nominals with their adjectival modifiers
- Select all other complex nominals
- Select all nouns with their adjectival modifiers
- Select all other nouns
- Select all verbs
- Select all adverbs
- Select the QFW (question focus word), skipped in all previous steps
- Select all other words
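The priority ordering above can be approximated without a tagger. The sketch below is a simplification under stated assumptions: it collapses the ten steps into three tiers (quoted words, capitalized words as a crude stand-in for NNP tags, then all remaining non-stop words), and the stop-word list, `select_keywords`, and `max_keywords` are hypothetical names chosen for illustration.

```python
import re

# Small assumed stop-word list; a real system would use a fuller one.
STOP_WORDS = {"a", "an", "the", "of", "in", "is", "are", "was", "were", "who",
              "what", "which", "how", "do", "does", "did", "to", "has", "have"}
PUNCT = ".,?!;:\"'“”‘’()"


def select_keywords(question, max_keywords=5):
    """Pick query keywords in priority order (simplified three-tier version)."""
    keywords = []

    def add(word):
        w = word.strip(PUNCT)
        if w and w.lower() not in STOP_WORDS and w not in keywords:
            keywords.append(w)

    # Tier 1: non-stop words inside quotation marks.
    for quoted in re.findall(r'["“](.+?)["”]', question):
        for w in quoted.split():
            add(w)
    # Tier 2: capitalized words after the first token (crude NNP proxy).
    tokens = question.split()
    for w in tokens[1:]:
        if w[:1].isupper():
            add(w)
    # Tier 3: every remaining non-stop word.
    for w in tokens:
        add(w)
    return keywords[:max_keywords]


print(select_keywords('Who wrote "the Universal Declaration of Human Rights"?'))
print(select_keywords("What Canadian city has the largest population?"))
```

Truncating to `max_keywords` mirrors the idea that higher-priority keywords are kept first when the query has to stay short.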
3.4 Choosing keywords from the query
3.5 Passage Retrieval
Step 1: The IR engine retrieves documents using the query terms
Step 2: Segment the documents into shorter units
- something like paragraphs
Step 3: Passage ranking
- Use the answer type to help rerank passages
3.5.1 Features for Passage Ranking
- Number of named entities of the right type in the passage
- Number of query words in the passage
- Number of question N-grams also in the passage
- Proximity of query keywords to each other in the passage
- Longest sequence of question words
- Rank of the document containing the passage
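Three of the features above can be computed with plain token matching. The following is a rough sketch, not a reference implementation: `passage_features` and its dictionary keys are hypothetical names, N-grams are limited to bigrams, and the entity-count, proximity, and document-rank features are left out because they need a tagger and a retrieval engine.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def passage_features(question, passage):
    """Compute three simple overlap features between a question and a passage."""
    q = tokenize(question)
    p = tokenize(passage)
    p_set = set(p)
    # Number of distinct question words that also occur in the passage.
    query_words = len({w for w in q if w in p_set})
    # Number of distinct question bigrams that also occur in the passage.
    shared_bigrams = len(set(zip(q, q[1:])) & set(zip(p, p[1:])))
    # Longest run of consecutive question words found verbatim in the passage.
    longest_run = 0
    for i in range(len(q)):
        for j in range(len(p)):
            k = 0
            while i + k < len(q) and j + k < len(p) and q[i + k] == p[j + k]:
                k += 1
            longest_run = max(longest_run, k)
    return {"query_words": query_words,
            "shared_bigrams": shared_bigrams,
            "longest_run": longest_run}


question = "Who was Queen Victoria's second son?"
passage = ("The Marie biscuit is named after Marie Alexandrovna, the daughter of "
           "Czar Alexander II of Russia and wife of Alfred, the second son of "
           "Queen Victoria and Prince Albert")
print(passage_features(question, passage))
```

A ranker would combine such feature values, for example as inputs to a trained classifier, rather than use any single one on its own.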
3.6 Answer Extraction
- Run an answer-type named-entity tagger on the passages
- Each answer type requires a named-entity tagger that detects it
- If answer type is CITY, tagger has to tag CITY
- Can be full NER, simple regular expressions, or hybrid
- Return the string with the right type:
- Who is the prime minister of India? (PERSON): "Manmohan Singh, Prime Minister of India, had told left leaders that the deal would not be renegotiated"
- How tall is Mt. Everest? (LENGTH): "The official height of Mount Everest is 29035 feet"
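For answer types that a simple regular expression can cover, the tagger can be very small. The sketch below handles only LENGTH; the `TAGGERS` table, the unit list, and `extract_candidates` are illustrative assumptions, and types such as PERSON or CITY would need a real NER component instead.

```python
import re

# Hypothetical tagger table: answer type -> regular expression.
TAGGERS = {
    # A number (optional thousands separators and decimals) followed by a unit.
    "LENGTH": re.compile(
        r"\b\d[\d,]*(?:\.\d+)?\s*(?:feet|foot|ft|metres|meters|km|miles|m)\b"
    ),
}


def extract_candidates(answer_type, passage):
    """Return all substrings of the passage tagged with the given answer type."""
    tagger = TAGGERS.get(answer_type)
    if tagger is None:
        return []  # no tagger available for this type in the sketch
    return [m.group(0) for m in tagger.finditer(passage)]


passage = "The official height of Mount Everest is 29035 feet"
print(extract_candidates("LENGTH", passage))
```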
3.7 Ranking Candidate Answers
But what if there are multiple candidate answers?
Q: Who was Queen Victoria’s second son?
- Answer Type: Person
- Passage:
- The Marie biscuit is named after Marie Alexandrovna, the daughter of Czar Alexander II of Russia and wife of Alfred, the second son of Queen Victoria and Prince Albert
Use machine learning: features for ranking candidate answers
- Answer type match: Candidate contains a phrase with the correct answer type.
- Pattern match: Regular expression pattern matches the candidate.
- Question keywords: # of question keywords in the candidate
- Keyword distance: Distance in words between the candidate and query keywords
- Novelty factor: A word in the candidate is not in the query
- Apposition features: The candidate is an appositive to question terms
- Punctuation location: The candidate is immediately followed by a comma, period, quotation marks, semicolon, or exclamation mark.
- Sequences of question terms: The length of the longest sequence of question terms that occurs in the candidate answer.
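As an illustration of one of these features, the sketch below computes a keyword distance for the Queen Victoria example. It rests on several assumptions: the candidate list stands in for the output of a PERSON tagger, distance is measured from the candidate's first token and averaged over the keywords, and `keyword_distance` and `find_span` are hypothetical helper names. A real ranker would feed this value into a learned model together with the other features.

```python
import re


def tokens(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def find_span(passage_toks, cand_toks):
    """Index of the first occurrence of cand_toks in passage_toks, or -1."""
    n = len(cand_toks)
    for i in range(len(passage_toks) - n + 1):
        if passage_toks[i:i + n] == cand_toks:
            return i
    return -1


def keyword_distance(passage, candidate, keywords):
    """Mean token distance from the candidate to the nearest hit of each keyword."""
    p = tokens(passage)
    start = find_span(p, tokens(candidate))
    if start < 0:
        return None
    dists = []
    for kw in keywords:
        positions = [i for i, t in enumerate(p) if t == kw.lower()]
        if positions:
            dists.append(min(abs(i - start) for i in positions))
    return sum(dists) / len(dists) if dists else None


PASSAGE = ("The Marie biscuit is named after Marie Alexandrovna, the daughter of "
           "Czar Alexander II of Russia and wife of Alfred, the second son of "
           "Queen Victoria and Prince Albert")
KEYWORDS = ["Queen", "Victoria", "second", "son"]
# Assumed output of a PERSON tagger on the passage.
CANDIDATES = ["Marie Alexandrovna", "Alexander II", "Alfred", "Albert"]

best = min(CANDIDATES, key=lambda c: keyword_distance(PASSAGE, c, KEYWORDS))
print(best)
```

On this passage the feature alone happens to prefer the right candidate, but the margin over "Albert" is small, which is why it is combined with features such as apposition and answer type match.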
Candidate Answer scoring in IBM Watson
- Each candidate answer gets scores from >50 components
- (from unstructured text, semi-structured text, triple stores)
- logical form (parse) match between question and candidate
- passage source reliability
- geospatial location
- "California is southwest of Montana"
- temporal relationships
- taxonomic classification
3.8 Common Evaluation Metrics
- Accuracy (does answer match gold labeled answer?)
- Mean Reciprocal Rank (MRR)
- For each query, return a ranked list of M candidate answers.
- The query score is 1/rank of the first correct answer
- If first answer is correct: 1
- else if second answer is correct: ½
- else if third answer is correct: ⅓, etc.
- Score is 0 if none of the M answers are correct
- Take the mean over all N queries
- Relevance: the degree to which the answer addresses the user's information needs
- Correctness: the degree to which the answer is factually correct
- Conciseness: the answer contains no irrelevant information
- Completeness: the answer should be complete
- Simplicity: the answer is easy to interpret
- Justification: sufficient context should be provided so that the data consumer can determine whether the query is correct
- Right: The answer is correct and complete
- Inexact: The answer is incomplete or incorrect
- Unsupported: The answer does not have appropriate evidence/justification
- Wrong: The answer is not appropriate for the question
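The Mean Reciprocal Rank described above translates directly into a few lines of code. This is a small sketch with hypothetical function names and made-up toy runs; each run pairs a ranked candidate list with the set of gold answers for that query.

```python
def reciprocal_rank(ranked_answers, gold_answers):
    """1/rank of the first correct answer, or 0.0 if none is correct."""
    for rank, answer in enumerate(ranked_answers, start=1):
        if answer in gold_answers:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(runs):
    """Average the reciprocal rank over all (ranked_answers, gold_answers) pairs."""
    return sum(reciprocal_rank(r, g) for r, g in runs) / len(runs)


# Toy runs: first answer correct, second answer correct, no answer correct.
runs = [
    (["a", "b"], {"a"}),
    (["x", "y", "z"], {"y"}),
    (["p", "q"], {"r"}),
]
print(mean_reciprocal_rank(runs))  # (1 + 1/2 + 0) / 3
```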
4. Knowledge-based QA
Build a semantic representation of the query
- Times, dates, locations, entities, numeric quantities
Map from this semantics to query structured data or resources
- Geospatial databases
- Ontologies (Wikipedia infoboxes, DBpedia, WordNet, YAGO)
- Restaurant review sources and reservation services
- Scientific databases
4.1 Two challenges
- Lexical gap
- The words built from the question are not necessarily the same as the entity labels in the knowledge base
- Semantic gap
- The graph built from the question does not necessarily match the structure of the knowledge base
4.1.1 Lexical Gap Example
- Which Greek cities have more than 1 million inhabitants?
SELECT DISTINCT ?uri
- There are expressions with a fixed, dataset-independent meaning, e.g. "most" and "one".
- Who produced the most films?
SELECT DISTINCT ?uri
Challenges (semantic gap):
The semantic gap between natural language and knowledge graphs
4.1.2 Semantic Gap Example
- Different datasets usually follow different schemas and thus provide different ways of answering an information need
- The meaning of expressions such as the verbs "to be" and "to have" and the prepositions "of" and "with" depends strongly on the linguistic context
4.2 Pattern/Template based KB QA
- Motivation
- In order to understand a user question, we need to understand
4.3 Methods
An approach that combines an analysis of the semantic structure with a mapping of words to URIs
- Template generation
- Parse the question to produce a SPARQL template that directly mirrors the structure of the question, including filters and aggregation operations
- Template instantiation
- Instantiate the SPARQL template by matching natural language expressions with ontology concepts using statistical entity identification and predicate detection
Question: Who produced the most films?
- Step 1: Template generation (linguistic processing)
- First, obtain the POS tags of the natural language question
- Next, represent the question with grammar rules based on the POS tags
- Then use domain-dependent and domain-independent vocabulary to help analyse the question
- Finally, convert the semantic representation into a SPARQL template
- domain-independent: who, the most
- domain-dependent: produced/VBD, films/NNS
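- Step 2: Template matching and instantiation (NER)
- Once a SPARQL template exists, it has to be instantiated to match the concrete natural language question, i.e. the expressions in the question are mapped to ontology concepts in the knowledge base
- For resources and classes, common entity identification methods are:
- Use WordNet to find synonyms of the labels in the knowledge base
- Compute string similarity (Levenshtein and substring similarity)
- For property labels, also compare against the natural language expressions stored in the BOA pattern library
- The highest-ranked entities are used as candidates for filling the query slots
Who produced the most films?
- Step 3: Ranking
- Each entity receives a score based on string similarity and prominence
- The score of a query template is the average of the scores of the entities that fill its slots
- In addition, types need to be checked:
- For each triple that types ?x via rdf:type, check for the query triples ?x p e and e p ?x whether the domain/range of p is consistent with that type
- Of all the candidate queries, only the highest-scoring one is returned
SELECT DISTINCT ?x WHERE {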
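The string-similarity and scoring steps of the template approach can be sketched as below. Only the general recipe comes from the notes (Levenshtein similarity, a per-entity score from similarity and prominence, and a template score that averages its slot entities); the normalisation by the longer string, the equal `weight` between similarity and prominence, and all function names are assumptions made for illustration.

```python
def levenshtein(a, b):
    """Edit distance between strings a and b (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def string_similarity(a, b):
    """Similarity in [0, 1]: 1 minus edit distance over the longer length."""
    a, b = a.lower(), b.lower()
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def entity_score(phrase, label, prominence, weight=0.5):
    """Assumed mix of string similarity and a prominence value in [0, 1]."""
    return weight * string_similarity(phrase, label) + (1 - weight) * prominence


def template_score(slot_scores):
    """Score of a query template: average score of the entities in its slots."""
    return sum(slot_scores) / len(slot_scores)


# "films" in the question against a hypothetical knowledge-base label "Film".
print(string_similarity("films", "Film"))
print(template_score([0.8, 0.6]))
```

Among the instantiated templates, the one with the highest `template_score` that also passes the type check would be the query that is returned.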
4.4 Parsing based KB QA
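Roughly, the difference from the template approach above is that the question is first parsed (semantic parsing) and the query is only constructed at the end; unlike the templates above, this relies on relation extraction, syntactic parsing, and semantic composition. TODO
- Phrase mapping
- Query Structure (Logical Form) Computing
- Query Evaluation
- Answers Ranking
- Semantic Parsing on Freebase from Question-Answer Pairs. EMNLP 2013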
5. Hybrid approaches (IBM Watson)
Build a shallow semantic representation of the query
Generate answer candidates using IR methods
- Augmented with ontologies and semi-structured data
Score each candidate using richer knowledge sources
- Geospatial databases
- Temporal reasoning
- Taxonomical classification
6. End-to-End (deep learning) based KB QA
Only for single-relation, simple questions
Step 1: Candidate generation
- Find the main entity by entity linking
- All entities around the main entity in the KG are candidates
Step 2: Ranking
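The two steps can be illustrated on a toy knowledge graph. Everything in this sketch is an assumption made for illustration: the triples are toy data, entity linking is plain substring matching, and the ranking function is a keyword-overlap stand-in for the neural scorer that an end-to-end system would actually learn.

```python
# Toy knowledge graph as (subject, relation, object) triples.
TRIPLES = [
    ("Yao Ming", "height", "2.29 m"),
    ("Yao Ming", "birthplace", "Shanghai"),
    ("Shanghai", "country", "China"),
]


def link_entity(question, triples):
    """Naive entity linking: the longest subject name found in the question."""
    subjects = {s for s, _, _ in triples}
    matches = [s for s in subjects if s.lower() in question.lower()]
    return max(matches, key=len) if matches else None


def generate_candidates(main_entity, triples):
    """All (relation, neighbour) pairs one hop away from the main entity."""
    candidates = []
    for s, r, o in triples:
        if s == main_entity:
            candidates.append((r, o))
        elif o == main_entity:
            candidates.append((r, s))
    return candidates


def rank(question, candidates):
    """Stand-in scorer: relations whose name appears in the question come first."""
    q = question.lower()
    return sorted(candidates, key=lambda c: c[0].lower() in q, reverse=True)


def answer(question, triples=TRIPLES):
    entity = link_entity(question, triples)
    if entity is None:
        return None
    ranked = rank(question, generate_candidates(entity, triples))
    return ranked[0][1] if ranked else None


print(answer("What is the height of Yao Ming?"))
```

Because candidates are limited to one-hop neighbours of a single linked entity, this setup only fits single-relation questions, in line with the restriction stated at the start of this section.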
7. Dealing with unexpected things…
- Caused by Processing
- Poor Ranking
- Harsh Query Constraints
- Misunderstanding of Query
- Caused by Data
- Inaccurate Facts
- Incomplete Data