QA

Why is QA back at centre stage of AI?

  • The way humans and machines exchange information has changed

  • The rapid development of mobile and wearable devices demands effective, accurate information services in the form of natural language

1. Common question types in QA systems:

  • Factoid questions
    • Who wrote “the Universal Declaration of Human Rights”?
    • How many calories are there in two slices of apple pie?
    • What is the average age of the onset of autism?
  • Complex (narrative) questions
    • In children with an acute febrile illness, what is the efficacy of acetaminophen in reducing fever?
    • What do scholars think about Jefferson’s position on dealing with pirates?

2. Common QA approaches:

  • IR-based approaches
    • TREC; IBM Watson; Google
  • Knowledge-based and Hybrid approaches
    • IBM Watson; Apple Siri; Wolfram Alpha; True Knowledge (Evi)
  • Community-based question answering
    • Zhihu, Quora
    • Crowdsourcing

3. IR-based Factoid QA

(figure: the IR-based factoid QA pipeline)

  • Question Processing
    • Detect question type, answer type, focus, and relations
      • e.g. “Yao Ming”, “height” -> the dependency between them
    • Formulate queries to send to a search engine
  • Passage Retrieval
    • Retrieve ranked documents
      • e.g. query for “Yao Ming”, “height”
      • TF-IDF
    • Break the documents into suitable passages and rerank
      • NER, POS tagging
      • e.g. find all the numbers in the passages and rank them
  • Answer Processing
    • Extract candidate answers
    • Rank the candidates
      • using evidence from the text and external sources

3.1 Question Processing

  • Answer Type Detection
    • Decide the named entity type (person, place) of the answer
  • Query Formulation
    • Choose query keywords for the IR system
  • Question Type Classification
    • Is this a definition question, a math question, a list question?
  • Focus Detection
    • Find the question words that are replaced by the answer
  • Relation Extraction
    • Find relations between entities in the question
  • Example question: Please return the two states you could be reentering if you’re crossing Florida’s northern border

    • Answer Type: US state
    • Query: two states, border, Florida, north
    • Focus: the two states
    • Relations: borders(Florida, ?x, north)

3.2 Answer Type Detection: Named Entities

  • Who founded Virgin Airlines?

    • PERSON
  • What Canadian city has the largest population?

    • CITY

3.2.1 Answer Type Taxonomy

  • 6 coarse classes
    • ABBREVIATION, ENTITY, DESCRIPTION, HUMAN, LOCATION, NUMERIC
  • 50 finer classes
    • LOCATION: city, country, mountain…
    • HUMAN: group, individual, title, description
    • ENTITY: animal, body, color, currency…

(figures: answer type taxonomy examples)

3.2.2 Methods

  • Hand-written rules
  • Machine Learning
  • Hybrids
Hand-written rules
  • Regular expression based rules can catch some cases:
    • Who { is|was|are|were } PERSON
    • PERSON (YEAR–YEAR)
  • Other rules use the question headword
    • (the headword of the first noun phrase after the wh-word)
    • Which city in China has the largest number of foreign financial companies?
    • What is the state flower of California?
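
A minimal sketch of such rules in Python; the patterns below are illustrative toy rules, not a complete rule set:

```python
import re

# Illustrative hand-written rules mapping question patterns to answer types.
# Each rule is (compiled regex, answer type); the patterns are toy examples.
RULES = [
    (re.compile(r"^who\s+(is|was|are|were)\b", re.I), "PERSON"),
    (re.compile(r"^(which|what)\s+city\b", re.I), "CITY"),
    (re.compile(r"^how\s+many\b", re.I), "NUMERIC"),
]

def answer_type(question: str) -> str:
    for pattern, atype in RULES:
        if pattern.search(question):
            return atype
    return "UNKNOWN"  # fall through to headword heuristics or an ML classifier

print(answer_type("Who was Queen Victoria's second son?"))  # PERSON
print(answer_type("Which city in China has the largest number of foreign financial companies?"))  # CITY
```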
Machine Learning
  • Define a taxonomy of question types
  • Annotate training data for each question type
  • Train classifiers for each question class using a rich set of features.

    • features include those hand-written rules!
  • Features for Answer Type Detection

    • Question words and phrases
    • Part-of-speech tags
    • Parse features (headwords)
    • Named entities
    • Semantically related words
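
As a rough illustration of the machine-learning route, here is a minimal sketch assuming a labeled question set and scikit-learn; the tiny training data and bag-of-words features are stand-ins for the richer feature set above:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy labeled data; a real system would train on a corpus of annotated questions.
questions = [
    "Who founded Virgin Airlines?",
    "What Canadian city has the largest population?",
    "How many calories are there in two slices of apple pie?",
    "What is the state flower of California?",
]
labels = ["HUMAN", "LOCATION", "NUMERIC", "ENTITY"]

# Word n-grams stand in for the richer features listed above
# (POS tags, headwords, named entities, semantically related words).
clf = make_pipeline(CountVectorizer(ngram_range=(1, 2)),
                    LogisticRegression(max_iter=1000))
clf.fit(questions, labels)

print(clf.predict(["Who wrote 'the Universal Declaration of Human Rights'?"]))
```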

3.3 Query Formulation

  1. Select all non-stop words in quotations
  2. Select all NNP words in recognized named entities
  3. Select all complex nominals with their adjectival modifiers
  4. Select all other complex nominals
  5. Select all nouns with their adjectival modifiers
  6. Select all other nouns
  7. Select all verbs
  8. Select all adverbs
  9. Select the QFW word (skipped in all previous steps)
  10. Select all other words
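
A minimal sketch of this priority ordering using NLTK POS tags; quotation handling, NER, and complex-nominal detection (steps 1-4) are omitted, and the tag groups are assumptions:

```python
import nltk

# One-time model downloads (classic NLTK resource names).
nltk.download("punkt", quiet=True)
nltk.download("averaged_perceptron_tagger", quiet=True)

# Simplified priority ladder: earlier tag groups are preferred as keywords.
PRIORITY = [
    {"NNP", "NNPS"},                            # proper nouns (NER stand-in)
    {"NN", "NNS"},                              # other nouns
    {"VB", "VBD", "VBG", "VBN", "VBP", "VBZ"},  # verbs
    {"RB"},                                     # adverbs
]

def select_keywords(question: str, k: int = 4) -> list[str]:
    tagged = nltk.pos_tag(nltk.word_tokenize(question))
    keywords, seen = [], set()
    for tag_group in PRIORITY:
        for word, tag in tagged:
            if tag in tag_group and word.lower() not in seen:
                keywords.append(word)
                seen.add(word.lower())
        if len(keywords) >= k:
            break
    return keywords[:k]

print(select_keywords("Which city in China has the largest number of foreign financial companies?"))
# e.g. ['China', 'city', 'number', 'companies']
```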

3.4 Choosing keywords from the query

(figure: example of keyword selection from a query)

3.5 Passage Retrieval

  • Step 1: The IR engine retrieves documents using the query terms

  • Step 2: Segment the documents into shorter units

    • something like paragraphs
  • Step 3: Passage ranking

    • Use the answer type to help rerank passages

3.5.1 Features for Passage Ranking

  • Number of named entities of the right type in the passage
  • Number of query words in the passage
  • Number of question N-grams also in the passage
  • Proximity of query keywords to each other in the passage
  • Longest sequence of question words in the passage
  • Rank of the document containing the passage
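
A minimal sketch computing a few of these features for a single passage (named-entity counts and document rank are omitted; the function name is illustrative):

```python
def passage_features(question: str, passage: str) -> dict:
    """A few of the overlap features above; NER counts and document rank omitted."""
    qw = question.lower().split()
    pw = passage.lower().split()
    qset = set(qw)

    # Number of distinct query words that appear in the passage.
    overlap = sum(1 for w in set(pw) if w in qset)

    # Longest contiguous run of question words inside the passage.
    longest = run = 0
    for w in pw:
        run = run + 1 if w in qset else 0
        longest = max(longest, run)

    # Question bigrams that also occur in the passage.
    shared_bigrams = len(set(zip(qw, qw[1:])) & set(zip(pw, pw[1:])))

    return {"query_word_overlap": overlap,
            "longest_question_run": longest,
            "shared_bigrams": shared_bigrams}

print(passage_features("How tall is Mt. Everest ?",
                       "The official height of Mount Everest is 29035 feet"))
```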

3.6 Answer Extraction

  • Run an answer-type named entity tagger on the passages
    • Each answer type requires a named entity tagger that detects it
    • If the answer type is CITY, the tagger has to tag CITY
      • Can be full NER, simple regular expressions, or a hybrid
  • Return the string with the right type:
    • Who is the prime minister of India? (PERSON) “Manmohan Singh, Prime Minister of India, had told left leaders that the deal would not be renegotiated”
    • How tall is Mt. Everest? (LENGTH) “The official height of Mount Everest is 29035 feet”
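
A minimal sketch of this step using spaCy’s pretrained English NER, assuming the answer type maps onto one of spaCy’s entity labels (regex-based taggers would slot in the same way):

```python
import spacy

nlp = spacy.load("en_core_web_sm")  # small pretrained English pipeline

def extract_candidates(passage: str, answer_type: str) -> list[str]:
    """Return entity mentions whose NER label matches the expected answer type."""
    doc = nlp(passage)
    return [ent.text for ent in doc.ents if ent.label_ == answer_type]

passage = ("Manmohan Singh, Prime Minister of India, had told left leaders "
           "that the deal would not be renegotiated.")
print(extract_candidates(passage, "PERSON"))  # e.g. ['Manmohan Singh']
```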

3.7 Ranking Candidate Answers

  • But what if there are multiple candidate answers?

  • Q: Who was Queen Victoria’s second son?

  • Answer Type: Person
  • Passage:
    • The Marie biscuit is named after Marie Alexandrovna , the daughter of Czar Alexander II of Russia and wife of Alfred, the second son of Queen Victoria and Prince Albert
Use machine learning: features for ranking candidate answers
  • Answer type match: the candidate contains a phrase with the correct answer type.
  • Pattern match: a regular expression pattern matches the candidate.
  • Question keywords: the number of question keywords present in the candidate.
  • Keyword distance: the distance in words between the candidate and the query keywords.
  • Novelty factor: a word in the candidate is not in the query.
  • Apposition features: the candidate is an appositive to question terms.
  • Punctuation location: the candidate is immediately followed by a comma, period, quotation marks, semicolon, or exclamation mark.
  • Sequences of question terms: the length of the longest sequence of question terms that occurs in the candidate answer.
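
A minimal sketch of combining such features into one score; the weights and feature values below are illustrative, whereas a real system would learn them from labeled question-answer pairs:

```python
# Illustrative hand-set weights; a trained ranker would learn these.
WEIGHTS = {
    "answer_type_match": 3.0,
    "pattern_match": 2.0,
    "question_keywords": 1.0,
    "keyword_distance": -0.5,  # larger distance -> lower score
    "novelty_factor": 0.5,
}

def score(features: dict) -> float:
    """Weighted linear combination of candidate features."""
    return sum(WEIGHTS.get(name, 0.0) * value for name, value in features.items())

# Toy features for the Queen Victoria example above.
candidates = [
    ("Alfred", {"answer_type_match": 1, "question_keywords": 3, "keyword_distance": 2}),
    ("Marie Alexandrovna", {"answer_type_match": 1, "question_keywords": 1, "keyword_distance": 8}),
]
ranked = sorted(candidates, key=lambda c: score(c[1]), reverse=True)
print(ranked[0][0])  # Alfred
```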
Candidate answer scoring in IBM Watson
  • Each candidate answer gets scores from >50 components
    • (from unstructured text, semi-structured text, triple stores)
    • logical form (parse) match between question and candidate
    • passage source reliability
    • geospatial location
      • “California is southwest of Montana”
    • temporal relationships
    • taxonomic classification

3.8 Common Evaluation Metrics

  • Accuracy (does the answer match the gold-labeled answer?)
  • Mean Reciprocal Rank (MRR; see the sketch after this list)
    • For each query, return a ranked list of M candidate answers.
    • The query score is 1/rank of the first correct answer
      • If the first answer is correct: 1
      • else if the second answer is correct: ½
      • else if the third answer is correct: ⅓, etc.
      • The score is 0 if none of the M answers are correct
    • Take the mean over all N queries
  • Relevance: the degree to which the answer addresses the user’s information need
  • Correctness: the degree to which the answer is factually correct
  • Conciseness: the answer should not contain irrelevant information
  • Completeness: the answer should be complete
  • Simplicity: the answer should be easy to interpret
  • Justification: sufficient context should be provided to support the data consumer in judging whether the answer is correct
  • Right: the answer is correct and complete
  • Inexact: the answer is incomplete or incorrect
  • Unsupported: the answer lacks appropriate evidence/justification
  • Wrong: the answer is not appropriate for the question
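
A minimal sketch of MRR as defined above; the candidate lists and gold answers are illustrative:

```python
def mean_reciprocal_rank(ranked_lists: list[list[str]], gold: list[str]) -> float:
    """MRR: average over queries of 1/rank of the first correct answer (0 if absent)."""
    total = 0.0
    for candidates, answer in zip(ranked_lists, gold):
        for rank, cand in enumerate(candidates, start=1):
            if cand == answer:
                total += 1.0 / rank
                break
    return total / len(gold)

ranked = [["Paris", "Lyon"],            # correct at rank 1 -> 1
          ["Everest", "K2", "Lhotse"],  # correct at rank 2 -> 1/2
          ["1912", "1905"]]             # no correct answer  -> 0
gold = ["Paris", "K2", "1920"]
print(mean_reciprocal_rank(ranked, gold))  # (1 + 0.5 + 0) / 3 = 0.5
```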

4. Knowledge-based QA

  • Build a semantic representation of the query

    • Times, dates, locations, entities, numeric quantities
  • Map from this semantics to a query over structured data or resources

    • Geospatial databases
    • Ontologies (Wikipedia infoboxes, DBpedia, WordNet, YAGO)
    • Restaurant review sources and reservation services
    • Scientific databases

4.1 Two challenges

(figures: lexical gap and semantic gap examples)

  • Lexical gap
    • The words derived from the question are not necessarily the same as the entity labels in the knowledge base
  • Semantic gap
    • The graph constructed from the question does not necessarily match the structure of the knowledge base

4.1.1 Lexical Gap Example

  • Which Greek cities have more than 1 million inhabitants?
```sparql
SELECT DISTINCT ?uri
WHERE {
  ?uri rdf:type dbo:City .
  ?uri dbo:country res:Greece .
  ?uri dbo:populationTotal ?p .
  FILTER (?p > 1000000)
}
```
  • There are expressions with a fixed, dataset-independent meaning
    • e.g. “most”, “one”
  • Who produced the most films?
```sparql
SELECT DISTINCT ?uri
WHERE {
  ?x rdf:type dbo:Film .
  ?x dbo:producer ?uri .
}
GROUP BY ?uri   # added so the COUNT aggregate is valid SPARQL 1.1
ORDER BY DESC(COUNT(?x))
OFFSET 0 LIMIT 1
```
  • Challenge: the semantic gap between natural language and knowledge graphs

(figure: the semantic gap between natural language and knowledge graphs)

4.1.2 Semantic Gap Example

  • Different datasets usually follow different schemas, and thus provide different ways of answering an information need

(figure: the same information need answered under different schemas)

  • The meaning of expressions like the verbs “to be” and “to have”, and prepositions such as “of” and “with”, strongly depends on the linguistic context

(figure: context-dependent meanings of light verbs and prepositions)

4.2 Pattern/Template-based KB QA

  • Motivation
    • In order to understand a user question, we need to understand both its words and its semantic structure

(figure: motivating example for template-based KB QA)

4.3 Methods

  • An approach that combines an analysis of the semantic structure with a mapping of words to URIs

    • Template generation
      • Parse the question to produce a SPARQL template that directly mirrors the structure of the question, including filters and aggregation operations
    • Template instantiation
      • Instantiate the SPARQL template by matching natural-language expressions with ontology concepts, using statistical entity identification and predicate detection
  • Question: Who produced the most films?

(figure: SPARQL template for “Who produced the most films?”)

  • Step 1: Template generation (linguistic processing)
  • First, obtain the POS tags of the natural-language question
  • Next, represent the question with grammar rules based on the POS tags
  • Then, use domain-dependent and domain-independent lexica to help analyze the question
  • Finally, convert the semantic representation into a SPARQL template
  • domain-independent: who, the most
  • domain-dependent: produced/VBD, films/NNS

(figure: linguistic processing during template generation)

  • Step 2: Template matching and instantiation (NER)
    • Once the SPARQL template exists, it must be instantiated to match the concrete natural-language question, i.e. natural-language expressions are mapped to ontology concepts in the knowledge base
      • For resources and classes, common entity-identification methods:
        • Use WordNet to find synonyms of the labels in the knowledge base
        • Compute string similarity (Levenshtein and substring similarity)
      • For property labels, also compare against the natural-language patterns stored in the BOA pattern library
      • The highest-ranked entities become the candidates for filling the query slots
```
Who produced the most films?

?c CLASS [films]
    <http://dbpedia.org/ontology/Film>
    <http://dbpedia.org/ontology/FilmFestival>
    ...
?p PROPERTY [produced]
    <http://dbpedia.org/ontology/producer>
    <http://dbpedia.org/property/producer>
    <http://dbpedia.org/ontology/wineProduced>
```
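
A minimal sketch of the string-similarity side of this slot filling, with a small Levenshtein implementation; the candidate labels below come from the listing above:

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def similarity(a: str, b: str) -> float:
    """Normalize edit distance into a [0, 1] similarity score."""
    return 1 - levenshtein(a.lower(), b.lower()) / max(len(a), len(b))

# Rank candidate KB labels for the slot word "produced".
labels = ["producer", "wine produced", "product"]
print(sorted(labels, key=lambda l: similarity("produced", l), reverse=True))
# ['producer', 'product', 'wine produced']
```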
  • Step 3: Ranking
    • Each entity receives a score based on string similarity and prominence
    • A query template’s score is the average score of the entities filling its slots
    • Type checking is also required
      • For triples ?x rdf:type <C>, and for query triples ?x p e and e p ?x, check whether the domain/range of p is consistent with <C>
    • Of all generated queries, return only the highest-scoring one
```sparql
SELECT DISTINCT ?x WHERE {
  ?x <http://dbpedia.org/ontology/producer> ?y .
  ?y rdf:type <http://dbpedia.org/ontology/Film> .
}
GROUP BY ?x   # added so the COUNT aggregate is valid SPARQL 1.1
ORDER BY DESC(COUNT(?y)) LIMIT 1
# Score: 0.76

SELECT DISTINCT ?x WHERE {
  ?x <http://dbpedia.org/ontology/producer> ?y .
  ?y rdf:type <http://dbpedia.org/ontology/FilmFestival> .
}
GROUP BY ?x
ORDER BY DESC(COUNT(?y)) LIMIT 1
# Score: 0.60
```

4.4 Parsing-based KB QA

  • Unlike the template approach above, this line of work first parses the query utterance (semantic parsing) and only constructs the structured query at the end; instead of templates it relies on relation extraction, syntactic parsing, and semantic composition

  • Phrase mapping

  • Query Structure (Logical Form) Computing
  • Query Evaluation
  • Answer Ranking

(figures: phrase mapping and query structure examples)

  • Berant et al., “Semantic Parsing on Freebase from Question-Answer Pairs”, EMNLP 2013

(figures: semantic parsing examples from the paper)

5. Hybrid approaches (IBM Watson)

  • Build a shallow semantic representation of the query

  • Generate answer candidates using IR methods

    • Augmented with ontologies and semi-structured data
  • Score each candidate using richer knowledge sources

    • Geospatial databases
    • Temporal reasoning
    • Taxonomic classification

6. End-to-End (deep learning) KB QA

  • Only for single-relation, simple questions

  • Step 1: Candidate generation

    • Find the main entity via entity linking
    • All entities around the main entity in the KG become candidates
  • Step 2: Ranking (a sketch follows the figures below)

(figures: neural models for candidate ranking)
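
A minimal sketch of the ranking step, assuming encoders that map the question and each KG candidate into a shared vector space; the random vectors below stand in for trained encoder outputs:

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def rank_candidates(q_vec: np.ndarray,
                    cand_vecs: dict[str, np.ndarray]) -> list[tuple[str, float]]:
    """Score every candidate against the question embedding, best first."""
    scored = [(name, cosine(q_vec, vec)) for name, vec in cand_vecs.items()]
    return sorted(scored, key=lambda x: x[1], reverse=True)

rng = np.random.default_rng(0)
q_vec = rng.normal(size=64)  # stand-in for an encoded question
cands = {c: rng.normal(size=64) for c in ["m.yao_ming/height", "m.yao_ming/spouse"]}
print(rank_candidates(q_vec, cands)[0])
```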

7. Dealing with unexpected things…

(figure: sources of unexpected answers)

  • Caused by Processing
    • Poor Ranking
    • Harsh Query Constraints
    • Misunderstanding of Query
  • Caused by Data
    • Inaccurate Facts
    • Incomplete Data

(figures: examples of QA failures)

7.1 Search KG in Embedding Space

(figures: searching the KG in embedding space)
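
A minimal sketch of such a search under a TransE-style assumption (head + relation ≈ tail); the random vectors stand in for trained KG embeddings:

```python
import numpy as np

rng = np.random.default_rng(42)
DIM = 50
entities = {e: rng.normal(size=DIM) for e in ["YaoMing", "Houston", "Shanghai"]}
relations = {r: rng.normal(size=DIM) for r in ["bornIn"]}

def nearest_tail(head: str, relation: str) -> str:
    """TransE-style lookup: the answer is the entity closest to head + relation."""
    target = entities[head] + relations[relation]
    return min((e for e in entities if e != head),
               key=lambda e: np.linalg.norm(entities[e] - target))

# Because the search is by distance in embedding space rather than exact
# triple lookup, incomplete data can still yield a plausible answer.
print(nearest_tail("YaoMing", "bornIn"))
```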
