Contents
- introduction
- IR-based QA (dominant approach)
- Knowledge-based QA
- Hybrid QA
- Conclusion
introduction
- Definition: question answering (“QA”) is the task of automatically determining the answer to a natural language question
- Mostly focus on “factoid” questions
- Factoid questions (not ambiguous)
- Factoid questions have short, precise answers:
- What war involved the battle of Chapultepec?
- What is the date of Boxing Day?
- What are some fragrant white climbing roses?
- What are tannins?
- Non-factoid questions
- General non-factoid questions require a longer answer, critical analysis, a summary, a calculation, and more:
- Why is the date of Australia Day contentious?
- What is the angle 60 degrees in radians?
- why focus on factoid questions
- They are easier
- They have an objective answer
- Current NLP technology cannot yet handle non-factoid answers (still under development)
- 2 key approaches
- Information retrieval-based QA
- Given a query, search for relevant documents
- Extract answers from within these relevant documents
- Knowledge-based QA
- Build a semantic representation of the query
- Query a database of facts to find answers
IR-based QA (dominant approach)
IR-based factoid QA: TREC-QA
- Use the question to form a query for an IR engine
- query formulation: extract key terms to search for documents in the database
- answer type detection: guess what type of answer the question is looking for
- Find documents, then passages (paragraphs/sections) within those documents (find the most relevant passages)
- Extract a short answer string (using the relevant passages and the answer type information)
question processing
- Find the key parts of the question that will help retrieval
- Discard non-content words/symbols (wh-words, “?”, etc.)
- Formulate as a tf-idf query, using unigrams or bigrams (see the sketch after this list)
- Identify entities and prioritise matching them
- May reformulate question using templates
- E.g. “Where is Federation Square located?”
- Query = “Federation Square located”
- Query = “Federation Square is located [in/at]”
- Predict expected answer type (here = LOCATION)
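A minimal sketch of this query-formulation step, assuming a hand-picked wh-word/stopword list; `WH_WORDS`, `STOPWORDS`, and `formulate_query` are illustrative names, and a real system would use proper stopword lists and tf-idf weighting over the retrieved collection.

```python
import re

# Illustrative lists only; not a standard resource.
WH_WORDS = {"who", "what", "when", "where", "which", "why", "how"}
STOPWORDS = {"is", "are", "was", "were", "the", "a", "an", "of", "in", "at"}

def formulate_query(question):
    """Keep the key content terms of the question as unigram query terms."""
    tokens = re.findall(r"[a-z0-9']+", question.lower())
    return [t for t in tokens if t not in WH_WORDS and t not in STOPWORDS]

print(formulate_query("Where is Federation Square located?"))
# -> ['federation', 'square', 'located']
```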
answer types
- Knowing the type of answer can help in:
- finding the right passage containing the answer
- finding the answer string
- Treat as classification (a closed set of answer types)
- given question, predict answer type
- key feature is the question headword (see the toy classifier below)
- e.g. What are the animals on the Australian coat of arms? (headword: “animals”)
- Generally not a difficult task
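A toy cue/headword-based classifier for the answer type; the rule tables and `predict_answer_type` are made-up examples, whereas in practice this is usually a trained classifier over a closed set of answer types.

```python
# Hand-written rules for illustration; a real system learns these from data.
CUE_TYPES = {"who": "PERSON", "when": "DATE", "where": "LOCATION", "how many": "NUMBER"}
HEADWORD_TYPES = {"animals": "ANIMAL", "city": "LOCATION", "war": "EVENT"}

def predict_answer_type(question):
    q = question.lower()
    for cue, answer_type in CUE_TYPES.items():             # wh-word / phrase cues
        if q.startswith(cue):
            return answer_type
    for headword, answer_type in HEADWORD_TYPES.items():   # crude headword lookup
        if headword in q:
            return answer_type
    return "OTHER"

print(predict_answer_type("What are the animals on the Australian coat of arms?"))  # ANIMAL
print(predict_answer_type("Where is Federation Square located?"))                   # LOCATION
```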
retrieval
- Find top n documents matching query (standard IR)
- Next find passages (paragraphs or sentences) in these documents (also driven by IR)
- a good passage should contain:
- many instances of the question keywords
- several named entities of the answer type
- close proximity of these terms in the passage
- high ranking by the IR engine
- Re-rank IR outputs to find the best passage (e.g., using supervised learning; a toy scorer is sketched below)
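A toy scorer combining the cues listed above; the weights and the `answer_type_entities` argument (entities of the expected answer type, e.g. from an NER pass over the passage) are illustrative assumptions, and real systems typically learn a supervised re-ranker instead.

```python
def score_passage(passage_tokens, question_keywords, answer_type_entities):
    """Combine keyword matches, answer-type entities, and keyword proximity (arbitrary weights)."""
    keywords = {k.lower() for k in question_keywords}
    keyword_hits = sum(1 for t in passage_tokens if t.lower() in keywords)
    entity_hits = len(answer_type_entities)
    positions = [i for i, t in enumerate(passage_tokens) if t.lower() in keywords]
    proximity = 1.0 / (max(positions) - min(positions) + 1) if len(positions) > 1 else 0.0
    return 2.0 * keyword_hits + 1.5 * entity_hits + 5.0 * proximity

passage = "Adam Bandt is the federal MP for Melbourne".split()
print(score_passage(passage, ["federal", "MP", "Melbourne"], ["Adam Bandt"]))  # 8.75
```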
answer extraction
- Find a concise answer to the question, as a span in the passage (a toy extraction sketch follows the examples below)
- “Who is the federal MP for Melbourne?”
- The Division of Melbourne is an Australian Electoral Division in Victoria, represented since the 2010 election by Adam Bandt, a member of the Greens.
- “How many Australian PMs have there been since 2013?”
- Australia has had five prime ministers in five years. No wonder Merkel needed a cheat sheet at the G-20.
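Before the neural approach described in the next subsection, a toy non-neural extractor in the spirit of the classic pipeline: among candidate spans that an (assumed) NER step has already labelled with the expected answer type, pick the one closest to the question keywords. `extract_answer` and its arguments are illustrative, not a standard API.

```python
def extract_answer(passage_tokens, candidate_spans, question_keywords):
    """candidate_spans: (start, end) token offsets of entities of the expected answer type."""
    keywords = {k.lower() for k in question_keywords}
    keyword_positions = [i for i, t in enumerate(passage_tokens) if t.lower() in keywords]

    def distance(span):
        start, _ = span
        return min(abs(start - k) for k in keyword_positions) if keyword_positions else 0

    start, end = min(candidate_spans, key=distance)
    return " ".join(passage_tokens[start:end])

passage = "Ada Lovelace was born on 10 December 1815 and died in 1852 .".split()
date_spans = [(5, 8), (11, 12)]   # assumed NER output: DATE entity spans
print(extract_answer(passage, date_spans, ["Ada", "Lovelace", "born"]))  # 10 December 1815
```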
how?
- Use a neural network to extract the answer
- AKA the reading comprehension task (assume the query and evidence passage are given; find the answer span)
- But deep learning models require lots of data
- Do we have enough data to train comprehension models?
dataset
MCTest (a dataset)
- Crowdworkers write fictional stories, questions and answers
- 500 stories, 2000 questions
- Multiple choice questions
SQuAD
- Use Wikipedia passages (easier to create than MCTest)
- First set of crowdworkers create questions (given a passage)
- Second set of crowdworkers label the answer
- 150K questions (!)
- Second version includes unanswerable questions (no answer in the passage); see the loading sketch below
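A minimal sketch of loading SQuAD with the Hugging Face `datasets` library (assumes `pip install datasets`); the `"squad_v2"` configuration adds the unanswerable questions mentioned above.

```python
from datasets import load_dataset

squad = load_dataset("squad")     # SQuAD v1.1; use load_dataset("squad_v2") for version 2
example = squad["train"][0]
print(example["question"])
print(example["context"][:80], "...")
print(example["answers"])         # {'text': [...], 'answer_start': [...]}
```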
reading comprehension
- Given a question and a context passage, predict where the answer span starts and ends in the passage
- Compute:
- $P_{start}(i)$: probability that token $i$ is the starting token
- $P_{end}(i)$: probability that token $i$ is the ending token
LSTM-based model
- Feed question tokens to a bidirectional LSTM
- Aggregate LSTM outputs via weighted sum to produce $q$, the final question embedding
- Process the passage in a similar way, using another bidirectional LSTM
- More than just word embeddings as input:
- A feature to denote whether the word matches a question word
- POS feature
- Weighted question embedding: produced by attending to each question word
- $\{p_1, ..., p_m\}$: one vector for each passage token from the bidirectional LSTM
- To compute the start and end probability for each token (numeric sketch below):
- $P_{start}(i) \propto \exp(p_i W_s q)$
- $P_{end}(i) \propto \exp(p_i W_e q)$
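A numeric sketch (random tensors, PyTorch assumed) of the bilinear scoring above: $p_i$ is the passage BiLSTM output for token $i$, $q$ the aggregated question embedding, and $W_s$, $W_e$ are learned parameter matrices.

```python
import torch

m, d = 20, 128                                    # toy passage length and hidden size
p = torch.randn(m, d)                             # {p_1, ..., p_m} from the passage BiLSTM
q = torch.randn(d)                                # aggregated question embedding
W_s, W_e = torch.randn(d, d), torch.randn(d, d)   # learned matrices (random here)

p_start = torch.softmax(p @ W_s @ q, dim=0)       # P_start(i) ∝ exp(p_i W_s q)
p_end   = torch.softmax(p @ W_e @ q, dim=0)       # P_end(i)   ∝ exp(p_i W_e q)
print(p_start.shape, float(p_start.sum()))        # torch.Size([20]) ~1.0
```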
BERT-based model
- Fine-tune BERT to predict the answer span (usage sketch below):
- $P_{start}(i) \propto \exp(S^T T_i')$
- $P_{end}(i) \propto \exp(E^T T_i')$
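Here $T_i'$ is BERT's final-layer representation of passage token $i$, and $S$, $E$ are learned start/end vectors. A minimal usage sketch with the Hugging Face `transformers` question-answering pipeline, which wraps a model fine-tuned in this way (the checkpoint name is just one publicly available SQuAD-tuned example):

```python
from transformers import pipeline

qa = pipeline("question-answering", model="distilbert-base-cased-distilled-squad")
result = qa(
    question="Who is the federal MP for Melbourne?",
    context=("The Division of Melbourne is an Australian Electoral Division in Victoria, "
             "represented since the 2010 election by Adam Bandt, a member of the Greens."),
)
print(result["answer"])   # expected span: "Adam Bandt"
```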
why BERT works better than LSTM
- It’s pre-trained and so already “knows” language before it’s adapted to the task
- Self-attention architecture allows fine-grained analysis between words in question and context paragraph
Knowledge-based QA
- QA over structured KB
- Many large knowledge bases
- Freebase, DBpedia, Yago, …
- Can we support natural language queries?
- E.g.
- ‘When was Ada Lovelace born?’ → birth-year(Ada Lovelace, ?x)
- ‘What is the capital of England?’ → capital-city(?x, England)
- Link “Ada Lovelace” with the correct entity in the KB to find the triple (Ada Lovelace, birth-year, 1815); a toy lookup is sketched below
- but
- Converting a natural language sentence into a triple is not trivial
- ‘When was Ada Lovelace born?’ → birth-year(Ada Lovelace, ?x)
- Entity linking is also an important component
- Ambiguity: “When was Lovelace born?”
- Can we simplify this two-step process?
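A toy in-memory triple store for the lookup above; real knowledge-based QA queries large KBs (Freebase, DBpedia, ...) after entity linking, but the shape of the query is the same. `TRIPLES` and `lookup` are illustrative names.

```python
# A microscopic stand-in for a real knowledge base of (entity, relation) -> value facts.
TRIPLES = {
    ("Ada Lovelace", "birth-year"): "1815",
    ("England", "capital-city"): "London",
}

def lookup(entity, relation):
    """Return the missing argument ?x of the relation for the given (linked) entity, if known."""
    return TRIPLES.get((entity, relation))

print(lookup("Ada Lovelace", "birth-year"))   # 1815
print(lookup("England", "capital-city"))      # London
```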
- semantic parsing
- Convert questions into logical forms to query the KB directly
- Predicate calculus
- Programming query (e.g. SQL)
how to build a semantic parser
- Text-to-text problem:
- Input = natural language sentence
- Output = string in logical form
- Encoder-decoder model (but do we have enough data? see the example pairs below)
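What the text-to-text framing looks like as data: (question, logical form) pairs such as these would be used to train an encoder-decoder model to generate the right-hand side from the left. The first two pairs restate the examples above; the SQL-style target and its schema are hypothetical.

```python
TRAIN_PAIRS = [
    ("When was Ada Lovelace born?", "birth-year(Ada Lovelace, ?x)"),
    ("What is the capital of England?", "capital-city(?x, England)"),
    # Hypothetical SQL-style target, for KBs queried with a programming/query language:
    ("Who is the federal MP for Melbourne?",
     "SELECT mp FROM electorates WHERE division = 'Melbourne'"),
]
```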
Hybrid QA
hybrid methods
- Why not use both text-based and knowledge-based resources for QA?
- IBM’s Watson, which won the game show Jeopardy!, uses a wide variety of resources to answer questions
- (question) THEATRE: A new play based on this Sir Arthur Conan Doyle canine classic opened on the London stage in 2007.
- (answer) The Hound of the Baskervilles
core idea of Watson
- Generate lots of candidate answers from text-based and knowledge-based sources
- Use a rich variety of evidence to score them
- Many components in the system, most trained separately
QA evaluation
- IR: Mean Reciprocal Rank (MRR) for systems returning matching passages or answer strings (metric sketch below)
- E.g. system returns 4 passages for a query, first correct passage is the 3rd passage
- MRR = 1/3
- MCTest: Accuracy
- SQuAD: Exact match of string against gold answer
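Minimal sketches of the two metrics above; `mean_reciprocal_rank` and `exact_match` are illustrative helpers (the official SQuAD script additionally normalises case, punctuation, and articles before matching).

```python
def mean_reciprocal_rank(first_correct_ranks):
    """Ranks of the first correct passage per query; use 0 when nothing correct is returned."""
    return sum(1.0 / r for r in first_correct_ranks if r > 0) / len(first_correct_ranks)

def exact_match(prediction, gold):
    """Exact string match of the predicted answer against the gold answer."""
    return prediction.strip() == gold.strip()

print(mean_reciprocal_rank([3]))                # 0.333..., the example above (first correct passage is 3rd)
print(exact_match("Adam Bandt", "Adam Bandt"))  # True
```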
Conclusion
- IR-based QA: search textual resources to answer questions
- Reading comprehension: assumes the question and passage are given
- Knowledge-based QA: search structured resources to answer questions
- Hot area: many new approaches & evaluation datasets being created all the time (narratives, QA, commonsense reasoning, etc)