Contents
- introduction
- IR-based QA (dominant approach)
- Knowledge-based QA
- Hybrid QA
- Conclusion
introduction
- Definition: question answering (“QA”) is the task of automatically determining the answer to a natural language question
- Mostly focus on “factoid” questions
- Factoid questions (not ambiguous)
- Factoid questions have short, precise answers:
- What war involved the battle of Chapultepec?
- What is the date of Boxing Day?
- What are some fragrant white climbing roses?
- What are tannins?
- Non-factoid questions
- General non-factoid questions require a longer answer, critical analysis, a summary, a calculation, and more:
- Why is the date of Australia Day contentious?
- What is the angle 60 degrees in radians?
- why focus on factoid questions
- They are easier
- They have an objective answer
- Current NLP technology cannot yet handle non-factoid answers (still under development)
- 2 key approaches
- Information retrieval-based QA
- Given a query, search for relevant documents
- Extract answers from within these relevant documents
- Knowledge-based QA
- Build a semantic representation of the query
- Query a database of facts to find answers
IR-based QA (dominant approach)
IR-based factoid QA: TREC-QA
- Use the question to form a query for an IR engine
- query formulation: extract key terms to search for documents in the database
- answer type detection: guess what type of answer the question is looking for
- Find documents, then passages (paragraphs/sections) within those documents (find the most relevant passages)
- Extract a short answer string (using the relevant passages and the answer type information)
question processing
- Find the key parts of the question that will help retrieval
- Discard non-content words/symbols (wh-words, “?”, etc.)
- Formulate as a tf-idf query, using unigrams or bigrams (see the sketch after this list)
- Identify entities and prioritise matching them
- May reformulate question using templates
- E.g. “Where is Federation Square located?”
- Query = “Federation Square located”
- Query = “Federation Square is located [in/at]”
- Predict expected answer type (here = LOCATION)
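A minimal sketch of this query-formulation step, assuming a hand-picked wh-word/stopword list; `WH_WORDS`, `STOPWORDS`, and `formulate_query` are illustrative names, and a real system would use proper stopword lists and tf-idf weighting over the retrieved collection.

```python
import re

# Illustrative lists only; not a standard resource.
WH_WORDS = {"who", "what", "when", "where", "which", "why", "how"}
STOPWORDS = {"is", "are", "was", "were", "the", "a", "an", "of", "in", "at"}

def formulate_query(question):
    """Keep the key content terms of the question as unigram query terms."""
    tokens = re.findall(r"[a-z0-9']+", question.lower())
    return [t for t in tokens if t not in WH_WORDS and t not in STOPWORDS]

print(formulate_query("Where is Federation Square located?"))
# -> ['federation', 'square', 'located']
```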
answer types
- Knowing the type of answer can help in:
- finding the right passage containing the answer
- finding the answer string
- Treat as classification (a closed set of answer types)
- given question, predict answer type
- key feature is the question headword (see the toy classifier below)
- e.g. What are the animals on the Australian coat of arms? (headword: “animals”)
- Generally not a difficult task
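A toy cue/headword-based classifier for the answer type; the rule tables and `predict_answer_type` are made-up examples, whereas in practice this is usually a trained classifier over a closed set of answer types.

```python
# Hand-written rules for illustration; a real system learns these from data.
CUE_TYPES = {"who": "PERSON", "when": "DATE", "where": "LOCATION", "how many": "NUMBER"}
HEADWORD_TYPES = {"animals": "ANIMAL", "city": "LOCATION", "war": "EVENT"}

def predict_answer_type(question):
    q = question.lower()
    for cue, answer_type in CUE_TYPES.items():             # wh-word / phrase cues
        if q.startswith(cue):
            return answer_type
    for headword, answer_type in HEADWORD_TYPES.items():   # crude headword lookup
        if headword in q:
            return answer_type
    return "OTHER"

print(predict_answer_type("What are the animals on the Australian coat of arms?"))  # ANIMAL
print(predict_answer_type("Where is Federation Square located?"))                   # LOCATION
```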
retrieval
- Find top n documents matching query (standard IR)
- Next find passages (paragraphs or sentences) in these documents (also driven by IR)
- a good passage should contain:
- many instances of the question keywords
- several named entities of the answer type
- close proximity of these terms in the passage
- high ranking by the IR engine
- Re-rank IR outputs to find the best passage (e.g., using supervised learning; a toy scorer is sketched below)
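A toy scorer combining the cues listed above; the weights and the `answer_type_entities` argument (entities of the expected answer type, e.g. from an NER pass over the passage) are illustrative assumptions, and real systems typically learn a supervised re-ranker instead.

```python
def score_passage(passage_tokens, question_keywords, answer_type_entities):
    """Combine keyword matches, answer-type entities, and keyword proximity (arbitrary weights)."""
    keywords = {k.lower() for k in question_keywords}
    keyword_hits = sum(1 for t in passage_tokens if t.lower() in keywords)
    entity_hits = len(answer_type_entities)
    positions = [i for i, t in enumerate(passage_tokens) if t.lower() in keywords]
    proximity = 1.0 / (max(positions) - min(positions) + 1) if len(positions) > 1 else 0.0
    return 2.0 * keyword_hits + 1.5 * entity_hits + 5.0 * proximity

passage = "Adam Bandt is the federal MP for Melbourne".split()
print(score_passage(passage, ["federal", "MP", "Melbourne"], ["Adam Bandt"]))  # 8.75
```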
answer extraction
- Find a concise answer to the question, as a span in the passage (a toy extraction sketch follows the examples below)
- “Who is the federal MP for Melbourne?”
- The Division of Melbourne is an Australian Electoral Division in Victoria, represented since the 2010 election by Adam Bandt, a member of the Greens.
- “How many Australian PMs have there been since 2013?”
- Australia has had five prime ministers in five years. No wonder Merkel needed a cheat sheet at the G-20.
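Before the neural approach described in the next subsection, a toy non-neural extractor in the spirit of the classic pipeline: among candidate spans that an (assumed) NER step has already labelled with the expected answer type, pick the one closest to the question keywords. `extract_answer` and its arguments are illustrative, not a standard API.

```python
def extract_answer(passage_tokens, candidate_spans, question_keywords):
    """candidate_spans: (start, end) token offsets of entities of the expected answer type."""
    keywords = {k.lower() for k in question_keywords}
    keyword_positions = [i for i, t in enumerate(passage_tokens) if t.lower() in keywords]

    def distance(span):
        start, _ = span
        return min(abs(start - k) for k in keyword_positions) if keyword_positions else 0

    start, end = min(candidate_spans, key=distance)
    return " ".join(passage_tokens[start:end])

passage = "Ada Lovelace was born on 10 December 1815 and died in 1852 .".split()
date_spans = [(5, 8), (11, 12)]   # assumed NER output: DATE entity spans
print(extract_answer(passage, date_spans, ["Ada", "Lovelace", "born"]))  # 10 December 1815
```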
how?
- Use a neural network to extract the answer
- AKA the reading comprehension task (assume the query and evidence passage are given; find the answer span)
- But deep learning models require lots of data
- Do we have enough data to train comprehension models?
dataset
MCTest (a dataset)
- Crowdworkers write fictional stories, questions and answers
- 500 stories, 2000 questions
- Multiple choice questions
SQuAD
- Use Wikipedia passages (easier to create than MCTest)
- First set of crowdworkers create questions (given a passage)
- Second set of crowdworkers label the answer
- 150K questions (!)
- Second version includes unanswerable questions (no answer in the passage); see the loading sketch below
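A minimal sketch of loading SQuAD with the Hugging Face `datasets` library (assumes `pip install datasets`); the `"squad_v2"` configuration adds the unanswerable questions mentioned above.

```python
from datasets import load_dataset

squad = load_dataset("squad")     # SQuAD v1.1; use load_dataset("squad_v2") for version 2
example = squad["train"][0]
print(example["question"])
print(example["context"][:80], "...")
print(example["answers"])         # {'text': [...], 'answer_start': [...]}
```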
reading comprehension
- Given a question and a context passage, predict where the answer span starts and ends in the passage
- Compute:
- $P_{start}(i)$: probability that token $i$ is the starting token
- $P_{end}(i)$: probability that token $i$ is the ending token
LSTM-based model
- Feed question tokens to a bidirectional LSTM
- Aggregate LSTM outputs via weighted sum to produce $q$, the final question embedding
- Process the passage in a similar way, using another bidirectional LSTM
- More than just word embeddings as input:
- A feature to denote whether the word matches a question word
- POS feature
- Weighted question embedding: produced by attending to each question word
- $\{p_1, ..., p_m\}$: one vector for each passage token from the bidirectional LSTM
- To compute the start and end probability for each token (numeric sketch below):
- $P_{start}(i) \propto \exp(p_i W_s q)$
- $P_{end}(i) \propto \exp(p_i W_e q)$
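A numeric sketch (random tensors, PyTorch assumed) of the bilinear scoring above: $p_i$ is the passage BiLSTM output for token $i$, $q$ the aggregated question embedding, and $W_s$, $W_e$ are learned parameter matrices.

```python
import torch

m, d = 20, 128                                    # toy passage length and hidden size
p = torch.randn(m, d)                             # {p_1, ..., p_m} from the passage BiLSTM
q = torch.randn(d)                                # aggregated question embedding
W_s, W_e = torch.randn(d, d), torch.randn(d, d)   # learned matrices (random here)

p_start = torch.softmax(p @ W_s @ q, dim=0)       # P_start(i) ∝ exp(p_i W_s q)
p_end   = torch.softmax(p @ W_e @ q, dim=0)       # P_end(i)   ∝ exp(p_i W_e q)
print(p_start.shape, float(p_start.sum()))        # torch.Size([20]) ~1.0
```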
BERT-based model
- Fine-tune BERT to predict the answer span (usage sketch below):
- $P_{start}(i) \propto \exp(S^T T_i')$
- $P_{end}(i) \propto \exp(E^T T_i')$
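Here $T_i'$ is BERT's final-layer representation of passage token $i$, and $S$, $E$ are learned start/end vectors. A minimal usage sketch with the Hugging Face `transformers` question-answering pipeline, which wraps a model fine-tuned in this way (the checkpoint name is just one publicly available SQuAD-tuned example):

```python
from transformers import pipeline

qa = pipeline("question-answering", model="distilbert-base-cased-distilled-squad")
result = qa(
    question="Who is the federal MP for Melbourne?",
    context=("The Division of Melbourne is an Australian Electoral Division in Victoria, "
             "represented since the 2010 election by Adam Bandt, a member of the Greens."),
)
print(result["answer"])   # expected span: "Adam Bandt"
```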
why BERT works better than LSTM
- It’s pre-trained and so already “knows” language before it’s adapted to the task
- Self-attention architecture allows fine-grained analysis between words in question and context paragraph
Knowledge-based QA
- QA over structured KB
- Many large knowledge bases
- Freebase, DBpedia, Yago, …
- Can we support natural language queries?
- E.g.
- ‘When was Ada Lovelace born?’ → birth-year(Ada Lovelace, ?x)
- ‘What is the capital of England?’ → capital-city(?x, England)
- Link “Ada Lovelace” with the correct entity in the KB to find the triple (Ada Lovelace, birth-year, 1815); a toy lookup is sketched below
- but
- Converting a natural language sentence into a triple is not trivial
- ‘When was Ada Lovelace born?’ → birth-year(Ada Lovelace, ?x)
- Entity linking is also an important component
- Ambiguity: “When was Lovelace born?”
- Can we simplify this two-step process?
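A toy in-memory triple store for the lookup above; real knowledge-based QA queries large KBs (Freebase, DBpedia, ...) after entity linking, but the shape of the query is the same. `TRIPLES` and `lookup` are illustrative names.

```python
# A microscopic stand-in for a real knowledge base of (entity, relation) -> value facts.
TRIPLES = {
    ("Ada Lovelace", "birth-year"): "1815",
    ("England", "capital-city"): "London",
}

def lookup(entity, relation):
    """Return the missing argument ?x of the relation for the given (linked) entity, if known."""
    return TRIPLES.get((entity, relation))

print(lookup("Ada Lovelace", "birth-year"))   # 1815
print(lookup("England", "capital-city"))      # London
```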
- semantic parsing
- Convert questions into logical forms to query the KB directly
- Predicate calculus
- Programming query (e.g. SQL)
how to build a semantic parser
- Text-to-text problem:
- Input = natural language sentence
- Output = string in logical form
- Encoder-decoder model (but do we have enough data? see the example pairs below)
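What the text-to-text framing looks like as data: (question, logical form) pairs such as these would be used to train an encoder-decoder model to generate the right-hand side from the left. The first two pairs restate the examples above; the SQL-style target and its schema are hypothetical.

```python
TRAIN_PAIRS = [
    ("When was Ada Lovelace born?", "birth-year(Ada Lovelace, ?x)"),
    ("What is the capital of England?", "capital-city(?x, England)"),
    # Hypothetical SQL-style target, for KBs queried with a programming/query language:
    ("Who is the federal MP for Melbourne?",
     "SELECT mp FROM electorates WHERE division = 'Melbourne'"),
]
```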
Hybrid QA
hybrid methods
- Why not use both text-based and knowledge-based resources for QA?
- IBM’s Watson, which won the game show Jeopardy!, uses a wide variety of resources to answer questions
- (question) THEATRE: A new play based on this Sir Arthur Conan Doyle canine classic opened on the London stage in 2007.
- (answer) The Hound of the Baskervilles
core idea of Watson
- Generate lots of candidate answers from text-based and knowledge-based sources
- Use a rich variety of evidence to score them
- Many components in the system, most trained separately
QA evaluation
- IR: Mean Reciprocal Rank (MRR) for systems returning matching passages or answer strings (metric sketch below)
- E.g. system returns 4 passages for a query, first correct passage is the 3rd passage
- MRR = 1/3
- MCTest: Accuracy
- SQuAD: Exact match of string against gold answer
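Minimal sketches of the two metrics above; `mean_reciprocal_rank` and `exact_match` are illustrative helpers (the official SQuAD script additionally normalises case, punctuation, and articles before matching).

```python
def mean_reciprocal_rank(first_correct_ranks):
    """Ranks of the first correct passage per query; use 0 when nothing correct is returned."""
    return sum(1.0 / r for r in first_correct_ranks if r > 0) / len(first_correct_ranks)

def exact_match(prediction, gold):
    """Exact string match of the predicted answer against the gold answer."""
    return prediction.strip() == gold.strip()

print(mean_reciprocal_rank([3]))                # 0.333..., the example above (first correct passage is 3rd)
print(exact_match("Adam Bandt", "Adam Bandt"))  # True
```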
Conclusion
- IR-based QA: search textual resources to answer questions
- Reading comprehension: assumes the question and passage are given
- Knowledge-based QA: search structured resources to answer questions
- Hot area: many new approaches & evaluation datasets being created all the time (narratives, QA, commonsense reasoning, etc)