Difference between revisions of "Question Answering with Subgraph Embeddings"

From statwiki

Revision as of 19:28, 9 November 2015

Introduction

Teaching machines to answer questions automatically in natural language has been a long-standing goal in AI. The rise of large-scale structured knowledge bases (KBs), such as Freebase [3], has made it possible to tackle the problem known as open-domain question answering (open QA). However, the scale of these KBs and the difficulty machines have in interpreting natural language still make the problem challenging.

Open QA techniques can be classified into two main categories:

  • Information retrieval based: retrieve a broad set of candidate answers by first querying the API of the KB, then narrow down the candidates using heuristics [8,12,14].
  • Semantic parsing based: focus on correctly interpreting the meaning of the question, so that querying the KB with the interpreted question returns the correct answer [1,9,2,7].

Both of these approaches require non-negligible human intervention (hand-crafted lexicons, grammars and KB schemas) to be effective.

[5] proposed a vectorial feature representation model for this problem. The goal of this paper is to improve on the model of [5], with the following contributions:

  • A more sophisticated inference procedure that is more efficient and can consider longer paths.
  • A richer representation of the answers, which encodes the question-answer path and the surrounding subgraph of the KB.

Task Definition

The motivation is to provide a system for open QA that can be trained as long as the following are available:

  • A training set of questions paired with answers.
  • A KB providing a structure among answers.
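The two requirements above can be pictured with a minimal sketch. It assumes the KB is stored as (subject, relation, object) triples; the toy entity and relation names, the `qa_pairs` and `kb_triples` variables, and the `build_neighbors` helper are hypothetical illustrations rather than anything taken from the paper or from Freebase.

```python
from collections import defaultdict

# Hypothetical toy training set: questions paired with answer entities.
qa_pairs = [
    ("where was barack obama born?", "honolulu"),
]

# Hypothetical toy KB: facts stored as (subject, relation, object) triples.
# The relation names are illustrative and not real Freebase identifiers.
kb_triples = [
    ("barack_obama", "place_of_birth", "honolulu"),
    ("honolulu", "contained_by", "hawaii"),
    ("barack_obama", "profession", "politician"),
]


def build_neighbors(triples):
    """Index each entity by the (relation, entity) pairs it is linked to.

    Links are stored in both directions, so the structure around any
    candidate answer can be looked up directly.
    """
    neighbors = defaultdict(set)
    for subj, rel, obj in triples:
        neighbors[subj].add((rel, obj))
        neighbors[obj].add((rel, subj))
    return neighbors


neighbors = build_neighbors(kb_triples)
```

Looking up `neighbors["honolulu"]` returns the facts connected to that answer entity, which is the kind of structure among answers that the second requirement refers to.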

WebQuestions [1] was used as the evaluation benchmark. Because WebQuestions contains only a small number of samples, it was not possible to train the system on this dataset alone. The following data sources were used for training.

  • WebQuestions: a dataset built using Freebase as the KB, containing 5810 question-answer pairs. It was created by crawling questions through the Google Suggest API and then obtaining answers using Amazon Mechanical Turk (Turkers were allowed to use only Freebase as the querying tool).
  • Freebase: a huge database of general facts organized as triplets (\texttt{subject}
  • ClueWeb Extractions:
  • Paraphrases:

Embedding Questions and Answers