Authors:
(1) Silei Xu, Computer Science Department, Stanford University, Stanford, CA, with equal contribution {[email protected]};
(2) Shicheng Liu, Computer Science Department, Stanford University, Stanford, CA, with equal contribution {[email protected]};
(3) Theo Culhane, Computer Science Department, Stanford University, Stanford, CA {[email protected]};
(4) Elizaveta Pertseva, Computer Science Department, Stanford University, Stanford, CA {[email protected]};
(5) Meng-Hsi Wu, Computer Science Department, Stanford University, Stanford, CA, and Ailly.ai {[email protected]};
(6) Sina J. Semnani, Computer Science Department, Stanford University, Stanford, CA {[email protected]};
(7) Monica S. Lam, Computer Science Department, Stanford University, Stanford, CA {[email protected]}.
WikiWebQuestions (WWQ) Dataset
Although Wikidata has long been the most popular large knowledge base, existing benchmarks on Wikidata with labeled SPARQL queries are unfortunately either small or of low quality. Benchmarks over the deprecated Freebase, on the other hand, still dominate KBQA research with better-quality data. For example, the WebQuestions (Yih et al., 2015) dataset was collected using the Google Suggest API instead of human paraphrasing or synthesis. As a result, it is much more natural and truly reflects the real-world questions users may ask.
This dataset was later annotated with SPARQL over Freebase and released as WebQuestionsSP (Yih et al., 2016); examples for which no legitimate SPARQL query could retrieve the answers from Freebase were dropped. In total, WebQuestionsSP consists of 3,098 examples in the training set and 1,639 in the test set.
We migrated WebQuestionsSP, the best collection of natural-language questions over a general knowledge graph, from Freebase to Wikidata with the help of an automatic tool we developed, based on Google's entity mapping[2] and Wikidata's relation mapping[3]. About 60% of the dataset was converted automatically. One of the authors of this paper, who did not participate in model tuning, manually converted the instances that failed to convert automatically.
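The conversion tool itself is not shown in the paper, but the entity side of the mapping can be sketched against Wikidata's public SPARQL endpoint using the real property P646 (Freebase ID); the function name and the example MID below are illustrative, not the paper's actual code.

```python
# Minimal sketch of Freebase-to-Wikidata entity mapping, assuming the
# public Wikidata Query Service and the "Freebase ID" property (P646).
import requests

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"

def freebase_mid_to_qid(mid: str) -> str | None:
    """Return the Wikidata QID whose Freebase ID (P646) equals `mid`, if any."""
    query = f'SELECT ?item WHERE {{ ?item wdt:P646 "{mid}" . }} LIMIT 1'
    resp = requests.get(
        WDQS_ENDPOINT,
        params={"query": query, "format": "json"},
        headers={"User-Agent": "wwq-migration-sketch/0.1 (research)"},
    )
    resp.raise_for_status()
    bindings = resp.json()["results"]["bindings"]
    if not bindings:
        return None  # no Wikidata entity carries this Freebase MID
    # Bindings hold full entity URIs, e.g. http://www.wikidata.org/entity/Q76
    return bindings[0]["item"]["value"].rsplit("/", 1)[-1]

# "/m/02mjmr" is Barack Obama's Freebase MID; this should print "Q76".
print(freebase_mid_to_qid("/m/02mjmr"))
```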
Here are the major decisions we made in migrating the WebQuestionsSP dataset to Wikidata. While much bigger, Wikidata does not necessarily contain all the information available in Freebase. For example, it lacks countries' trade partners, hence we dropped all such questions from the WebQuestionsSP dataset.
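The paper does not spell out how such coverage gaps are detected, but one can approximate the check with the standard MediaWiki wbsearchentities API: if no Wikidata property matches a relation's label, the relation is likely not modeled at all. This is a sketch under that assumption; the helper name is ours.

```python
# Coverage-gap check via the standard MediaWiki wbsearchentities API.
import requests

def search_properties(label: str) -> list[tuple[str, str]]:
    """Search Wikidata properties by English label."""
    resp = requests.get(
        "https://www.wikidata.org/w/api.php",
        params={
            "action": "wbsearchentities",
            "search": label,
            "language": "en",
            "type": "property",
            "format": "json",
        },
        headers={"User-Agent": "wwq-migration-sketch/0.1 (research)"},
    )
    resp.raise_for_status()
    return [(p["id"], p["label"]) for p in resp.json().get("search", [])]

# "author" resolves to P50; a label like "trade partner" yields no matching
# property, suggesting the information is not modeled in Wikidata.
print(search_properties("author"))
print(search_properties("trade partner"))
```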
If multiple paths can lead to the correct answer, we choose the path that provides the most complete answers and has the best availability among entities in the same domain. For example, when asking for books written by an author X, we can either search for books whose author is X or find the notable works of X that are books. While the latter is more efficient, the property notable works is not available for all authors, and it often does not provide a complete list. Thus, we annotate such examples using the former representation; both query shapes are sketched below.
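Concretely, the two candidate query shapes can be written as follows, using real Wikidata identifiers (P50 = author, P800 = notable work, P31/P279* = instance of/subclass of, Q7725634 = literary work) and J. K. Rowling (wd:Q34660) as an illustrative author:

```python
# Shape chosen for annotation: search for works whose author (P50) is X.
# P50 is widely populated, so the result list is complete.
books_via_author = """
SELECT DISTINCT ?book WHERE {
  ?book wdt:P50 wd:Q34660 ;
        wdt:P31/wdt:P279* wd:Q7725634 .
}
"""

# Rejected shape: follow X's "notable work" (P800) links and keep the books.
# Cheaper to evaluate, but P800 is sparse and rarely complete.
books_via_notable_work = """
SELECT DISTINCT ?book WHERE {
  wd:Q34660 wdt:P800 ?book .
  ?book wdt:P31/wdt:P279* wd:Q7725634 .
}
"""
```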
We also cleaned up the original dataset. It contained questions like "who does Ronaldinho play for now in 2011?". We dropped the appended year, since it conflicts with "now" in the utterance: "now" refers to the live information in Wikidata.
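A minimal version of this cleanup might look like the following; the regex and helper name are our own illustration, not the paper's actual script.

```python
# Drop a trailing "in <year>" when the question already says "now",
# since "now" refers to Wikidata's live data.
import re

def strip_conflicting_year(utterance: str) -> str:
    if re.search(r"\bnow\b", utterance):
        return re.sub(r"\s+in\s+(19|20)\d{2}\s*\?$", "?", utterance)
    return utterance

print(strip_conflicting_year("who does Ronaldinho play for now in 2011?"))
# -> "who does Ronaldinho play for now?"
```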
In total, we dropped 9% of the examples from WebQuestionsSP, yielding training, dev, and test sets of 2,431, 454, and 1,431 samples, respectively. Given that Wikidata has 100 million entities and 3,000 properties useful for answering questions, the training set is woefully inadequate and can be considered a "few-shot" training set at best.
This paper is available on arXiv under a CC 4.0 license.
[2] https://developers.google.com/freebase
[3] https://www.wikidata.org/wiki/Wikidata:WikiProject_Freebase/Mapping