Authors:
(1) Silei Xu, Computer Science Department, Stanford University, Stanford, CA, with equal contribution {silei@cs.stanford.edu};
(2) Shicheng Liu, Computer Science Department, Stanford University, Stanford, CA, with equal contribution {shicheng@cs.stanford.edu};
(3) Theo Culhane, Computer Science Department, Stanford University, Stanford, CA {tculhane@cs.stanford.edu};
(4) Elizaveta Pertseva, Computer Science Department, Stanford University, Stanford, CA {pertseva@cs.stanford.edu};
(5) Meng-Hsi Wu, Computer Science Department, Stanford University, Stanford, CA, Ailly.ai {jwu@ailly.ai};
(6) Sina J. Semnani, Computer Science Department, Stanford University, Stanford, CA {sinaj@cs.stanford.edu};
(7) Monica S. Lam, Computer Science Department, Stanford University, Stanford, CA {lam@cs.stanford.edu}.

Table of Links
Abstract and Introduction
Related Work
Semantic Parsing for Wikidata
WikiWebQuestions (WWQ) Dataset
Implementation
Experiments
Experiment with QALD-7
Conclusions, Limitations, Ethical Considerations, Acknowledgements, and References
A. Examples of Recovering from Entity Linking Errors

5 Implementation

This section discusses the implementation details of the entity linker and the WikiSP semantic parser.

5.1 Entity Linking

We use ReFinED (Ayoola et al., 2022) for entity linking, which is the current state of the art for WebQuestionsSP. As discussed before, Wikidata treats many common terms such as “country” as named entities and assigns them QIDs. To teach ReFinED such terms, we add the question and entity pairs from the training set of WikiWebQuestions to the data used to train ReFinED’s questions model. We fine-tune for 10 epochs using the default hyperparameters suggested by Ayoola et al. (2022).

For each identified entity, we provide the mention in the original utterance, the QID, and its domain in plain text. This information is appended to the utterance before it is fed into the neural semantic parsing model.

5.2 The WikiSP Semantic Parser

We prepare the training data with entities provided by the fine-tuned ReFinED. Compared with the gold entities, ReFinED provides extra entities in 215 cases and misses at least one entity in 137 cases. When ReFinED fails to produce the correct entities, we replace the missing QIDs in the logical form with the corresponding mention of the entity in the question. During evaluation, if the model predicts a mention of an entity instead of a QID, we look up the QID using the Wikidata “wbsearchentities” API [4].

We fine-tune the 7B-parameter LLaMA model because it has been shown to perform well despite its relatively small size (Touvron et al., 2023). We include the Alpaca (Taori et al., 2023) instruction-following data, which was derived using the self-instruct method (Wang et al., 2023), in our training data. The training samples from WikiWebQuestions start with the following instruction: “Given a Wikidata query with resolved entities, generate the corresponding SPARQL. Use property names instead of PIDs.” We concatenate the resolved entities and the user utterance together as input. We up-sample the WikiWebQuestions few-shot set 5 times and train for 3 epochs with a learning rate of 2e-5 and a warmup ratio of 0.03.

5.3 Executing Queries on Wikidata

SPARQL queries are used to retrieve answers from the Wikidata SPARQL endpoint [5]. Since Wikidata is actively being updated, the gold SPARQL can easily be re-executed to acquire up-to-date answers, which allows the benchmark to be used to compare forthcoming iterations of large language models.

This paper is available on arXiv under CC 4.0 license.

[4] https://www.wikidata.org/w/api.php?action=wbsearchentities
[5] https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service
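The data preparation steps in Sections 5.1 and 5.2 can be illustrated with a minimal Python sketch. It is not the authors' code. The instruction string and the "wbsearchentities" action come from the text; the function names, the layout used to serialize entities, and the choice to write a missed entity as its mention with underscores after `wd:` are all assumptions made for illustration. The URL builder only constructs the lookup request and performs no network call.

```python
import re
from urllib.parse import urlencode

# Instruction quoted from Section 5.2.
INSTRUCTION = (
    "Given a Wikidata query with resolved entities, generate the "
    "corresponding SPARQL. Use property names instead of PIDs."
)

WIKIDATA_API = "https://www.wikidata.org/w/api.php"


def build_model_input(utterance, entities):
    """Concatenate the utterance with the linked entities.

    `entities` is a list of (mention, qid, domain) tuples. The separator and
    field order are assumptions; the text only says the fields are appended.
    """
    resolved = "; ".join(
        f"{mention} ({domain}): {qid}" for mention, qid, domain in entities
    )
    return f"{INSTRUCTION}\nQuery: {utterance}\nEntities: {resolved}"


def replace_missing_qids(sparql, gold_mentions, linked_qids):
    """Swap each gold QID that the linker missed for its mention.

    `gold_mentions` maps gold QIDs to their mentions in the question.
    Writing the mention with underscores is an assumed serialization.
    """
    def swap(match):
        qid = match.group(1)
        # Keep QIDs the linker found, and QIDs with no known mention.
        if qid in linked_qids or qid not in gold_mentions:
            return match.group(0)
        return "wd:" + gold_mentions[qid].replace(" ", "_")

    return re.sub(r"wd:(Q\d+)", swap, sparql)


def wbsearchentities_url(mention, language="en"):
    """Build the lookup request for a predicted mention (no network call)."""
    params = {
        "action": "wbsearchentities",
        "search": mention,
        "language": language,
        "type": "item",
        "limit": 1,
        "format": "json",
    }
    return WIKIDATA_API + "?" + urlencode(params)
```

At evaluation time, a mention predicted by the parser would be passed to `wbsearchentities_url`, and the QID would be read from the `id` field of the first entry in the `search` list of the JSON response. Taking the top-ranked result is itself an assumption; the text does not say how candidates are chosen.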
