Authors:
(1) Soyeong Jeong, School of Computing;
(2) Jinheon Baek, Graduate School of AI;
(3) Sukmin Cho, School of Computing;
(4) Sung Ju Hwang, Korea Advanced Institute of Science and Technology;
(5) Jong C. Park, School of Computing.
3 Method and 3.1 Preliminaries
3.2 Adaptive-RAG: Adaptive Retrieval-Augmented Generation
4 Experimental Setups and 4.1 Datasets
4.2 Models and 4.3 Evaluation Metrics
5 Experimental Results and Analyses
6 Conclusion, Limitations, Ethics Statement, Acknowledgements, and References
A Additional Experimental Setups
B Additional Experimental Results
Figure 4: QA performance (F1) and efficiency (Time/Query) for different retrieval-augmented generation approaches. We use the FLAN-T5-XL (3B) as the base LLM.
We use publicly available datasets for both single-hop and multi-hop QA, following Karpukhin et al. (2020) and Trivedi et al. (2023), respectively. We describe the characteristics of each dataset below:
SQuAD v1.1 (Rajpurkar et al., 2016) is created by annotators writing questions based on the documents they read.
Natural Questions (Kwiatkowski et al., 2019) is constructed by real user queries on Google Search.
TriviaQA (Joshi et al., 2017) comprises trivia questions sourced from various quiz websites.
MuSiQue (Trivedi et al., 2022a) is constructed by composing multiple single-hop queries to form questions spanning 2-4 hops.
HotpotQA (Yang et al., 2018) is constructed by having annotators create questions that link multiple Wikipedia articles.
2WikiMultiHopQA (Ho et al., 2020) is derived from Wikipedia and its associated knowledge graph paths, requiring 2 hops of reasoning.
We describe the details of the models as follows:
No Retrieval. This approach uses only the LLM itself to generate the answer to the given query.
Single-step Approach. This approach first retrieves relevant knowledge for the given query from external knowledge sources and then augments the LLM with the retrieved knowledge to generate the answer, performing this retrieve-and-generate step only once.
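As a rough, hedged sketch (not the authors' implementation), the single-step pipeline amounts to one retrieval call followed by one generation call. The `retrieve` and `generate` callables below are illustrative placeholders for any retriever (e.g., BM25 over Wikipedia) and any base LLM (e.g., FLAN-T5-XL); the prompt format and the value of `k` are assumptions.

```python
from typing import Callable, List

def single_step_rag(query: str,
                    retrieve: Callable[[str, int], List[str]],
                    generate: Callable[[str], str],
                    k: int = 5) -> str:
    """One retrieval pass, then one generation pass; no iteration."""
    passages = retrieve(query, k)  # fetch external knowledge exactly once
    prompt = "Context:\n" + "\n".join(passages) + f"\n\nQuestion: {query}\nAnswer:"
    return generate(prompt)

# The No Retrieval baseline simply skips the retriever:
#   generate(f"Question: {query}\nAnswer:")
```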
Adaptive Retrieval. This baseline (Mallen et al., 2023) adaptively augments the LLM with the retrieval module only when the entities appearing in the query are less popular. To extract entities, we use an off-the-shelf entity-linking method, BLINK (Li et al., 2020), on the questions.
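A minimal sketch of this popularity-gated variant follows. The `link_entities` and `popularity` callables stand in for BLINK and an entity-popularity lookup (e.g., Wikipedia page views); they, along with the threshold value, are assumptions for illustration rather than the baseline's exact implementation.

```python
from typing import Callable, List

def adaptive_retrieval(query: str,
                       link_entities: Callable[[str], List[str]],  # stand-in for BLINK entity linking
                       popularity: Callable[[str], float],         # stand-in for an entity-popularity lookup
                       with_retrieval: Callable[[str], str],       # e.g., single_step_rag
                       without_retrieval: Callable[[str], str],
                       threshold: float = 1e4) -> str:
    """Trigger retrieval only when the query mentions a less popular (long-tail) entity."""
    entities = link_entities(query)
    has_rare_entity = any(popularity(e) < threshold for e in entities)
    return with_retrieval(query) if has_rare_entity else without_retrieval(query)
```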
Self-RAG. This baseline (Asai et al., 2024) trains the LLM to adaptively perform retrieval and generation, where retrieval is conducted whenever the model predicts its special retrieval token above a certain threshold, with answer generation following.
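Conceptually, this decision reduces to checking the probability the trained LM assigns to a special retrieval token at the next decoding step. The sketch below assumes a Hugging Face causal LM whose vocabulary contains such a token; the token name and threshold used here are assumptions, not Self-RAG's exact settings.

```python
import torch

def wants_retrieval(model, tokenizer, prompt: str,
                    retrieval_token: str = "[Retrieval]",  # assumed token name
                    threshold: float = 0.2) -> bool:       # assumed threshold
    """True if the next-token probability of the special retrieval token exceeds the threshold."""
    token_id = tokenizer.convert_tokens_to_ids(retrieval_token)
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        next_token_logits = model(**inputs).logits[0, -1]
    prob = torch.softmax(next_token_logits, dim=-1)[token_id].item()
    return prob > threshold
```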
Adaptive-RAG. This is our model that adaptively selects the retrieval-augmented generation strategy, moving seamlessly among the no-retrieval, single-step, and multi-step approaches[4] without architectural changes, based on the query complexity assessed by the classifier.
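The routing itself can be pictured as a three-way dispatch on the predicted complexity label. This is a minimal sketch assuming a classifier that returns one of three labels, written here as 'A' (simplest), 'B' (single-hop), and 'C' (multi-hop); the callables are illustrative placeholders.

```python
from typing import Callable

def adaptive_rag(query: str,
                 classify: Callable[[str], str],      # query-complexity classifier
                 no_retrieval: Callable[[str], str],
                 single_step: Callable[[str], str],
                 multi_step: Callable[[str], str]) -> str:
    """Route the query to the cheapest strategy its predicted complexity allows."""
    label = classify(query)
    if label == "A":              # simplest queries: the LLM alone suffices
        return no_retrieval(query)
    if label == "B":              # single-hop queries: one retrieval pass
        return single_step(query)
    return multi_step(query)      # "C": multi-hop queries: iterative retrieval and reasoning
```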
Multi-step Approach. This approach (Trivedi et al., 2023) is a multi-step retrieval-augmented LLM, which repeatedly accesses both the retriever and the LLM with interleaved Chain-of-Thought reasoning (Wei et al., 2022b) until it derives the answer or reaches the maximum number of steps.
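A hedged sketch of such an interleaved retrieve-and-reason loop, in the spirit of IRCoT: each chain-of-thought sentence produced by the LLM seeds the next retrieval, and the loop stops once an answer marker appears or the step budget runs out. The prompt format, stopping heuristic, and step budget are illustrative assumptions.

```python
from typing import Callable, List

def multi_step_rag(query: str,
                   retrieve: Callable[[str, int], List[str]],
                   generate: Callable[[str], str],
                   max_steps: int = 5, k: int = 3) -> str:
    """Interleave retrieval with chain-of-thought steps until an answer is produced."""
    passages: List[str] = []
    thoughts: List[str] = []
    next_query = query
    for _ in range(max_steps):
        passages += retrieve(next_query, k)              # retrieve with the latest reasoning step
        prompt = ("Context:\n" + "\n".join(passages)
                  + f"\n\nQuestion: {query}"
                  + "\nReasoning so far: " + " ".join(thoughts)
                  + "\nNext reasoning step:")
        thought = generate(prompt)
        thoughts.append(thought)
        if "answer is" in thought.lower():               # crude stopping heuristic
            return thought
        next_query = thought                             # the new thought seeds the next retrieval
    return thoughts[-1] if thoughts else ""
```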
Adaptive-RAG w/ Oracle. This is an idealized version of our Adaptive-RAG, equipped with an oracle classifier that perfectly categorizes the query complexity.
For computing resources, we use A100 GPUs with 80GB memory. In addition, due to the significant costs associated with evaluating retrieval-augmented generation models, we perform experiments with a single run. Finally, we implement the models using PyTorch (Paszke et al., 2019) and the Transformers library (Wolf et al., 2020).
This paper is available on arxiv under CC0 1.0 DEED license.
[4] For the multi-step approach, we use the state-of-the-art question answering strategy from IRCoT (Trivedi et al., 2023).