Authors:
(1) Soyeong Jeong, School of Computing;
(2) Jinheon Baek, Graduate School of AI;
(3) Sukmin Cho, School of Computing;
(4) Sung Ju Hwang, Korea Advanced Institute of Science and Technology;
(5) Jong C. Park, School of Computing.
3 Method and 3.1 Preliminaries
3.2 Adaptive-RAG: Adaptive Retrieval-Augmented Generation
4 Experimental Setups and 4.1 Datasets
4.2 Models and 4.3 Evaluation Metrics
5 Experimental Results and Analyses
6 Conclusion, Limitations, Ethics Statement, Acknowledgements, and References
A Additional Experimental Setups
B Additional Experimental Results
Retrieval-Augmented Large Language Models (LLMs), which incorporate the non-parametric knowledge from external knowledge bases into LLMs, have emerged as a promising approach to enhancing response accuracy in several tasks, such as Question-Answering (QA). However, even though there are various approaches dealing with queries of different complexities, they either handle simple queries with unnecessary computational overhead or fail to adequately address complex multi-step queries; yet, not all user requests fall into only one of the simple or complex categories. In this work, we propose a novel adaptive QA framework that can dynamically select the most suitable strategy for (retrieval-augmented) LLMs, from the simplest to the most sophisticated, based on the query complexity. This selection process is operationalized with a classifier, a smaller LM trained to predict the complexity level of incoming queries using automatically collected labels, obtained from the actual predicted outcomes of models and the inherent inductive biases in datasets. This approach offers a balanced strategy, seamlessly adapting between the iterative and single-step retrieval-augmented LLMs, as well as the no-retrieval methods, in response to a range of query complexities. We validate our model on a set of open-domain QA datasets, covering multiple query complexities, and show that ours enhances the overall efficiency and accuracy of QA systems, compared to relevant baselines including the adaptive retrieval approaches. Code is available at: https://github.com/starsuzi/Adaptive-RAG.
Recent Large Language Models (LLMs) (Brown et al., 2020; OpenAI, 2023; Touvron et al., 2023; Anil et al., 2023) have shown remarkable performance across diverse tasks, including question answering (QA) (Yang et al., 2018; Kwiatkowski et al., 2019). However, they still generate factually incorrect answers, since their knowledge relies solely on their parametric memory (Kasai et al., 2022; Mallen et al., 2023), and memorizing all the (ever-changing) world knowledge may not be possible. To address this problem, retrieval-augmented LLMs (Borgeaud et al., 2022; Izacard et al., 2023; Shi et al., 2023), which incorporate non-parametric knowledge into LLMs with additional retrieval modules, have gained increasing attention. Specifically, these models access a knowledge base, which serves as an extensive repository of information across various subjects and disciplines, to retrieve information relevant to the given input, and then incorporate the retrieved information into LLMs, which enables them to stay accurate and current with world knowledge.
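To make this retrieve-then-read pipeline concrete, below is a minimal sketch. The `Retriever` and `LLM` interfaces are hypothetical stand-ins for any off-the-shelf retriever (e.g., BM25) and any LLM API; this is an illustration of the general technique, not the paper's implementation.

```python
# Minimal retrieve-then-read sketch (single-step retrieval augmentation).
# `Retriever` and `LLM` are hypothetical interfaces, not the paper's code.
from typing import List, Protocol

class Retriever(Protocol):
    def search(self, query: str, k: int) -> List[str]: ...

class LLM(Protocol):
    def generate(self, prompt: str) -> str: ...

def single_step_rag(query: str, retriever: Retriever, llm: LLM, k: int = 5) -> str:
    """Retrieve k passages for the query, then condition the LLM on them."""
    passages = retriever.search(query, k=k)
    context = "\n\n".join(passages)
    prompt = f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
    return llm.generate(prompt)
```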
A particularly salient application of retrieval-augmented LLMs is handling QA tasks, whose goal is to provide correct answers in response to user queries, especially those of high complexity. Early work on retrieval-augmented LLMs focuses primarily on single-hop queries (Lazaridou et al., 2022; Ram et al., 2023), whose answers are typically found within a single document; this approach therefore involves retrieving a relevant document based on the query and subsequently integrating this information into QA models to formulate a response. However, unlike this single-hop QA, some queries require connecting and aggregating multiple documents and are often not answerable through a single step of retrieval-and-response. An example query is 'When did the people who captured Malakoff come to the region where Philipsburg is located?', which requires four reasoning steps to solve. Therefore, to effectively handle such complex queries, recent studies have concentrated largely on multi-step, multi-reasoning QA, which requires accessing both LLMs and retrievers iteratively, multiple times (Press et al., 2023; Trivedi et al., 2023), at the cost of heavy computational overhead.
Yet we should step back and ask: in a real-world scenario, are all user requests complex? Instead, users might often ask simple and straightforward questions, while only occasionally asking complex ones. Specifically, a query such as 'Paris is the capital of what?' is likely to be asked far more frequently than the aforementioned multi-step query, and this simpler query can often be answered by the LLMs themselves, without accessing external knowledge. In other words, a multi-step QA approach incurs unnecessary computational overhead for simple queries, even though it is vital for complex queries (see Figure 2 (A)). On the other hand, handling complex queries with single-step-retrieval or even no-retrieval strategies would be largely insufficient (Figure 2 (B)). This suggests the need for an adaptive QA system, which can dynamically adjust the operational strategy of retrieval-augmented LLMs based on the query complexity. While some recent approaches can do this based on the frequency of entities in queries (Mallen et al., 2023) or on the generated outputs from models for multi-step QA (Trivedi et al., 2023), they are still suboptimal: the former methods are overly simplistic, failing to consider multi-hop queries, while the latter are excessively complex, terminating the answer-solving process only after several rounds of module access.
In this work, considering the diverse complexity levels of real-world queries, we argue that previous one-size-fits-all approaches are inadequate to cover all of them. Instead, we propose to select the most suitable strategy from a range of (retrieval-augmented) LLMs, each of which is tailored to the specific complexity of the input query. A critical step in this process is pre-defining the query complexity, which is instrumental in determining the most fitting model for it. We operationalize this process with a novel classifier, a smaller model trained to predict the complexity level of incoming queries (see Figure 2 (C)). Moreover, we automatically collect its training data without human labeling, by leveraging the predicted outcomes (i.e., which models accurately respond to which queries) as well as by capitalizing on the inherent biases in existing datasets (i.e., samples in the datasets are designed either for single-step or for multi-step QA scenarios). This proposed method offers a robust middle ground among the iterative LLM augmentation methods for complex queries, single-step methods for simpler queries, and even no-retrieval methods for the most straightforward queries (answerable by LLMs themselves), thus significantly enhancing the overall efficiency and accuracy, as shown in Figure 1. We refer to our framework as Adaptive Retrieval-Augmented Generation (Adaptive-RAG).
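As a rough illustration of this routing, the sketch below dispatches a query to one of three strategies based on a predicted complexity label, reusing `single_step_rag` from the earlier sketch. The three-way label set {'A', 'B', 'C'}, the `classifier.predict` interface, and the stopping heuristic are illustrative assumptions, not the authors' released implementation.

```python
# Sketch of classifier-based strategy selection, assuming a trained
# complexity classifier. Label names and helpers are illustrative.
def adaptive_rag_answer(query, classifier, llm, retriever):
    level = classifier.predict(query)  # e.g., 'A' (simplest) / 'B' / 'C' (complex)
    if level == "A":
        # Simplest queries: answer from the LLM's parametric knowledge alone.
        return llm.generate(f"Question: {query}\nAnswer:")
    elif level == "B":
        # Moderate queries: one round of retrieve-then-read.
        return single_step_rag(query, retriever, llm)
    else:
        # Complex multi-hop queries: interleave retrieval and generation.
        return multi_step_rag(query, retriever, llm)

def multi_step_rag(query, retriever, llm, max_steps=4):
    """Iterative retrieval: each intermediate generation refines the next query."""
    evidence, thought = [], query
    for _ in range(max_steps):
        evidence.extend(retriever.search(thought, k=3))
        context = "\n\n".join(evidence)
        thought = llm.generate(
            f"Context:\n{context}\n\nQuestion: {query}\nReasoning or answer:"
        )
        if "the answer is" in thought.lower():  # simple stopping heuristic
            break
    return thought
```

The point of the design is that cost scales with predicted complexity: 'A' makes one LLM call, 'B' adds one retrieval round, and only 'C' pays for the iterative loop.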
We validate Adaptive-RAG on benchmark open-domain QA datasets, covering a wide range of query complexities, from single-hop (Rajpurkar et al., 2016; Joshi et al., 2017; Kwiatkowski et al., 2019) to multi-hop (Yang et al., 2018; Ho et al., 2020; Trivedi et al., 2022b) queries. The experimental results show that ours significantly improves the overall accuracy and efficiency, compared to prior adaptive strategies, across multiple LLMs, such as GPT-3.5 (Brown et al., 2020) and the FLAN-T5 series (Chung et al., 2022).
Our contributions and findings are threefold:
• We point out the realistic scenario of queries of varying complexities, and find that existing retrieval-augmented generation approaches tend to be either overly simple or overly complex.
• We adapt retrieval-augmented LLMs to the query complexity assessed by the classifier, which enables the use of the approach best suited to each query.
• We show that our Adaptive-RAG is highly effective and efficient, striking a balance between complexity and simplicity across diverse queries.
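To illustrate the automatic label collection mentioned above, here is one plausible assignment rule, assuming each of the three strategies has already been run on the training queries. The exact tie-breaking, label names, and dataset grouping are illustrative assumptions rather than the paper's precise procedure.

```python
# Sketch of silver-label collection for training the complexity classifier.
# Labels follow the three strategies above ('A' cheapest, 'C' most expensive);
# the fallback mirrors the inductive-bias labeling described in the text,
# but the details here are illustrative assumptions.
def assign_label(gold_answer, predictions, source_dataset):
    """predictions: dict mapping strategy label -> that strategy's answer."""
    # Prefer the cheapest strategy that already answers correctly.
    for level in ("A", "B", "C"):
        if predictions[level].strip() == gold_answer.strip():
            return level
    # If no strategy succeeds, fall back to the dataset's inductive bias:
    # single-hop benchmarks -> 'B', multi-hop benchmarks -> 'C'.
    single_hop = {"squad", "natural_questions", "trivia_qa"}  # assumed names
    return "B" if source_dataset in single_hop else "C"
```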
This paper is available on arxiv under CC0 1.0 DEED license.