Authors:
(1) Soyeong Jeong, School of Computing;
(2) Jinheon Baek, Graduate School of AI;
(3) Sukmin Cho, School of Computing;
(4) Sung Ju Hwang, Korea Advanced Institute of Science and Technology;
(5) Jong C. Park, School of Computing.
3 Method and 3.1 Preliminaries
3.2 Adaptive-RAG: Adaptive Retrieval-Augmented Generation
4 Experimental Setups and 4.1 Datasets
4.2 Models and 4.3 Evaluation Metrics
5 Experimental Results and Analyses
6 Conclusion, Limitations, Ethics Statement, Acknowledgements, and References
A Additional Experimental Setups
B Additional Experimental Results
In this section, we describe the datasets, models, evaluation metrics, and implementation details; additional details are provided in Appendix A.
To simulate a realistic scenario in which different queries have varying complexities, we use both single-hop and multi-hop QA datasets simultaneously in a unified experimental setting.
Single-hop QA For simpler queries, we use three benchmark single-hop QA datasets, each consisting of queries paired with documents that contain their answers: 1) SQuAD v1.1 (Rajpurkar et al., 2016), 2) Natural Questions (Kwiatkowski et al., 2019), and 3) TriviaQA (Joshi et al., 2017).
Multi-hop QA To consider more complex query scenarios, we use three benchmark multi-hop QA datasets, which require sequential reasoning over multiple documents, namely 1) MuSiQue (Trivedi et al., 2022a), 2) HotpotQA (Yang et al., 2018), and 3) 2WikiMultiHopQA (Ho et al., 2020).
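To make the unified setting concrete, the following minimal sketch (not the authors' released code) shows one way the six datasets could be mixed into a single evaluation pool. The file names, record schema, and per-dataset sample size are illustrative assumptions; it only assumes each dataset has been preprocessed into a local JSON list of question-answer records.

```python
import json
import random

# Hypothetical local paths; the paper does not specify file formats, so this
# assumes each dataset was preprocessed into a JSON list of
# {"question": ..., "answer": ...} records.
SINGLE_HOP = ["squad_v1.1.json", "natural_questions.json", "triviaqa.json"]
MULTI_HOP = ["musique.json", "hotpotqa.json", "2wikimultihopqa.json"]


def load_queries(path, hop_type):
    """Load one dataset and tag each query with its source file and hop type."""
    with open(path, encoding="utf-8") as f:
        records = json.load(f)
    return [
        {"question": r["question"], "answer": r["answer"],
         "source": path, "hop_type": hop_type}
        for r in records
    ]


def build_unified_pool(per_dataset=500, seed=42):
    """Sample an equal number of queries from every dataset so that single-hop
    and multi-hop questions are mixed into one evaluation set."""
    rng = random.Random(seed)
    pool = []
    for path in SINGLE_HOP:
        pool.extend(rng.sample(load_queries(path, "single-hop"), per_dataset))
    for path in MULTI_HOP:
        pool.extend(rng.sample(load_queries(path, "multi-hop"), per_dataset))
    rng.shuffle(pool)  # interleave queries of varying complexity
    return pool


if __name__ == "__main__":
    unified = build_unified_pool()
    print(f"{len(unified)} mixed single-/multi-hop queries")
```

Equal per-dataset sampling is one simple way to keep the pool balanced across query complexities; the actual split sizes used in the experiments are given in Appendix A.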
This paper is available on arXiv under the CC0 1.0 DEED license.