Authors:
(1) Arindam Mitra;
(2) Luciano Del Corro, work done while at Microsoft;
(3) Shweti Mahajan, work done while at Microsoft;
(4) Andres Codas, denotes equal contribution;
(5) Clarisse Simoes, denotes equal contribution;
(6) Sahaj Agarwal;
(7) Xuxi Chen, work done while at Microsoft;
(8) Anastasia Razdaibiedina, work done while at Microsoft;
(9) Erik Jones, work done while at Microsoft;
(10) Kriti Aggarwal, work done while at Microsoft;
(11) Hamid Palangi;
(12) Guoqing Zheng;
(13) Corby Rosset;
(14) Hamed Khanpour;
(15) Ahmed Awadallah.
Teaching Orca 2 to be a Cautious Reasoner
B. BigBench-Hard Subtask Metrics
C. Evaluation of Grounding in Abstractive Summarization
F. Illustrative Example from Evaluation Benchmarks and Corresponding Model Output
We provide a list of prompts used for evaluation below:
Table 15: Prompts used for evaluating all models with an empty system message. The prompts are simple and only aim at giving the models hints about the answer format to improve the parsing of model responses. For tasks where the questions were formatted as a prompt, the input is used as is. Examples from all datasets are shown in Appendix F.
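The actual prompts appear in Table 15 of the paper. As a loose illustration of the idea behind them, the minimal sketch below builds a multiple-choice prompt with an answer-format hint and parses the model's choice with a regular expression. The hint wording, function names, and example question are assumptions for illustration only, not the paper's prompts.

```python
import re

# Hypothetical format hint (NOT one of the paper's Table 15 prompts): it only
# constrains the answer format so the chosen option can be parsed reliably.
FORMAT_HINT = (
    "Answer the question by selecting one option. "
    "End your response with 'Final answer: <letter>'."
)

def build_prompt(question: str, options: dict[str, str]) -> str:
    """Assemble a multiple-choice prompt that ends with the format hint."""
    option_lines = "\n".join(f"({letter}) {text}" for letter, text in options.items())
    return f"{question}\n{option_lines}\n\n{FORMAT_HINT}"

def parse_answer(response: str) -> str | None:
    """Extract the selected option letter from a model response, if present."""
    match = re.search(r"Final answer:\s*\(?([A-D])\)?", response, re.IGNORECASE)
    return match.group(1).upper() if match else None

if __name__ == "__main__":
    prompt = build_prompt(
        "Which planet is known as the Red Planet?",
        {"A": "Venus", "B": "Mars", "C": "Jupiter", "D": "Saturn"},
    )
    print(prompt)
    print(parse_answer("Reasoning... Final answer: B"))  # -> "B"
```

The point of such hints is purely mechanical: they make the final answer easy to locate in free-form model output without changing the task itself.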
This paper is available on arXiv under a CC 4.0 license.