paint-brush
Using LLMs to Correct Reasoning Mistakes: Related Works That You Should Know Aboutby@textmodels
153 reads

Using LLMs to Correct Reasoning Mistakes: Related Works That You Should Know About

by Writings, Papers and Blogs on Text Models
Writings, Papers and Blogs on Text Models HackerNoon profile picture

Writings, Papers and Blogs on Text Models

@textmodels

We publish the best academic papers on rule-based techniques, LLMs,...

June 1st, 2024
Read on Terminal Reader
Read this story in a terminal
Print this story
Read this story w/o Javascript
Read this story w/o Javascript
tldt arrow

Too Long; Didn't Read

This paper explores few-shot in-context learning methods, which is typically used in realworld applications with API-based LLMs. Our paper focuses on correction of logical and reasoning errors, rather than stylistic or qualitative improvements. To our knowledge, the only publicly available dataset containing mistake annotations in LLM outputs is PRM800K (Lightman et al., 2023)
featured image - Using LLMs to Correct Reasoning Mistakes: Related Works That You Should Know About
1x
Read by Dr. One voice-avatar

Listen to this story

Writings, Papers and Blogs on Text Models HackerNoon profile picture
Writings, Papers and Blogs on Text Models

Writings, Papers and Blogs on Text Models

@textmodels

We publish the best academic papers on rule-based techniques, LLMs, & the generation of text that resembles human text.

About @textmodels
LEARN MORE ABOUT @TEXTMODELS'S
EXPERTISE AND PLACE ON THE INTERNET.
0-item

STORY’S CREDIBILITY

Academic Research Paper

Academic Research Paper

Part of HackerNoon's growing list of open-source research papers, promoting free access to academic material.

Authors:

(1) Gladys Tyen, University of Cambridge, Dept. of Computer Science & Technology, ALTA Institute, and Work done during an internship at Google Research (e-mail: gladys.tyen@cl.cam.ac.uk);

(2) Hassan Mansoor, Google Research (e-mail: hassan@google.com);

(3) Victor Carbune, Google Research (e-mail: vcarbune@google.com);

(4) Peter Chen, Google Research and Equal leadership contribution (chenfeif@google.com);

(5) Tony Mak, Google Research and Equal leadership contribution (e-mail: tonymak@google.com).

Abstract and Introduction

BIG-Bench Mistake

Benchmark results

Backtracking

Related Works

Conclusion, Limitations, and References

A. Implementational details

B. Annotation

C. Benchmark scores

Datasets To our knowledge, the only publicly available dataset containing mistake annotations in LLM outputs is PRM800K (Lightman et al., 2023), which is a dataset of solutions to Olympiad-level math questions. Our dataset BIG-Bench Mistake covers a wider range of tasks to explore the reasoning capabilities of LLMs more thoroughly.


Additionally, the generator LLM used in PRM800K has been fine-tuned on 1.5B math tokens as well as a dataset of step-by-step math solutions. For this paper, we wanted to explore few-shot in-context learning methods, which is typically used in realworld applications with API-based LLMs.


Self-correction Pan et al. (2023) present a plethora of self-correction methods in recent literature. While their list includes training-time correction strategies such as RLHF (Ouyang et al., 2022) and self-improve (Huang et al., 2022), our backtracking method falls into the category of post-hoc correction, where the correction process is applied to outputs that have already been generated.


Our paper focuses on correction of logical and reasoning errors, rather than stylistic or qualitative improvements. Previous post-hoc correction methods that are applied to reasoning errors include Reflexion (Shinn et al., 2023) and RCI (Kim et al., 2023), both of which cause performance deterioration when the oracle label is not used (Huang et al., 2023). Other methods such as Self-Refine (Madaan et al., 2023) and iterative refinement (Chen et al., 2023) focus on q ualitative or stylistic improvements rather than correcting logical errors.


This paper is available on arxiv under CC 4.0 license.


L O A D I N G
. . . comments & more!

About Author

Writings, Papers and Blogs on Text Models HackerNoon profile picture
Writings, Papers and Blogs on Text Models@textmodels
We publish the best academic papers on rule-based techniques, LLMs, & the generation of text that resembles human text.

TOPICS

THIS ARTICLE WAS FEATURED IN...

Permanent on Arweave
Read on Terminal Reader
Read this story in a terminal
 Terminal
Read this story w/o Javascript
Read this story w/o Javascript
 Lite
Thetechstreetnow
Hk
X REMOVE AD