Our Annotations Guide for BIG-Bench Mistake

by Writings, Papers and Blogs on Text Models, June 1st, 2024

Too Long; Didn't Read

Annotators can click on a word to highlight every occurrence of it in the trace and the question text. Buttons on the right automatically become inactive once a previous step has been labelled as negative. For every trace, we provide the input question and the target answer, with a note to watch for errors that may occur in correct_ans traces (traces whose final answer is correct).

Authors:

(1) Gladys Tyen, University of Cambridge, Dept. of Computer Science & Technology, ALTA Institute; work done during an internship at Google Research (e-mail: gladys.tyen@cl.cam.ac.uk);

(2) Hassan Mansoor, Google Research (e-mail: hassan@google.com);

(3) Victor Carbune, Google Research (e-mail: vcarbune@google.com);

(4) Peter Chen, Google Research, equal leadership contribution (e-mail: chenfeif@google.com);

(5) Tony Mak, Google Research, equal leadership contribution (e-mail: tonymak@google.com).

Table of Links

Abstract and Introduction

Conclusion, Limitations, and References

A. Implementational details

B. Annotation

C. Benchmark scores

B Annotation

We release our annotation guidelines at https://github.com/WHGTyen/BIG-Bench-Mistake.

During annotation of the multistep arithmetic task, we found that the first CoT step given in the original BIG-Bench Hard prompt examples (Suzgun et al., 2022) was incorrect. Since all generated traces contained the same first step, we removed that step before showing traces to the annotators. Figure 3 contains an example screenshot of the user interface. For every trace, we provide the input question and the target answer, with a note to watch for errors that may occur in correct_ans traces (traces whose final answer is correct).

Annotators can click on a word to highlight every occurrence of it in the trace and the question text, which we found particularly helpful for tasks such as word sorting and tracking shuffled objects. Buttons on the right automatically become inactive once a previous step has been labelled as negative.
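The two interface behaviours described above can be sketched in a few lines of Python. This is only an illustration and not the code behind the actual annotation tool: the function names, the `"positive"`/`"negative"` label strings, and the choice of case-sensitive whole-word matching are all assumptions made for the example.

```python
import re


def matching_word_positions(word, texts):
    """Find every whole-word occurrence of `word` in a list of texts.

    Hypothetical helper: returns (text_index, start, end) spans that a UI
    could highlight. Matching is case-sensitive, which is an assumption.
    """
    pattern = re.compile(rf"\b{re.escape(word)}\b")
    return [
        (i, m.start(), m.end())
        for i, text in enumerate(texts)
        for m in pattern.finditer(text)
    ]


def active_buttons(labels):
    """Decide which steps still have clickable label buttons.

    `labels` holds one entry per step: "positive", "negative", or None
    (not yet labelled). These label strings are assumed for illustration.
    A step's buttons are active only if no earlier step is negative.
    """
    active = []
    seen_negative = False
    for label in labels:
        active.append(not seen_negative)
        if label == "negative":
            seen_negative = True
    return active


if __name__ == "__main__":
    question = "Sort the following words: pear apple fig"
    steps = ["pear = p, apple = a, fig = f.", "apple < fig < pear"]
    # Highlight "apple" in the question and in every step of the trace.
    print(matching_word_positions("apple", [question] + steps))
    # Step 2 is negative, so the buttons for steps 3 and 4 are disabled.
    print(active_buttons(["positive", "negative", None, None]))
```

Disabling the buttons after the first negative label means each trace ends up with at most one marked mistake step, and annotators are not asked to judge steps that follow it.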