

GPT-4-Turbo, GPT-4, and GPT-3.5-Turbo: Their Benchmark Scores

Academic Research Paper

Part of HackerNoon's growing list of open-source research papers, promoting free access to academic material.

Authors:

(1) Gladys Tyen, University of Cambridge, Dept. of Computer Science & Technology, ALTA Institute; work done during an internship at Google Research (e-mail: gladys.tyen@cl.cam.ac.uk);

(2) Hassan Mansoor, Google Research (e-mail: hassan@google.com);

(3) Victor Carbune, Google Research (e-mail: vcarbune@google.com);

(4) Peter Chen, Google Research; equal leadership contribution (e-mail: chenfeif@google.com);

(5) Tony Mak, Google Research; equal leadership contribution (e-mail: tonymak@google.com).

Table of Links

Abstract and Introduction

BIG-Bench Mistake

Benchmark results

Backtracking

Related Works

Conclusion, Limitations, and References

A. Implementational details

B. Annotation

C. Benchmark scores

C. Benchmark scores

Table 8: Mistake finding accuracy across 5 tasks for correctmis and incorrectmis traces. The combined scores of Table 8a and Table 8b make up Table 4.


Figure 3: Screenshot of the user interface for a question from the tracking shuffled objects task.


This paper is available on arXiv under a CC 4.0 license.

