
GPT-4-Turbo, GPT4, and GPT-3.5-Turbo: Their Benchmark Scores


Authors:

(1) Gladys Tyen, University of Cambridge, Dept. of Computer Science & Technology, ALTA Institute; work done during an internship at Google Research (e-mail: [email protected]);

(2) Hassan Mansoor, Google Research (e-mail: [email protected]);

(3) Victor Carbune, Google Research (e-mail: [email protected]);

(4) Peter Chen, Google Research; equal leadership contribution (e-mail: [email protected]);

(5) Tony Mak, Google Research; equal leadership contribution (e-mail: [email protected]).

Table of Links

Abstract and Introduction

BIG-Bench Mistake

Benchmark results

Backtracking

Related Works

Conclusion, Limitations, and References

A. Implementational details

B. Annotation

C. Benchmark scores

C Benchmark scores

Table 8: Mistake-finding accuracy across 5 tasks for correct_mis and incorrect_mis traces. The scores in Table 8a and Table 8b combine to give the scores in Table 4.
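As a rough illustration of how two per-subset accuracies can combine into a single overall score, the sketch below computes a trace-count-weighted average. This is an assumption about the combination, not something the caption states: the function name `combined_accuracy` and all numbers are hypothetical and are not taken from the paper's tables.

```python
def combined_accuracy(subsets):
    """Combine per-subset accuracies into one overall accuracy.

    `subsets` is a list of (accuracy, n_traces) pairs, one per subset.
    Assumption: the overall score is the accuracy over the union of the
    subsets, i.e. an average weighted by the number of traces in each.
    """
    total = sum(n for _, n in subsets)
    if total == 0:
        raise ValueError("at least one subset must contain traces")
    return sum(acc * n for acc, n in subsets) / total


# Hypothetical numbers for illustration only (not from the paper):
# subset A scores 0.5 on 100 traces, subset B scores 1.0 on 300 traces.
overall = combined_accuracy([(0.5, 100), (1.0, 300)])
print(overall)
```

Under this assumption the overall score sits closer to the accuracy of the larger subset, so it can differ noticeably from a plain mean of the two subset scores when the subsets are unequal in size.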


Figure 3: Screenshot of the user interface for a question from the tracking shuffled objects task.


This paper is available on arXiv under a CC 4.0 license.

