Human Preferences Help Scientists Train AI 30x Faster Than Before

by Language Models, December 3rd, 2024

  1. Abstract and Introduction
  2. Related Work
  3. Problem Definition
  4. Method
  5. Experiments
  6. Conclusion and References


A. Appendix

A.1 Full Prompts and A.2 ICPL Details

A.3 Baseline Details

A.4 Environment Details

A.5 Proxy Human Preference

A.6 Human-in-the-Loop Preference

Authors:

(1) Chao Yu, Tsinghua University;

(2) Hong Lu, Tsinghua University;

(3) Jiaxuan Gao, Tsinghua University;

(4) Qixin Tan, Tsinghua University;

(5) Xinting Yang, Tsinghua University;

(6) Yu Wang, with equal advising from Tsinghua University;

(7) Yi Wu, with equal advising from Tsinghua University and the Shanghai Qi Zhi Institute;

(8) Eugene Vinitsky, with equal advising from New York University ([email protected]).


This paper is available on arXiv under a CC 4.0 license.