Human Preferences Help Scientists Train AI 30x Faster Than Before

Table of Links

Abstract and Introduction
Related Work
Problem Definition
Method
Experiments
Conclusion and References
A. Appendix
A.1. Full Prompts and A.2 ICPL Details
A.3 Baseline Details
A.4 Environment Details
A.5 Proxy Human Preference
A.6 Human-in-the-Loop Preference

Authors:

(1) Chao Yu, Tsinghua University;
(2) Hong Lu, Tsinghua University;
(3) Jiaxuan Gao, Tsinghua University;
(4) Qixin Tan, Tsinghua University;
(5) Xinting Yang, Tsinghua University;
(6) Yu Wang, with equal advising from Tsinghua University;
(7) Yi Wu, with equal advising from Tsinghua University and the Shanghai Qi Zhi Institute;
(8) Eugene Vinitsky, with equal advising from New York University (zoeyuchao@gmail.com).

This paper is available on arXiv under CC 4.0 license.