Zephyr: Direct Distillation of LM Alignment: Appendix

Authors: (1) Lewis Tunstall, Equal contribution and The H4 (Helpful, Honest, Harmless, Huggy) Team (email: lewis@huggingface.co); (2) Edward Beeching, Equal contribution and The H4 (Helpful, Honest, Harmless, Huggy) Team; (3) Nathan Lambert, The H4 (Helpful, Honest, Harmless, Huggy) Team; (4) Nazneen Rajani, The H4 (Helpful, Honest, Harmless, Huggy) Team; (5) Kashif Rasul, The H4 (Helpful, Honest, Harmless, Huggy) Team; (6) Younes Belkada, The H4 (Helpful, Honest, Harmless, Huggy) Team; (7) Shengyi Huang, The H4 (Helpful, Honest, Harmless, Huggy) Team; (8) Leandro von Werra, The H4 (Helpful, Honest, Harmless, Huggy) Team; (9) Clementine Fourrier, The H4 (Helpful, Honest, Harmless, Huggy) Team; (10) Nathan Habib, The H4 (Helpful, Honest, Harmless, Huggy) Team; (11) Nathan Sarrazin, The H4 (Helpful, Honest, Harmless, Huggy) Team; (12) Omar Sanseviero, The H4 (Helpful, Honest, Harmless, Huggy) Team; (13) Alexander M. Rush, The H4 (Helpful, Honest, Harmless, Huggy) Team; (14) Thomas Wolf, The H4 (Helpful, Honest, Harmless, Huggy) Team. Table of Links Abstract and Introduction Related Work Method Experimental Details Results and Ablations Conclusions and Limitations , Acknowledgements and References Appendix A APPENDIX A.1 QUALITATIVE EXAMPLES To qualitatively compare the responses from our dSFT and dDPO models, we choose prompts from a few domains of MT-Bench, as well as some adversarial prompts to test each model’s capability to follow instructions with false premises or harmful intent. Completions for the adversarial prompts were generated with nucleus sampling(top-p = 0.95) and T = 0.7. A.2 SFT IS A REQUIRED STEP BEFORE DPO In Table 3 we ran an ablation to see whether SFT is necessary prior to the DPO step. We observed a significant reduction in performance in both the MT-Bench and AlpacaEval scores when the SFT step is skipped. After a qualitative evaluation of the MT-Bench generations, we observe that the pure DPO model struggles to learn the chat template: This paper is available on arxiv under CC 4.0 license. Authors: (1) Lewis Tunstall, Equal contribution and The H4 (Helpful, Honest, Harmless, Huggy) Team (email: lewis@huggingface.co); (2) Edward Beeching, Equal contribution and The H4 (Helpful, Honest, Harmless, Huggy) Team; (3) Nathan Lambert, The H4 (Helpful, Honest, Harmless, Huggy) Team; (4) Nazneen Rajani, The H4 (Helpful, Honest, Harmless, Huggy) Team; (5) Kashif Rasul, The H4 (Helpful, Honest, Harmless, Huggy) Team; (6) Younes Belkada, The H4 (Helpful, Honest, Harmless, Huggy) Team; (7) Shengyi Huang, The H4 (Helpful, Honest, Harmless, Huggy) Team; (8) Leandro von Werra, The H4 (Helpful, Honest, Harmless, Huggy) Team; (9) Clementine Fourrier, The H4 (Helpful, Honest, Harmless, Huggy) Team; (10) Nathan Habib, The H4 (Helpful, Honest, Harmless, Huggy) Team; (11) Nathan Sarrazin, The H4 (Helpful, Honest, Harmless, Huggy) Team; (12) Omar Sanseviero, The H4 (Helpful, Honest, Harmless, Huggy) Team; (13) Alexander M. Rush, The H4 (Helpful, Honest, Harmless, Huggy) Team; (14) Thomas Wolf, The H4 (Helpful, Honest, Harmless, Huggy) Team. Authors: Authors: (1) Lewis Tunstall, Equal contribution and The H4 (Helpful, Honest, Harmless, Huggy) Team (email: lewis@huggingface.co); (2) Edward Beeching, Equal contribution and The H4 (Helpful, Honest, Harmless, Huggy) Team; (3) Nathan Lambert, The H4 (Helpful, Honest, Harmless, Huggy) Team; (4) Nazneen Rajani, The H4 (Helpful, Honest, Harmless, Huggy) Team; (5) Kashif Rasul, The H4 (Helpful, Honest, Harmless, Huggy) Team; (6) Younes Belkada, The H4 (Helpful, Honest, Harmless, Huggy) Team; (7) Shengyi Huang, The H4 (Helpful, Honest, Harmless, Huggy) Team; (8) Leandro von Werra, The H4 (Helpful, Honest, Harmless, Huggy) Team; (9) Clementine Fourrier, The H4 (Helpful, Honest, Harmless, Huggy) Team; (10) Nathan Habib, The H4 (Helpful, Honest, Harmless, Huggy) Team; (11) Nathan Sarrazin, The H4 (Helpful, Honest, Harmless, Huggy) Team; (12) Omar Sanseviero, The H4 (Helpful, Honest, Harmless, Huggy) Team; (13) Alexander M. Rush, The H4 (Helpful, Honest, Harmless, Huggy) Team; (14) Thomas Wolf, The H4 (Helpful, Honest, Harmless, Huggy) Team. Table of Links Abstract and Introduction Related Work Method Experimental Details Results and Ablations Conclusions and Limitations , Acknowledgements and References Appendix Abstract and Introduction Abstract and Introduction Related Work Related Work Method Method Experimental Details Experimental Details Results and Ablations Results and Ablations Conclusions and Limitations , Acknowledgements and References Conclusions and Limitations , Acknowledgements and References Appendix Appendix A APPENDIX A.1 QUALITATIVE EXAMPLES To qualitatively compare the responses from our dSFT and dDPO models, we choose prompts from a few domains of MT-Bench, as well as some adversarial prompts to test each model’s capability to follow instructions with false premises or harmful intent. Completions for the adversarial prompts were generated with nucleus sampling(top-p = 0.95) and T = 0.7. A.2 SFT IS A REQUIRED STEP BEFORE DPO In Table 3 we ran an ablation to see whether SFT is necessary prior to the DPO step. We observed a significant reduction in performance in both the MT-Bench and AlpacaEval scores when the SFT step is skipped. After a qualitative evaluation of the MT-Bench generations, we observe that the pure DPO model struggles to learn the chat template: This paper is available on arxiv under CC 4.0 license. This paper is available on arxiv under CC 4.0 license. available on arxiv