Table of Links

Abstract and 1 Introduction
2. Background
2.1 Effective Tutoring Practice
2.2 Feedback for Tutor Training
2.3 Sequence Labeling for Feedback Generation
2.4 Large Language Models in Education
3. Method
3.1 Dataset and 3.2 Sequence Labeling
3.3 GPT Facilitated Sequence Labeling
3.4 Metrics
4. Results
4.1 Results on RQ1
4.2 Results on RQ2
5. Discussion
6. Limitation and Future Works
7. Conclusion
8. Acknowledgments
9. References

APPENDIX
A. Lesson Principles
B. Input for Fine-Tuning GPT-3.5
C. Scatter Matrix of the Correlation on the Outcome-based Praise
D. Detailed Results of Fine-Tuned GPT-3.5 Model's Performance

7. CONCLUSION

In this study, we investigated the enhancement of automated feedback systems through the application of GPT models, employing a multifaceted approach that included prompting GPT-3.5 and GPT-4 models and fine-tuning GPT-3.5 models for improved performance. Prompting the GPT models demonstrated their potential to identify specific components of praise, emphasizing the critical role of prompt design in optimizing model outputs. In comparison, fine-tuning the GPT-3.5 model significantly enhanced the system’s ability to accurately highlight key components of tutor responses. This led to the development of an automated feedback system aimed at delivering immediate and explanatory feedback for tutor training, addressing the crucial need for scalable and effective feedback. Our implementation showcases the potential of leveraging advanced large language models to provide highlighted explanatory feedback on tutors’ open-ended responses, offering insights for future research in the development of automated feedback systems.
This paper is available on arxiv under CC BY 4.0 DEED license.

Authors:
(1) Jionghao Lin, Carnegie Mellon University (jionghal@cs.cmu.edu);
(2) Eason Chen, Carnegie Mellon University (easonc13@cmu.edu);
(3) Zeifei Han, University of Toronto (feifei.han@mail.utoronto.ca);
(4) Ashish Gurung, Carnegie Mellon University (agurung@andrew.cmu.edu);
(5) Danielle R. Thomas, Carnegie Mellon University (drthomas@cmu.edu);
(6) Wei Tan, Monash University (wei.tan2@monash.edu);
(7) Ngoc Dang Nguyen, Monash University (dan.nguyen2@monash.edu);
(8) Kenneth R. Koedinger, Carnegie Mellon University (koedinger@cmu.edu).