Orca 2: Enhancing Reasoning in Smaller Language Models - Evaluation of Safety
by @textmodels



Too Long; Didn't Read

This paper is available on arxiv.org under the CC 4.0 license. We present results for each of the target identity groups in the ToxiGen dataset under the discriminative evaluation regime. We also give further details and results for the experiments presented in Section 6.6.1.

Authors:

(1) Arindam Mitra;

(2) Luciano Del Corro, work done while at Microsoft;

(3) Shweti Mahajan, work done while at Microsoft;

(4) Andres Codas, equal contribution;

(5) Clarisse Simoes, equal contribution;

(6) Sahaj Agarwal;

(7) Xuxi Chen, work done while at Microsoft;

(8) Anastasia Razdaibiedina, work done while at Microsoft;

(9) Erik Jones, work done while at Microsoft;

(10) Kriti Aggarwal, work done while at Microsoft;

(11) Hamid Palangi;

(12) Guoqing Zheng;

(13) Corby Rosset;

(14) Hamed Khanpour;

(15) Ahmed Awadallah.

Abstract and Introduction

Preliminaries

Teaching Orca 2 to be a Cautious Reasoner

Technical Details

Experimental Setup

Evaluation Results

Limitations

Conclusions and References

A. AGIEval Subtask Metrics

B. BigBench-Hard Subtask Metrics

C. Evaluation of Grounding in Abstractive Summarization

D. Evaluation of Safety

E. Prompts used in Evaluation

F. Illustrative Example from Evaluation Benchmarks and Corresponding Model Output

D Evaluation of Safety

This section gives further details and results for the experiments presented in Section 6.6.

D.1 ToxiGen MCQ

This section presents results for each target identity group in the ToxiGen dataset under the discriminative evaluation regime. These results break down the aggregated results presented in Section 6.6.
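To make the relationship between the per-group breakdown and the aggregated numbers concrete, here is a minimal Python sketch of how such a breakdown could be computed. It is not the authors' evaluation code: the record fields (`group`, `label`, `prediction`), the function names, and the example records are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def per_group_accuracy(records, label=None):
    """Return {group: accuracy} over the given records.

    Each record is a dict with hypothetical keys 'group', 'label', and
    'prediction'. If `label` is given, only records whose gold label
    matches it are counted (for example, only neutral or only toxic
    statements).
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for r in records:
        if label is not None and r["label"] != label:
            continue
        total[r["group"]] += 1
        if r["prediction"] == r["label"]:
            correct[r["group"]] += 1
    return {g: correct[g] / total[g] for g in total}


def aggregate_accuracy(records):
    """Accuracy over all records, ignoring group membership."""
    if not records:
        return 0.0
    hits = sum(r["prediction"] == r["label"] for r in records)
    return hits / len(records)


# Made-up records for illustration only; these are not data from the paper.
example = [
    {"group": "group_a", "label": "neutral", "prediction": "neutral"},
    {"group": "group_a", "label": "toxic", "prediction": "neutral"},
    {"group": "group_b", "label": "toxic", "prediction": "toxic"},
]

by_group = per_group_accuracy(example)
toxic_only = per_group_accuracy(example, label="toxic")
overall = aggregate_accuracy(example)
```

Passing `label="neutral"` or `label="toxic"` restricts the breakdown to one statement type, in the spirit of the separate neutral and toxic tables below. The aggregate is a count-weighted combination of the per-group values, so groups with more statements contribute more to it.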


Table 13: Neutral Statement Classification


Table 14: Toxic Statement Classification


This paper is available on arXiv under the CC 4.0 license.