Authors:
(1) Chengrun Yang, Google DeepMind (equal contribution);
(2) Xuezhi Wang, Google DeepMind;
(3) Yifeng Lu, Google DeepMind;
(4) Hanxiao Liu, Google DeepMind;
(5) Quoc V. Le, Google DeepMind;
(6) Denny Zhou, Google DeepMind;
(7) Xinyun Chen, Google DeepMind (equal contribution).

Table of Links
Abstract and 1. Introduction
2 OPRO: LLM as the Optimizer and 2.1 Desirables of Optimization by LLMs
2.2 Meta-Prompt Design
3 Motivating Example: Mathematical Optimization and 3.1 Linear Regression
3.2 Traveling Salesman Problem (TSP)
4 Application: Prompt Optimization and 4.1 Problem Setup
4.2 Meta-Prompt Design
5 Prompt Optimization Experiments and 5.1 Evaluation Setup
5.2 Main Results
5.3 Ablation Studies
5.4 Overfitting Analysis in Prompt Optimization and 5.5 Comparison with EvoPrompt
6 Related Work
7 Conclusion, Acknowledgments and References
A Some Failure Cases
B Prompting Formats for Scorer LLM
C Meta-Prompts and C.1 Meta-Prompt for Math Optimization
C.2 Meta-Prompt for Prompt Optimization
D Prompt Optimization Curves on the Remaining BBH Tasks
E Prompt Optimization on BBH Tasks – Tabulated Accuracies and Found Instructions

4 APPLICATION: PROMPT OPTIMIZATION

Next, we demonstrate the effectiveness of OPRO on prompt optimization, where the objective is to find the prompt that maximizes task accuracy. We first introduce the problem setup, then illustrate the meta-prompt design.

4.1 PROBLEM SETUP

We focus on prompt optimization for natural language tasks, where both the input and the output are text. The task is represented as a dataset with training and test splits: the training set is used to compute the training accuracy, which serves as the objective value during optimization, and the test accuracy is computed on the test set after optimization finishes. While traditional optimization often requires a relatively large training set, our experiments show that a small number or fraction of training samples (e.g., 3.5% of the training set for GSM8K (Cobbe et al., 2021), 20% for Big-Bench Hard (Suzgun et al., 2022)) is sufficient.
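
As a minimal sketch of the setup above, the objective can be written as the accuracy of a scorer on a small, fixed sample of the training split. Everything named here is an assumption made for illustration: the `ScorerFn` interface, the exact-match comparison, the helper names, and the toy scorer, which is a stand-in function and not a real LLM.

```python
import random
from typing import Callable, List, Tuple

Example = Tuple[str, str]  # (question, reference answer)
# Hypothetical scorer interface: (instruction, question) -> answer text.
ScorerFn = Callable[[str, str], str]


def sample_training_subset(
    train_set: List[Example], fraction: float, seed: int = 0
) -> List[Example]:
    """Draw a fixed random subset of the training split (at least one example)."""
    rng = random.Random(seed)
    k = max(1, round(len(train_set) * fraction))
    return rng.sample(train_set, k)


def training_accuracy(
    instruction: str, subset: List[Example], scorer: ScorerFn
) -> float:
    """Objective value: fraction of subset questions answered correctly.

    Exact string match is an assumption; real tasks may need answer parsing.
    """
    correct = sum(scorer(instruction, q).strip() == a.strip() for q, a in subset)
    return correct / len(subset)


def toy_scorer(instruction: str, question: str) -> str:
    """Stand-in for a scorer LLM, NOT a real model.

    It adds correctly only when the instruction contains 'carefully'.
    """
    a, b = (int(x) for x in question.split(" + "))
    return str(a + b) if "carefully" in instruction.lower() else str(a)


# Toy training split of 200 addition questions; 3.5% of it is 7 examples.
train_set = [(f"{i} + 1", str(i + 1)) for i in range(200)]
subset = sample_training_subset(train_set, fraction=0.035)

print(len(subset), training_accuracy("Add carefully.", subset, toy_scorer))
print(training_accuracy("Answer:", subset, toy_scorer))
```

Fixing the seed keeps the subset identical across candidate instructions, so their objective values are comparable. The test split is deliberately untouched here and would only be scored once optimization has finished.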
The objective function evaluator is the LLM to which the optimized prompt will be applied; it can be the same as or different from the LLM used for optimization. We denote the LLM for objective function evaluation as the scorer LLM, and the LLM for optimization as the optimizer LLM.

The output of the optimizer LLM is an instruction, which is concatenated to the question part of every exemplar and prompts the scorer LLM. We consider the following positions to insert the instruction:

• Q_begin: the instruction is added before the original question.
• Q_end: the instruction is added after the original question.
• A_begin: the instruction is added to the beginning of the scorer LLM output. This is applicable to pretrained LLMs without instruction tuning, where the prompt is formatted as a sequence of QA pairs.

We exemplify these prompting formats in Appendix B.
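
The three insertion positions (Q_begin, Q_end, A_begin) can be illustrated with a small prompt builder. This is only a sketch: the exact templates are those of Appendix B, which is not reproduced here, so the `Q:`/`A:` markers, the newline separators, and the function names `format_question` and `build_prompt` are assumptions.

```python
from typing import List, Tuple


def format_question(question: str, instruction: str, position: str) -> str:
    """Place the instruction relative to one question (assumed template)."""
    if position == "Q_begin":
        # Instruction before the original question.
        return f"Q: {instruction}\n{question}\nA:"
    if position == "Q_end":
        # Instruction after the original question.
        return f"Q: {question}\n{instruction}\nA:"
    if position == "A_begin":
        # Instruction opens the answer that the scorer continues.
        return f"Q: {question}\nA: {instruction}"
    raise ValueError(f"unknown position: {position!r}")


def build_prompt(
    exemplars: List[Tuple[str, str]],
    question: str,
    instruction: str,
    position: str,
) -> str:
    """Few-shot exemplars followed by the new question.

    The instruction is inserted into every exemplar as well as the final
    question, matching the description of concatenating it to each exemplar.
    """
    blocks = [
        f"{format_question(q, instruction, position)} {a}" for q, a in exemplars
    ]
    blocks.append(format_question(question, instruction, position))
    return "\n\n".join(blocks)


if __name__ == "__main__":
    demo = build_prompt(
        exemplars=[("What is 1 + 1?", "2")],
        question="What is 2 + 3?",
        instruction="Let's solve the problem.",
        position="A_begin",
    )
    print(demo)
```

With A_begin the scorer's completion starts right after the instruction, which is why that position suits a pretrained model prompted with a sequence of QA pairs.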

This paper is available on arXiv under CC0 1.0 DEED license.

Part of HackerNoon's growing list of open-source research papers, promoting free access to academic material.
