In this section, we introduce THINK-AND-EXECUTE and provide a detailed explanation of how LLMs perform reasoning with it. We incorporate an Instructor LM I and a Reasoner LM R, for THINK and EXECUTE, respectively. Figure 2 shows the overview of our framework.
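For concreteness, the two phases can be viewed as two calls to black-box language models. The following minimal sketch, assuming hypothetical `instructor_lm` / `reasoner_lm` wrappers with a `generate()` method (not an API defined in the paper), illustrates how THINK produces a task-level prompt once per task, while EXECUTE applies it to each instance.

```python
# A minimal sketch of the overall framework; `instructor_lm` and `reasoner_lm`
# are assumed wrappers around the Instructor LM I and Reasoner LM R, not an
# interface specified in the paper.

def think(instructor_lm, meta_prompt: str) -> str:
    """THINK: I analyzes the target task and writes a task-level
    pseudocode prompt P (Steps 1-3 below). Run once per task."""
    return instructor_lm.generate(meta_prompt)

def execute(reasoner_lm, pseudocode_prompt: str, question: str) -> str:
    """EXECUTE: R simulates the execution of P on a specific instance,
    emitting rationales via the print() statements in P."""
    return reasoner_lm.generate(f"{pseudocode_prompt}\n\nQuestion: {question}")
```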
The goal of the Instructor LM I in this phase is to discover the underlying logic for solving a given task t and to generate a prompt describing that logic, which is then applied to all instances of the task (in EXECUTE). This prompt is written in pseudocode rather than the natural language used in prior work to guide the LM to perform step-by-step reasoning (Kojima et al., 2022; Wang et al., 2023).
Step 1: Constructing a meta prompt. To prompt the Instructor LM I to generate task-level pseudocode for the given target task t, we provide the pseudocode prompts P of other tasks as demonstrations in a meta prompt.[1] In practice, we construct the meta prompt with 3 tasks randomly sampled from T as demonstrations (3 example questions, the analysis, and P for each task) and the target task t (3 example questions without answers).[2]
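As an illustration of Step 1, the sketch below assembles such a meta prompt from demonstration tasks and the target task. The field names (`name`, `questions`, `analysis`, `pseudocode`) and the exact text layout are our own assumptions for illustration, not the paper's specification.

```python
import random

def build_meta_prompt(demo_tasks, target_task, k=3, n_questions=3):
    """Assemble a meta prompt: k demonstration tasks (example questions,
    analysis, and pseudocode prompt P each), followed by the target
    task's example questions without answers."""
    parts = []
    for task in random.sample(demo_tasks, k):
        parts.append(f"Task: {task['name']}")
        parts.extend(task["questions"][:n_questions])
        parts.append(f"Analysis: {task['analysis']}")
        parts.append(f"Pseudocode prompt:\n{task['pseudocode']}")
    parts.append(f"Target task: {target_task['name']}")
    parts.extend(target_task["questions"][:n_questions])
    return "\n\n".join(parts)
```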
Step 2: Analyzing the target task. Given the meta prompt, I generates an analysis containing the key reasoning logic required to solve the target task regardless of the instances (questions). For example, in Figure 2 (Top), the generated analysis points out that solving the task (Web of Lies) requires building a truthfulness map and updating it by processing the statements. This step guides I to focus on the reasoning process shared among all instances, which is crucial for producing a task-level prompt.
Step 3: Generating a pseudocode prompt based on the analysis. Next, based on the analysis, I writes a prompt P in the form of pseudocode, which breaks down the reasoning steps necessary for solving the target task. We choose the pseudocode format over a natural language plan (Kojima et al., 2022; Wang et al., 2023) for two main reasons: (1) its efficiency in describing the logic behind a task (e.g., avoiding repetitive instructions via a for loop), and (2) its guidance on what and when to generate rationales, via the arguments of print() statements and their locations within the code. For example, in Figure 2, P contains the statement print(f"{person1} says {person2} {action}. {person1} tells the truth: {truth_dict[person1]}"), which instructs the Reasoner LM to generate a rationale that helps keep track of the truthfulness map of each person during the execution of P. We provide more examples of meta prompts, analyses, and pseudocode prompts in Appendix G.
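To make the form of P concrete, below is an illustrative, runnable approximation of the kind of pseudocode prompt described for Web of Lies. It is not the exact prompt generated in the paper; the function signature, variable names other than truth_dict, and the base-statement handling are our own assumptions, and the Reasoner LM would simulate its execution step by step rather than actually run it.

```python
# Illustrative sketch of a task-level pseudocode prompt P for Web of Lies.
# The print() statements mark where the Reasoner LM should emit rationales.

def solve_web_of_lies(base_person, base_truth, statements, query_person):
    """statements: list of (person1, person2, action), where `action` is
    either "tells the truth" or "lies", and person2's truthfulness is
    already known when person1's statement is processed."""
    truth_dict = {base_person: base_truth}  # truthfulness map: person -> bool
    for person1, person2, action in statements:
        claim = (action == "tells the truth")
        # person1 is truthful iff their claim matches person2's actual truthfulness.
        truth_dict[person1] = (claim == truth_dict[person2])
        print(f"{person1} says {person2} {action}. "
              f"{person1} tells the truth: {truth_dict[person1]}")
    print(f"Does {query_person} tell the truth? "
          f"{'Yes' if truth_dict[query_person] else 'No'}")
```

For instance, simulating solve_web_of_lies("Fidel", True, [("Jerry", "Fidel", "lies"), ("Helene", "Jerry", "tells the truth")], "Helene") would produce the rationale "Jerry says Fidel lies. Jerry tells the truth: False", then "Helene says Jerry tells the truth. Helene tells the truth: False", and finally the answer "No".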
This paper is available on arxiv under CC BY-NC-ND 4.0 DEED license.
[1] We manually annotate P for each task in T in advance. See Appendix B.1 for examples.
[2] We use the questions of the example instances in the few-shot prompts of Big-Bench Hard.
Authors:
(1) Hyungjoo Chae, Yonsei University;
(2) Yeonghyeon Kim, Yonsei University;
(3) Seungone Kim, KAIST AI;
(4) Kai Tzu-iunn Ong, Yonsei University;
(5) Beong-woo Kwak, Yonsei University;
(6) Moohyeon Kim, Yonsei University;
(7) Seonghwan Kim, Yonsei University;
(8) Taeyoon Kwon, Yonsei University;
(9) Jiwan Chung, Yonsei University;
(10) Youngjae Yu, Yonsei University;
(11) Jinyoung Yeo, Yonsei University.