B Details of Think-and-Execute
C Prompts Used in Our Experiments
D Human-written Pseudocode Prompts
F Generated Pseudocode Prompts
Chain-of-Thought prompting. Chain-of-thought (CoT) prompting elicits LMs to generate intermediate reasoning steps that guide and explain the solution (Wei et al., 2022; Wang et al., 2022; Wu et al., 2023). One common paradigm of this is zero-shot CoT prompting (Kojima et al., 2022). Without specifically designed question-explanation-answer triplets as demonstrations, zero-shot CoT prompting elicits a plausible reasoning path toward the final answer with a simple instruction such as "Let's think step by step", improving model performance on tasks that require multi-step reasoning.
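To make the two-stage nature of zero-shot CoT concrete, the sketch below shows a minimal implementation: a reasoning-extraction call triggered by the "Let's think step by step" instruction, followed by an answer-extraction call (Kojima et al., 2022). The `call_llm` function is a hypothetical placeholder for any LM API and is not part of the original work.

```python
def call_llm(prompt: str) -> str:
    """Hypothetical stand-in for any chat/completion LM API."""
    raise NotImplementedError("plug in the LM API of your choice")

def zero_shot_cot(question: str) -> str:
    # Stage 1: elicit an intermediate reasoning path with the trigger phrase,
    # without any hand-crafted question-explanation-answer demonstrations.
    reasoning = call_llm(f"Q: {question}\nA: Let's think step by step.")
    # Stage 2: extract the final answer conditioned on the generated reasoning.
    return call_llm(
        f"Q: {question}\nA: Let's think step by step. {reasoning}\n"
        "Therefore, the answer is"
    )
```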
In the context of improving zero-shot CoT, Wang et al. (2023) propose to first generate a plan that breaks down the target task into smaller subtasks and then solve each subtask according to the plan. Similar to our approach, concurrent work (Zhou et al., 2024) devises a task-level reasoning structure that can be applied to each instance (question) of the target task. The most significant distinction between these prior studies and ours is that our THINK-AND-EXECUTE adopts pseudocode (as opposed to natural language) to express the logic required to solve the task. We demonstrate in Section 5 that our task-level pseudocode prompt equips LMs with stronger zero-shot reasoning ability than natural language plans under various settings.
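The sketch below illustrates this task-level idea under simplifying assumptions: a pseudocode prompt is generated once per task (the THINK phase) and then reused verbatim for every instance (the EXECUTE phase). The prompt wording, the `example_instances` argument, and the `call_llm` helper are illustrative placeholders; the actual prompts we use are listed in Appendices D and F.

```python
def call_llm(prompt: str) -> str:
    """Hypothetical stand-in for any chat/completion LM API."""
    raise NotImplementedError("plug in the LM API of your choice")

def think(example_instances: str) -> str:
    # THINK: analyze a few example instances and write one pseudocode prompt
    # that captures the logic shared by all instances of the task.
    return call_llm(
        "Analyze the following example instances of the task and write "
        "pseudocode that solves any instance of it:\n" + example_instances
    )

def execute(task_pseudocode: str, instance: str) -> str:
    # EXECUTE: reuse the same task-level pseudocode for every instance; the LM
    # simulates running it on the given input and reports the final output.
    return call_llm(
        f"{task_pseudocode}\n\nInput: {instance}\n"
        "Simulate the execution of the pseudocode on this input, printing the "
        "intermediate variables, and state the final output."
    )
```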
Incorporation of code in reasoning. With their unambiguous syntax and strict structure, programming languages such as Python have been incorporated into LLM-based systems to improve their task-solving performance. For instance, Gao et al. (2023) and Chen et al. (2023) use LLMs to generate Python code for given mathematical questions and run the generated code with external interpreters to calculate the answers. Concurrent with our work, Li et al. (2023) present Chain-of-Code (CoC), which also incorporates pseudocode alongside Python code to solve a given question (instance). While this approach generates instance-specific code as intermediate reasoning steps for each individual instance, our THINK-AND-EXECUTE, by contrast, focuses on a task-level pseudocode prompt that can be applied to all instances. We compare CoC and THINK-AND-EXECUTE in Section 4.
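As a point of contrast with our task-level prompts, the sketch below shows the instance-level, program-aided pattern in the spirit of Gao et al. (2023) and Chen et al. (2023): the LM writes Python for one specific question, and an external interpreter, rather than the LM, computes the answer. The prompt wording and the `call_llm` helper are again illustrative assumptions.

```python
def call_llm(prompt: str) -> str:
    """Hypothetical stand-in for any chat/completion LM API."""
    raise NotImplementedError("plug in the LM API of your choice")

def program_aided_answer(question: str):
    # The LM generates instance-specific Python code for this one question.
    code = call_llm(
        f"# Question: {question}\n"
        "# Write Python code that computes the answer and stores it in a "
        "variable named `answer`.\n"
    )
    # An external interpreter, not the LM, executes the code and does the math.
    namespace: dict = {}
    exec(code, namespace)
    return namespace["answer"]
```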
This paper is available on arxiv under CC BY-NC-ND 4.0 DEED license.
Authors:
(1) Hyungjoo Chae, Yonsei University;
(2) Yeonghyeon Kim, Yonsei University;
(3) Seungone Kim, KAIST AI;
(4) Kai Tzu-iunn Ong, Yonsei University;
(5) Beong-woo Kwak, Yonsei University;
(6) Moohyeon Kim, Yonsei University;
(7) Seonghwan Kim, Yonsei University;
(8) Taeyoon Kwon, Yonsei University;
(9) Jiwan Chung, Yonsei University;
(10) Youngjae Yu, Yonsei University;
(11) Jinyoung Yeo, Yonsei University.