
Researchers Are Teaching AI to Plan, Think, and Build Its Own Tools—for Data Science


Too Long; Didn't Read

Researchers have developed a solution that emphasizes three pivotal techniques to augment problem-solving in data science.

Authors:

(1) Sirui Hong, DeepWisdom and these authors contributed equally to this work;

(2) Yizhang Lin, DeepWisdom and these authors contributed equally to this work;

(3) Bang Liu, Université de Montréal & Mila; these authors are listed in alphabetical order;

(4) Bangbang Liu, DeepWisdom and these authors contributed equally to this work;

(5) Binhao Wu, DeepWisdom and these authors contributed equally to this work;

(6) Danyang Li, DeepWisdom and these authors contributed equally to this work;

(7) Jiaqi Chen, Fudan University and these authors contributed equally to this work;

(8) Jiayi Zhang, Renmin University of China and these authors contributed equally to this work;

(9) Jinlin Wang, DeepWisdom and these authors contributed equally to this work;

(10) Li Zhang, Fudan University and these authors contributed equally to this work;

(11) Lingyao Zhang, these authors contributed equally to this work;

(12) Min Yang, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences and these authors contributed equally to this work;

(13) Mingchen Zhuge, AI Initiative, King Abdullah University of Science and Technology and these authors contributed equally to this work;

(14) Taicheng Guo, University of Notre Dame and these authors contributed equally to this work;

(15) Tuo Zhou, The University of Hong Kong and these authors contributed equally to this work;

(16) Wei Tao, Fudan University and these authors contributed equally to this work;

(17) Wenyi Wang, AI Initiative, King Abdullah University of Science and Technology and these authors contributed equally to this work;

(18) Xiangru Tang, Yale University and these authors contributed equally to this work;

(19) Xiangtao Lu, DeepWisdom and these authors contributed equally to this work;

(20) Xiawu Zheng, Xiamen University and these authors contributed equally to this work;

(21) Xinbing Liang, DeepWisdom, East China Normal University and these authors contributed equally to this work;

(22) Yaying Fei, Beijing University of Technology and these authors contributed equally to this work;

(23) Yuheng Cheng, The Chinese University of Hong Kong, Shenzhen and these authors contributed equally to this work;

(24) Zongze Xu, DeepWisdom, Hohai University and these authors contributed equally to this work;

(25) Chenglin Wu, DeepWisdom and a corresponding author.

Editor's Note: This is Part 2 of 5 of a research study detailing the development of Data Interpreter, a solution for various data science and real-world tasks. Read the rest below.

LLMs as data scientist agents


Cutting-edge Large Language Models (LLMs), pre-trained on diverse natural-language and programming data, exhibit strong code interpretation abilities. For example, Gao et al. (2023) and Chen et al. (2022) leverage program interpreters to decouple complex computation, Zhou et al. (2023a) boost performance on the MATH dataset (Hendrycks et al., 2021), and Li et al. (2023) and Liang et al. (2023) enable code-based reasoning capabilities in embodied agents. CodeAct (Wang et al., 2024) executes and dynamically revises code actions through multi-turn interactions with a Python interpreter. Building on these code interpretation capabilities, researchers are exploring ways to leverage LLMs to address data science challenges (Bordt et al., 2024; Chen et al., 2024b; Yang et al., 2024; Hassan et al., 2023; Sänger et al., 2023) and to integrate LLMs with specialized machine learning pipelines. For instance, Huang et al. (2023) autonomously develop or enhance machine learning models from data and task descriptions. In addition, Romera-Paredes et al. (2023) pair LLMs with systematic evaluation to discover solutions to open problems by evolving executable programs that describe solution methods. However, there is a lack of datasets and evaluation methods designed to assess the abilities of LLM-based methods in this field. We benchmark our work and various open-source frameworks on machine learning problem-solving to provide more insight into this research area.
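
To make the execute-and-revise pattern concrete, here is a minimal sketch of a CodeAct-style loop: the agent asks an LLM for code, runs it in a Python interpreter, and feeds any traceback into the next prompt. The `query_llm` helper is a hypothetical stand-in for whatever chat-completion API is used; none of these names come from the frameworks discussed above.

```python
import traceback

def query_llm(prompt: str) -> str:
    # Hypothetical placeholder: in practice this would call an LLM API and return code.
    return "result = sum(range(10))"

def execute(code: str):
    """Run code in a scratch namespace; return (success, output_or_error)."""
    namespace = {}
    try:
        exec(code, namespace)
        return True, repr(namespace.get("result"))
    except Exception:
        return False, traceback.format_exc()

def solve(task: str, max_turns: int = 3) -> str:
    prompt = f"Write Python that stores the answer in `result`.\nTask: {task}"
    for _ in range(max_turns):
        code = query_llm(prompt)
        ok, feedback = execute(code)
        if ok:
            return feedback
        # Revision step: append the interpreter error so the next attempt can fix it.
        prompt += f"\nYour previous code failed with:\n{feedback}\nPlease fix it."
    return "failed"

print(solve("Sum the integers 0 through 9"))  # -> '45'
```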


Planning


Planning is a critical capability of LLM-based agents: it emphasizes generating a logically structured roadmap of actions or thoughts for a specific problem (Huang et al., 2024; Chen et al., 2024a). Earlier work on the planning capability of LLM-based agents, such as CoT (Wei et al., 2022) and ReAct (Yao et al., 2022), focuses on decomposing complicated tasks and performing sequential planning over subtasks. Because of task complexity, a single plan generated by an LLM-based agent is sometimes infeasible. Hence, approaches such as ToT (Yao et al., 2024) and GoT (Besta et al., 2023) generate multiple candidate plans and select one to execute. Although these planning approaches demonstrate impressive performance, they struggle with multi-step problems that have strong task dependencies, a common occurrence in data science tasks. In contrast, we utilize dynamic hierarchical planning, which decomposes the complex problems commonly encountered in data science scenarios into task and action graphs.
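
The following sketch shows what planning over a task graph can look like in code, assuming a simple data structure (the `Task` class and `run_plan` function are illustrative, not the paper's implementation): tasks carry dependencies and are executed in topological order, so a downstream step only runs once its upstream results exist.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter

@dataclass
class Task:
    name: str
    instruction: str
    dependencies: list = field(default_factory=list)
    result: str = ""

def run_plan(tasks: dict) -> None:
    # Execute tasks in dependency order; a failed task could be re-planned here.
    order = TopologicalSorter({t.name: t.dependencies for t in tasks.values()})
    for name in order.static_order():
        task = tasks[name]
        upstream = [tasks[d].result for d in task.dependencies]
        # Placeholder execution: a real agent would generate and run code for each task.
        task.result = f"done({task.instruction}; inputs={upstream})"

plan = {
    "load": Task("load", "load dataset"),
    "clean": Task("clean", "handle missing values", ["load"]),
    "train": Task("train", "fit a model", ["clean"]),
    "report": Task("report", "summarize metrics", ["train"]),
}
run_plan(plan)
print(plan["report"].result)
```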


Tools


Recent research has focused on improving the capabilities of LLMs by creating and integrating external tools (Schick et al., 2024; Paranjape et al., 2023). Zhuge et al. (2023) and Shen et al. (2024) propose multi-agent approaches to solve multimodal tasks. Yuan et al. (2023) introduce a general tool creation and retrieval framework for LLMs with a plug-and-play approach. Liu et al. (2023) propose an automatic tool selection mechanism based on LLM decision-making rather than statically assigning specific tools to certain tasks. In the area of tool self-creation, Cai et al. (2023) transform the role of the LLM from tool user to tool creator, achieving self-sufficiency in tool creation, and Qian et al. (2023) present a framework that combines tool creation and tool use to solve problems. In this paper, we expand the types and range of tool usage. We not only implement the two tool types proposed in their future work, namely "Upgrade of Existing Tool" and "Combination of Multiple Tools", but also improve tool generation efficiency and practicality by leveraging execution experience instead of relying on few-shot prompts. Furthermore, our approach supports creating various private tool libraries and allows LLMs to independently select and combine multiple tools as needed.
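
As a rough illustration of a private tool library with dynamic selection, the sketch below registers toy tools in a dictionary and lets a stub `choose_tools` function, standing in for an LLM's decision, pick and chain them for a task. All names here are illustrative assumptions rather than any framework's actual API.

```python
from typing import Callable, Dict, List

TOOL_LIBRARY: Dict[str, Callable[[str], str]] = {}

def register(name: str):
    """Decorator that adds a function to the private tool library."""
    def wrapper(fn: Callable[[str], str]) -> Callable[[str], str]:
        TOOL_LIBRARY[name] = fn
        return fn
    return wrapper

@register("fill_missing")
def fill_missing(data: str) -> str:
    return data.replace("?", "0")  # toy imputation

@register("normalize")
def normalize(data: str) -> str:
    return data.lower()            # toy normalization

def choose_tools(task: str) -> List[str]:
    # Stand-in for an LLM deciding which registered tools to combine for this task.
    return ["fill_missing", "normalize"] if "clean" in task else []

def run(task: str, data: str) -> str:
    for name in choose_tools(task):
        data = TOOL_LIBRARY[name](data)  # chain the selected tools in order
    return data

print(run("clean the raw column", "A?B?C"))  # -> 'a0b0c'
```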


Reasoning


Reasoning capability, another key strength of LLM-based agents, emphasizes understanding and processing information to make decisions (Huang et al., 2024). Previous works such as Reflexion (Shinn et al., 2024), Self-Refine (Madaan et al., 2024), and CRITIC (Gou et al., 2023) encourage LLM-based agents to reflect on failures and refine their reasoning process. Gao et al. (2023) pioneer the use of code to improve the accuracy of LLMs on mathematical, symbolic, and algorithmic reasoning problems, and Chen et al. (2022) decouple complex computation from language understanding and reasoning by using a program interpreter to solve numeric reasoning tasks. Wang et al. (2023) leverage an iterative prompting mechanism to enhance programs used as agent actions in Minecraft, based on environment feedback and self-verification. Unlike prior approaches that primarily focus on general language feedback or execution feedback, our work tackles the unique challenges posed by data science problems that require advanced logical reasoning. Specifically, we propose novel automated confidence-based verification mechanisms to improve reasoning capability.
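
A simplified analogue of confidence-based verification (not the paper's exact mechanism) is sketched below: sample several candidate answers, take the most frequent one, and treat the agreement ratio as a confidence score that must pass a threshold before the answer is accepted.

```python
from collections import Counter
from typing import List, Optional, Tuple

def candidate_answers(task: str, k: int = 5) -> List[int]:
    # Placeholder: a real agent would sample k LLM-generated programs and run them;
    # here we hard-code slightly noisy results for the demo.
    return [42, 42, 41, 42, 42][:k]

def verify(task: str, threshold: float = 0.6) -> Tuple[Optional[int], float]:
    answers = candidate_answers(task)
    best, count = Counter(answers).most_common(1)[0]
    confidence = count / len(answers)
    # Accept the answer only if the candidates agree strongly enough.
    return (best, confidence) if confidence >= threshold else (None, confidence)

answer, confidence = verify("compute the final metric")
print(answer, confidence)  # -> 42 0.8
```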


This paper is available on arXiv under a CC BY 4.0 DEED license.