LAMBDA: A Large Model Based Data Agent

Maojun Sun, Ruijian Han, Binyan Jiang, Houduo Qi, Defeng Sun, Yancheng Yuan, Jian Huang

2024-07-26

Summary

This paper introduces LAMBDA, an open-source system that lets users analyze data without writing any code. It uses large AI models to carry out data analysis tasks from natural language instructions.

What's the problem?

Many people who work with data, such as scientists and business analysts, lack the coding skills that data analysis requires, which makes it hard for them to use powerful AI tools effectively. Additionally, existing systems often require complicated setup or manual adjustment, which can be time-consuming and frustrating.

What's the solution?

LAMBDA addresses these problems by providing a code-free environment where users interact with data agents in plain language. The system has two main agent roles: the programmer, which writes the necessary code based on user instructions, and the inspector, which debugs that code when it fails. This collaboration lets users perform complex data tasks easily. LAMBDA also includes a user interface that lets users intervene directly when needed, and it can integrate external models and algorithms for customized analysis.
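The programmer and inspector collaboration described above can be sketched as a simple retry loop. This is a minimal illustration under stated assumptions, not LAMBDA's actual implementation: the names `run_code`, `programmer_inspector_loop`, `stub_programmer`, and `stub_inspector` are hypothetical, the two agents are hard-coded stubs standing in for large-model calls, and `exec` is used without any sandboxing.

```python
import traceback
from typing import Callable, Optional, Tuple

# Hypothetical agent signatures (assumptions, not LAMBDA's API):
#   programmer(instruction, advice) -> code string
#   inspector(code, error)          -> advice string
Programmer = Callable[[str, Optional[str]], str]
Inspector = Callable[[str, str], str]


def run_code(code: str) -> Tuple[bool, str]:
    """Execute code in a fresh namespace and return (succeeded, error text)."""
    namespace: dict = {}
    try:
        exec(code, namespace)  # no sandboxing here; illustration only
        return True, ""
    except Exception:
        return False, traceback.format_exc()


def programmer_inspector_loop(
    instruction: str,
    programmer: Programmer,
    inspector: Inspector,
    max_attempts: int = 3,
) -> Tuple[Optional[str], int]:
    """Alternate between writing and debugging until the code runs.

    Returns (working code, attempts used), or (None, max_attempts) when
    every attempt fails, which is where a human could step in.
    """
    advice: Optional[str] = None
    for attempt in range(1, max_attempts + 1):
        code = programmer(instruction, advice)
        succeeded, error = run_code(code)
        if succeeded:
            return code, attempt
        # The inspector turns the error into advice for the next attempt.
        advice = inspector(code, error)
    return None, max_attempts


def stub_programmer(instruction: str, advice: Optional[str]) -> str:
    """Stand-in for a model call: the first draft has a bug, the revision fixes it."""
    if advice is None:
        return "result = sum(values) / len(values)"  # `values` is undefined
    return "values = [1, 2, 3]\nresult = sum(values) / len(values)"


def stub_inspector(code: str, error: str) -> str:
    """Stand-in for a model call: map the error text to a suggestion."""
    if "NameError" in error:
        return "Define `values` before using it."
    return "Check the traceback and revise the code."
```

Calling `programmer_inspector_loop("compute the mean", stub_programmer, stub_inspector)` fails on the first draft, passes the error to the inspector, and succeeds on the revised code. Capping the number of attempts and returning `None` on exhaustion leaves room for the direct user intervention that the system provides through its interface.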

Why does it matter?

This system is important because it makes data analysis accessible to more people, including those who may not have technical backgrounds. By bridging the gap between human expertise and AI capabilities, LAMBDA encourages innovation and helps users from various fields make better decisions based on their data.

Abstract

We introduce "LAMBDA," a novel open-source, code-free multi-agent data analysis system that harnesses the power of large models. LAMBDA is designed to address data analysis challenges in complex data-driven applications through the use of innovatively designed data agents that operate iteratively and generatively using natural language. At the core of LAMBDA are two key agent roles: the programmer and the inspector, which are engineered to work together seamlessly. Specifically, the programmer generates code based on the user's instructions and domain-specific knowledge, enhanced by advanced models. Meanwhile, the inspector debugs the code when necessary. To ensure robustness and handle adverse scenarios, LAMBDA features a user interface that allows direct user intervention in the operational loop. Additionally, LAMBDA can flexibly integrate external models and algorithms through our knowledge integration mechanism, catering to the needs of customized data analysis. LAMBDA has demonstrated strong performance on various machine learning datasets. It has the potential to enhance data science practice and the analysis paradigm by seamlessly integrating human and artificial intelligence, making data analysis more accessible, effective, and efficient for individuals from diverse backgrounds. The strong performance of LAMBDA in solving data science problems is demonstrated in several case studies, which are presented at https://www.polyu.edu.hk/ama/cmfai/lambda.html.
