MachineLearningLM: Continued Pretraining Language Models on Millions of Synthetic Tabular Prediction Tasks Scales In-Context ML

Haoyu Dong, Pengkun Zhang, Mingzhe Lu, Yanzhen Shen, Guolin Ke

2025-09-12

Summary

This paper introduces a new way to improve how large language models, or LLMs, learn from examples provided directly within the prompt, without adjusting the model's internal weights through traditional training. It focuses on making LLMs better at machine learning tasks just by showing them worked examples in context, rather than fine-tuning them for each new task.

What's the problem?

LLMs are really good at understanding language and general reasoning, but they often struggle when you try to get them to learn a specific task by simply giving them many examples within the prompt. Standard machine learning relies on learning from examples, yet LLMs don't reliably improve as you add more of these 'in-context' examples on tasks like classifying tabular data or making predictions. They need a boost to really leverage this method.

What's the solution?

The researchers created a system called MachineLearningLM. They continued pretraining the LLM, but instead of using regular text, they fed it millions of synthetic machine learning tasks generated from structural causal models, with up to 1,024 examples per task. They started by having the LLM imitate a simpler 'teacher' model based on random forests, which helped it become more robust at numerical prediction. They also designed a token-efficient way to serialize these tasks, fitting 3 to 6 times more examples into the context window and enabling much higher throughput via batch inference. They used a relatively small LLM (Qwen-2.5-7B-Instruct) and a technique called LoRA to keep the training efficient.
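To make the two core ideas concrete, here is a minimal sketch of (a) sampling one synthetic tabular task from a tiny structural causal model and (b) serializing its examples into a compact, token-efficient prompt. The specific causal graph, mechanisms, and prompt format below are illustrative assumptions, not the paper's actual generator.

```python
import numpy as np

def sample_scm_task(n_rows, rng):
    """Sample one synthetic tabular task from a toy structural causal model.
    Graph: x1 -> x2, and both x1 and x2 cause the label y.
    (Hypothetical mechanisms; the paper draws from millions of random SCMs.)"""
    x1 = rng.normal(size=n_rows)                          # root cause
    x2 = 0.8 * x1 + rng.normal(scale=0.5, size=n_rows)    # child of x1
    logits = 1.5 * x2 - 0.7 * x1                          # causal mechanism for y
    y = (logits > 0).astype(int)                          # binary class label
    return np.column_stack([x1, x2]), y

def serialize_examples(X, y):
    """Token-efficient serialization: one CSV-like line per in-context shot,
    with rounded values and no repeated field names."""
    lines = [",".join(f"{v:.2f}" for v in row) + f" -> {label}"
             for row, label in zip(X, y)]
    return "\n".join(lines)

rng = np.random.default_rng(0)
X, y = sample_scm_task(8, rng)
prompt = serialize_examples(X, y)
```

Dropping column names and trimming numeric precision is one simple way to pack several times more demonstrations into a fixed context window, which is the spirit of the paper's 3x-6x claim.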

Why it matters?

This work is important because it shows how to make LLMs much more capable of performing machine learning tasks simply by providing examples. This means you don't need to retrain the entire model for each new task, which saves a lot of time and resources. The improved performance, especially with many examples, and the preservation of the LLM’s general abilities make it a significant step towards more versatile and powerful AI systems that can adapt to new challenges quickly and efficiently.

Abstract

Large language models (LLMs) possess broad world knowledge and strong general-purpose reasoning ability, yet they struggle to learn from many in-context examples on standard machine learning (ML) tasks, that is, to leverage many-shot demonstrations purely via in-context learning (ICL) without gradient descent. We introduce MachineLearningLM, a portable continued-pretraining framework that equips a general-purpose LLM with robust in-context ML capability while preserving its general knowledge and reasoning for broader chat workflows. Our pretraining procedure synthesizes ML tasks from millions of structural causal models (SCMs), spanning shot counts up to 1,024. We begin with a random-forest teacher, distilling tree-based decision strategies into the LLM to strengthen robustness in numerical modeling. All tasks are serialized with a token-efficient prompt, enabling 3x to 6x more examples per context window and delivering up to 50x amortized throughput via batch inference. Despite a modest setup (Qwen-2.5-7B-Instruct with LoRA rank 8), MachineLearningLM outperforms strong LLM baselines (e.g., GPT-5-mini) by an average of about 15% on out-of-distribution tabular classification across finance, physics, biology, and healthcare domains. It exhibits a striking many-shot scaling law: accuracy increases monotonically as in-context demonstrations grow from 8 to 1,024. Without any task-specific training, it attains random-forest-level accuracy across hundreds of shots. General chat capabilities, including knowledge and reasoning, are preserved: it achieves 75.4% on MMLU.
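The abstract's "random-forest teacher" step can be sketched as follows: a random forest is fit on a synthetic task, and its predictions become the targets the LLM student learns to reproduce. This is a simplified illustration using scikit-learn; the synthetic data and hyperparameters here are placeholder assumptions, not the paper's setup.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Hypothetical distillation step: a random-forest "teacher" labels queries on a
# synthetic task, and those labels become supervision for the LLM student.
rng = np.random.default_rng(1)
X_train = rng.normal(size=(200, 4))
y_train = (X_train[:, 0] + 0.5 * X_train[:, 1] > 0).astype(int)  # synthetic task

teacher = RandomForestClassifier(n_estimators=50, random_state=0)
teacher.fit(X_train, y_train)

X_query = rng.normal(size=(16, 4))
teacher_labels = teacher.predict(X_query)  # targets the student imitates in-context
```

Distilling from tree ensembles gives the student robust decision boundaries on purely numerical features, which the abstract credits for strengthening the LLM's numerical modeling.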