
Can Large Language Models Reinvent Foundational Algorithms?

Jian Zhao, Haoren Luo, Yu Wang, Yuhan Cao, Pingyue Sheng, Tianxing He

2026-04-20


Summary

This paper investigates whether large language models (LLMs) can genuinely come up with new ideas, specifically, whether they can rediscover foundational computer science algorithms on their own.

What's the problem?

LLMs are good at applying information they were trained on, but it's unclear whether they can truly *innovate*, that is, figure things out from scratch the way humans do. The researchers wanted to see if LLMs could essentially 'reinvent the wheel' by rediscovering well-known algorithms without relying on their existing knowledge of them.

What's the solution?

The researchers developed a pipeline called 'Unlearn-and-Reinvent'. First, they used an unlearning technique (based on a reinforcement learning method called GRPO) to remove a specific algorithm, such as Dijkstra's shortest-path algorithm, from the LLM's memory. Then they gave the LLM the problem that algorithm solves and observed whether the model could independently arrive at a similar solution. They tested this across ten algorithms, three different LLMs, and three levels of hints. They also used 'test-time reinforcement learning' to let the model keep learning while it attempted the problem.
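To make concrete what the models were asked to reinvent, here is a standard implementation of one of the targets, Dijkstra's shortest-path algorithm. This is a minimal textbook sketch for illustration, not code from the paper:

```python
import heapq

def dijkstra(graph, source):
    """Shortest-path distances from `source` in a graph with non-negative
    edge weights, given as {node: [(neighbor, weight), ...]}."""
    dist = {source: 0}
    heap = [(0, source)]          # priority queue of (distance, node)
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue              # stale queue entry; a shorter path was found
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd      # relax the edge and re-enqueue
                heapq.heappush(heap, (nd, v))
    return dist

# Example: the path a -> b -> c (cost 3) beats the direct edge a -> c (cost 4).
g = {"a": [("b", 1), ("c", 4)], "b": [("c", 2)]}
print(dijkstra(g, "a"))  # -> {'a': 0, 'b': 1, 'c': 3}
```

An "unlearned" model would no longer recall this greedy priority-queue strategy and would have to re-derive it from the problem statement alone.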

Why it matters?

This research helps us understand the limits of LLMs. If they can't even rediscover basic algorithms, it suggests their ability to truly innovate is limited. However, the best-performing model *did* successfully reinvent half of the algorithms with no hints at all, and up to 90% with more detailed hints, showing real potential for creative problem-solving. Understanding this potential and its limitations is crucial as LLMs take on a larger role in scientific research.

Abstract

LLMs have shown strong potential to advance scientific discovery. Whether they possess the capacity for foundational innovation, however, remains an open question. In this work, we focus on a prerequisite for foundational innovation: can LLMs reinvent foundational algorithms in computer science? Our Unlearn-and-Reinvent pipeline applies LLM unlearning to remove a specific foundational algorithm, such as Dijkstra's or Euclid's algorithm, from an LLM's pretrained knowledge, and then tests whether the model can reinvent it in a controlled environment. To enable effective unlearning, we adopt a GRPO-based, on-policy unlearning method. Across 10 target algorithms, 3 strong open-weight models, and 3 hint levels, our experiments demonstrate that (1) the strongest model, Qwen3-4B-Thinking-2507, successfully reinvents 50% of the algorithms with no hint, 70% at hint level 1, and 90% at hint level 2; (2) a few high-level hints can raise the reinvention success rate, but even step-by-step hints fail for the more complicated algorithms; and (3) test-time reinforcement learning enables successful reinvention of the Strassen algorithm at hint level 2. Through analyses of output trajectories and ablation studies, we find that the generative verifier in the reinvention phase plays a critical role in sustaining models' reasoning strength, helping to avoid the "thought collapse" phenomenon. These findings offer insights into both the potential and current limits of LLMs' innovative thinking.
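For contrast with Dijkstra's algorithm, the other target the abstract names, Euclid's algorithm for the greatest common divisor, fits in a few lines. Again, this is a standard textbook version for illustration, not code from the paper:

```python
def gcd(a, b):
    """Euclid's algorithm: repeatedly replace (a, b) with (b, a mod b);
    when the remainder reaches zero, the last nonzero value is the GCD."""
    while b:
        a, b = b, a % b
    return abs(a)

print(gcd(48, 18))  # -> 6  (48 % 18 = 12, 18 % 12 = 6, 12 % 6 = 0)
```

The gap between reinventing this three-line insight and reinventing something like Strassen's matrix multiplication helps explain why success rates vary so sharply across the ten targets.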