GEPA: Reflective Prompt Evolution Can Outperform Reinforcement Learning
Lakshya A Agrawal, Shangyin Tan, Dilara Soylu, Noah Ziems, Rishi Khare, Krista Opsahl-Ong, Arnav Singhvi, Herumb Shandilya, Michael J Ryan, Meng Jiang, Christopher Potts, Koushik Sen, Alexandros G. Dimakis, Ion Stoica, Dan Klein, Matei Zaharia, Omar Khattab
2025-07-28
Summary
This paper introduces GEPA, a method that improves AI systems by having the model reflect on its own performance in natural language, rather than relying solely on scalar trial-and-error rewards as in traditional reinforcement learning.
What's the problem?
Traditional reinforcement learning methods need many rollouts to learn, which costs substantial time and compute, and they receive feedback only as sparse scalar scores rather than detailed information about what went right or wrong.
What's the solution?
The researchers created GEPA, which uses language itself as the feedback signal: the system analyzes its own reasoning traces, errors, and tool use in natural language. GEPA then improves prompts through an evolutionary process, mutating candidates based on these reflections and combining ideas that work well, making learning far more sample-efficient.
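The loop described above can be sketched as follows. This is a minimal illustration, not GEPA's actual implementation: `evaluate` and `reflect_and_mutate` are hypothetical stand-ins using simple keyword matching, whereas a real system would run the AI program and ask an LLM to reflect on its failures in natural language.

```python
import random

def evaluate(prompt, failure_modes):
    """Stand-in scorer: fraction of known failure modes the prompt addresses.
    A real system would run the LLM program on tasks and score its outputs."""
    return sum(1 for fm in failure_modes if fm in prompt) / len(failure_modes)

def reflect_and_mutate(prompt, failure_modes):
    """Stand-in 'reflection': identify an unaddressed failure mode and propose
    a revised prompt. A real system would have an LLM read execution traces
    and rewrite the prompt accordingly."""
    missing = [fm for fm in failure_modes if fm not in prompt]
    if not missing:
        return prompt
    # Append guidance targeting one observed failure mode.
    return prompt + " Remember to " + random.choice(missing) + "."

def evolve(seed_prompt, failure_modes, budget=50):
    """Evolutionary search: keep a pool of (score, prompt) candidates,
    mutate the current best, and accept children that do not regress."""
    pool = [(evaluate(seed_prompt, failure_modes), seed_prompt)]
    for _ in range(budget):
        _, parent = max(pool)  # greedily select the best candidate so far
        child = reflect_and_mutate(parent, failure_modes)
        score = evaluate(child, failure_modes)
        if score >= max(pool)[0]:
            pool.append((score, child))
    return max(pool)

best_score, best_prompt = evolve(
    "Answer the question.",
    ["cite sources", "show reasoning", "check arithmetic"],
)
print(best_score)  # reaches 1.0 once every failure mode is addressed
```

Because each accepted mutation addresses one more failure mode, the loop converges after a handful of rollouts, illustrating why language-based reflection can be far more sample-efficient than reward-only updates.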
Why does it matter?
GEPA lets AI systems learn faster with far less trial and error, making it cheaper and more practical to optimize complex AI pipelines for real-world applications.
Abstract
GEPA, a prompt optimizer driven by natural-language reflection, outperforms RL methods like GRPO and prompt optimizers like MIPROv2 with far fewer rollouts by learning high-level rules from trial and error.