
LLPut: Investigating Large Language Models for Bug Report-Based Input Generation

Alif Al Hasan, Subarna Saha, Mia Mohammad Imran, Tarannum Shaila Zaman

2025-03-28

Summary

This paper evaluates how well large language models can automatically extract the specific inputs that trigger software bugs from the natural-language bug reports that describe them.

What's the problem?

Developers need the exact input that triggers a bug in order to reproduce and fix it, but digging that input out of a bug report by hand is slow and tedious.

What's the solution?

The researchers propose LLPut and use it to test how well three open-source generative LLMs (LLaMA, Qwen, and Qwen-Coder) can extract these failure-inducing inputs from bug reports written in plain English, evaluating the models on a dataset of 206 reports.

Why does it matter?

Reliable automated input extraction would let developers reproduce and diagnose bugs faster, leading to quicker fixes and more reliable software.

Abstract

Failure-inducing inputs play a crucial role in diagnosing and analyzing software bugs. Bug reports typically contain these inputs, which developers extract to facilitate debugging. Since bug reports are written in natural language, prior research has leveraged various Natural Language Processing (NLP) techniques for automated input extraction. With the advent of Large Language Models (LLMs), an important research question arises: how effectively can generative LLMs extract failure-inducing inputs from bug reports? In this paper, we propose LLPut, a technique to empirically evaluate the performance of three open-source generative LLMs -- LLaMA, Qwen, and Qwen-Coder -- in extracting relevant inputs from bug reports. We conduct an experimental evaluation on a dataset of 206 bug reports to assess the accuracy and effectiveness of these models. Our findings provide insights into the capabilities and limitations of generative LLMs in automated bug diagnosis.
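
To make the setup concrete, here is a minimal sketch of the kind of prompt-based extraction the paper evaluates: an open-source instruction-tuned LLM is given a bug report and asked to return only the failure-inducing input. The model name, bug report, prompt wording, and decoding settings are illustrative assumptions, not the paper's exact LLPut configuration or evaluation protocol.

    # Minimal sketch (not the paper's LLPut pipeline): prompt an open-source
    # instruction-tuned LLM to extract the failure-inducing input from a bug report.
    from transformers import pipeline

    # Assumption: any locally available instruction-tuned model can stand in here;
    # the exact checkpoints used in the paper are not reproduced.
    MODEL_NAME = "Qwen/Qwen2.5-7B-Instruct"

    generator = pipeline("text-generation", model=MODEL_NAME)

    # Hypothetical bug report, written in plain English like the ones studied.
    bug_report = """Title: converter crashes on a malformed WAV file
    Steps to reproduce:
    1. Run: ffmpeg -i corrupted_sample.wav out.mp3
    2. The process aborts with a segmentation fault.
    """

    prompt = (
        "Extract the exact failure-inducing input (command or file) from the bug "
        "report below. Reply with the input only, nothing else.\n\n" + bug_report
    )

    # Greedy decoding keeps the extraction deterministic; return only the new tokens.
    result = generator(prompt, max_new_tokens=64, do_sample=False,
                       return_full_text=False)
    print(result[0]["generated_text"].strip())

In an evaluation like the paper's, the model's output would then be compared against the ground-truth failure-inducing input annotated for each bug report to measure extraction accuracy.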