Fine-Tuning on Noisy Instructions: Effects on Generalization and Performance
Ahmed Alajrami, Xingwei Tan, Nikolaos Aletras
2025-10-07

Summary
This paper investigates whether fine-tuning large language models on slightly altered instructions can make them better at understanding what people *mean*, even when the instructions aren't perfectly worded.
What's the problem?
Large language models are really good at following instructions, but they can be thrown off by small changes in how those instructions are written. Imagine asking for a summary in slightly different words each time – the model might give different answers! This makes them unreliable when users don't phrase things perfectly, which happens a lot in real life.
What's the solution?
The researchers trained language models on instructions that had been intentionally altered in small ways, such as removing common "stop words" (like "the" or "of") or shuffling the order of words. They then tested whether these models were better at handling both the original, clearly worded instructions *and* the noisy ones, using standard benchmarks like MMLU, BBH, and GSM8K to measure performance.
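As a rough illustration, perturbations of this kind might look something like the following Python sketch. The stop-word list and function names here are our own illustrative assumptions, not the authors' actual code:

```python
import random

# Illustrative stop-word list; the paper does not specify the exact
# list used, so treat this as an assumption.
STOP_WORDS = {"a", "an", "the", "of", "to", "in", "is", "are", "and", "or"}

def remove_stop_words(instruction: str) -> str:
    """Drop common function words from an instruction."""
    return " ".join(
        word for word in instruction.split()
        if word.lower() not in STOP_WORDS
    )

def shuffle_words(instruction: str, seed: int = 0) -> str:
    """Randomly reorder the words of an instruction."""
    words = instruction.split()
    random.Random(seed).shuffle(words)
    return " ".join(words)

original = "Summarize the following article in one short paragraph."
print(remove_stop_words(original))
# -> "Summarize following article one short paragraph."
print(shuffle_words(original))
# -> the same words in a random order
```

The point of both operations is that they degrade the surface form of the instruction while leaving its intent largely recoverable, which is exactly the kind of noise real users produce.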
Why does it matter?
The findings suggest that training models on these altered instructions can, in some cases, actually improve their performance, making them more robust and user-friendly. This matters because it means we can build language models that are less sensitive to how a question is phrased, giving more consistent and helpful responses even with imperfect user input.
Abstract
Instruction-tuning plays a vital role in enhancing the task-solving abilities of large language models (LLMs), improving their usability in generating helpful responses across a variety of tasks. However, previous work has demonstrated that LLMs are sensitive to minor variations in instruction phrasing. In this paper, we explore whether introducing perturbations into instruction-tuning data can enhance LLMs' robustness to noisy instructions. We focus on how instruction-tuning with perturbations, such as removing stop words or shuffling words, affects LLMs' performance on the original and perturbed versions of widely used benchmarks (MMLU, BBH, GSM8K). We further assess learning dynamics and potential shifts in model behavior. Surprisingly, our results suggest that instruction-tuning on perturbed instructions can, in some cases, improve downstream performance. These findings highlight the importance of including perturbed instructions in instruction-tuning, which can make LLMs more resilient to noisy user inputs.
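To make the training setup concrete, here is a minimal, hedged sketch of how perturbed instructions could be mixed into an instruction-tuning dataset. The mixing ratio, data format, and helper names are assumptions for illustration; the paper's exact recipe may differ:

```python
import random

def build_perturbed_dataset(examples, perturb_fns, perturb_prob=0.5, seed=0):
    """Apply a randomly chosen perturbation to a fraction of instructions.

    examples: list of (instruction, response) pairs.
    perturb_fns: callables mapping str -> str, e.g. the remove_stop_words
        and shuffle_words sketches above.
    perturb_prob: assumed fraction of examples to perturb (illustrative).
    """
    rng = random.Random(seed)
    perturbed = []
    for instruction, response in examples:
        if rng.random() < perturb_prob:
            instruction = rng.choice(perturb_fns)(instruction)
        perturbed.append((instruction, response))
    return perturbed

# Example usage with a trivial perturbation (hypothetical data):
data = [("Translate the sentence to French.", "...")]
noisy = build_perturbed_dataset(data, [str.upper], perturb_prob=1.0)
print(noisy[0][0])  # -> "TRANSLATE THE SENTENCE TO FRENCH."
```

The resulting mixed dataset would then be fed to a standard instruction-tuning pipeline, so the model sees both clean and perturbed phrasings of the same kinds of tasks during training.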