NILE: Internal Consistency Alignment in Large Language Models

Minda Hu, Qiyuan Zhang, Yufei Wang, Bowei He, Hongru Wang, Jingyan Zhou, Liangyou Li, Yasheng Wang, Chen Ma, Irwin King

2024-12-24

Summary

This paper introduces NILE, a framework designed to improve the training of large language models (LLMs) by ensuring that the data used for instruction fine-tuning (IFT) aligns with the internal knowledge these models acquired during pre-training.

What's the problem?

Large language models often perform poorly when their training data conflicts with what they already know. This inconsistency undermines instruction fine-tuning, the process that teaches these models to follow human instructions effectively. Without high-quality, consistent datasets, the models cannot learn as well as they could.

What's the solution?

To solve this problem, the authors introduced NILE (iNternal consIstency aLignmEnt), a framework for optimizing IFT datasets. NILE works by eliciting the internal knowledge of the pre-trained LLM for each instruction and using that knowledge to revise the answers in the training data. They also developed a method called Internal Consistency Filtering (ICF) to keep only samples that are highly consistent with the model's internal knowledge. Their experiments showed that training on NILE-aligned datasets substantially improved LLM performance, with gains of up to 66.6% on Arena-Hard and 68.5% on Alpaca-Eval V2.
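
To make the pipeline concrete, here is a minimal Python sketch of what a NILE-style alignment loop could look like. It is illustrative only: the `generate` callable stands in for any LLM API, and the prompt wording and helper names are assumptions, not the paper's actual prompts.

```python
from typing import Callable, List, Tuple

# `generate` stands in for any call to the target pre-trained LLM
# (e.g., a wrapped API client); its exact form is an assumption.
LLM = Callable[[str], str]

def elicit_internal_knowledge(generate: LLM, instruction: str) -> str:
    """Step 1: ask the pre-trained LLM what it already knows about the task."""
    prompt = f"List the key facts you know that are relevant to this instruction:\n{instruction}"
    return generate(prompt)

def revise_answer(generate: LLM, instruction: str, answer: str, knowledge: str) -> str:
    """Step 2: rewrite the reference answer to agree with that knowledge."""
    prompt = (
        f"Instruction: {instruction}\n"
        f"Internal knowledge: {knowledge}\n"
        f"Original answer: {answer}\n"
        "Revise the answer so it is consistent with the internal knowledge."
    )
    return generate(prompt)

def align_dataset(generate: LLM, dataset: List[Tuple[str, str]]) -> List[Tuple[str, str, str]]:
    """Run steps 1-2 over an IFT dataset of (instruction, answer) pairs,
    returning (instruction, revised_answer, knowledge) triples for filtering."""
    aligned = []
    for instruction, answer in dataset:
        knowledge = elicit_internal_knowledge(generate, instruction)
        revised = revise_answer(generate, instruction, answer, knowledge)
        aligned.append((instruction, revised, knowledge))
    return aligned
```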

Why it matters?

This research is important because it highlights how crucial it is for training data to align with what LLMs already know in order to maximize their potential. By improving the quality of IFT datasets, NILE can help make AI systems more effective and reliable, which is essential as these models are used in more applications across different fields.

Abstract

As a crucial step to enhance LLMs' alignment with human intentions, Instruction Fine-Tuning (IFT) places high demands on dataset quality. However, existing IFT datasets often contain knowledge that is inconsistent with LLMs' internal knowledge learned during the pre-training phase, which can greatly affect the efficacy of IFT. To address this issue, we introduce the NILE (iNternal consIstency aLignmEnt) framework, aimed at optimizing IFT datasets to further unlock LLMs' capabilities. NILE operates by eliciting the target pre-trained LLM's internal knowledge corresponding to instruction data. The internal knowledge is leveraged to revise the answers in IFT datasets. Additionally, we propose a novel Internal Consistency Filtering (ICF) method to filter training samples, ensuring their high consistency with the LLM's internal knowledge. Our experiments demonstrate that NILE-aligned IFT datasets sharply boost LLM performance across multiple LLM ability evaluation datasets, achieving up to a 66.6% gain on Arena-Hard and 68.5% on Alpaca-Eval V2. Further analysis confirms that each component of the NILE framework contributes to these substantial performance improvements, and provides compelling evidence that dataset consistency with pre-trained internal knowledge is pivotal for maximizing LLM potential.
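
As a companion to the pipeline sketch above, the Internal Consistency Filtering step can plausibly be read as a score-and-threshold filter over the (instruction, revised answer, internal knowledge) triples. The sketch below is a stand-in under stated assumptions: the abstract does not specify the consistency measure, so cosine similarity over a hypothetical `embed` function is used in its place, and the threshold value is assumed.

```python
import math
from typing import Callable, List, Tuple

def cosine(u: List[float], v: List[float]) -> float:
    """Plain cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def icf_filter(
    embed: Callable[[str], List[float]],   # hypothetical text embedder
    samples: List[Tuple[str, str, str]],   # (instruction, answer, knowledge) triples
    threshold: float = 0.8,                # assumed cutoff, not from the paper
) -> List[Tuple[str, str]]:
    """Keep only samples whose answer scores as consistent with the
    model's elicited internal knowledge."""
    kept = []
    for instruction, answer, knowledge in samples:
        if cosine(embed(answer), embed(knowledge)) >= threshold:
            kept.append((instruction, answer))
    return kept
```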