Mistral-C2F: Coarse to Fine Actor for Analytical and Reasoning Enhancement in RLHF and Effective-Merged LLMs
Chen Zheng, Ke Sun, Xun Zhou
2024-06-14
Summary
This paper introduces Mistral-C2F, a new model designed to improve the conversational and analytical abilities of smaller language models. It uses a two-step approach called Coarse-to-Fine Actor to help these models generate more coherent and detailed dialogues.
What's the problem?
While larger language models like GPT-4 are very effective, smaller models often struggle to create in-depth and coherent conversations. They lack the ability to analyze information deeply and respond in a way that aligns with human preferences. This limitation makes them less useful for tasks that require detailed reasoning and conversation.
What's the solution?
To tackle this issue, the authors developed the Mistral-C2F model, which consists of two main parts: the Coarse Actor and the Fine Actor. The Coarse Actor uses a technique called 'Continuous Maximization' to generate a broad range of responses that are rich in knowledge. However, this initial output can sometimes be too long or repetitive. The Fine Actor then takes this output and refines it by merging it with an existing instruction model, improving the quality and reducing unnecessary repetition. This two-step process allows the model to produce more accurate and engaging conversations.
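The merging step above can be sketched in code. The paper does not give the exact merging formula here, so the sketch below assumes one plausible reading of the "Knowledge Residue Merger": a weighted interpolation between the Coarse Actor's weights and the original instruction model's weights. The function name `residue_merge` and the blend ratio `alpha` are illustrative assumptions, not details from the paper.

```python
def residue_merge(coarse_weights, instruct_weights, alpha=0.5):
    """Illustrative merge of two models' parameters (assumed form, not the
    paper's exact method): alpha * coarse + (1 - alpha) * instruct.

    Both arguments are dicts mapping parameter names to values of
    matching shape; plain floats stand in for tensors in this sketch.
    """
    return {
        name: alpha * coarse_weights[name] + (1.0 - alpha) * instruct_weights[name]
        for name in coarse_weights
    }

# Toy usage: blend a single shared parameter 25% toward the Coarse Actor.
coarse = {"layer.w": 2.0}
instruct = {"layer.w": 4.0}
merged = residue_merge(coarse, instruct, alpha=0.25)  # {"layer.w": 3.5}
```

A low `alpha` keeps the merged model close to the well-behaved instruction model while folding in some of the Coarse Actor's analytical "residue", which is one way the redundancy of the coarse output could be tempered.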
Why it matters?
This research is significant because it enhances the capabilities of smaller language models, making them more effective for real-world applications like customer service or personal assistants. By improving how these models understand and generate dialogue, Mistral-C2F can lead to better user experiences and more intelligent AI systems.
Abstract
Despite the advances in Large Language Models (LLMs), exemplified by models like GPT-4 and Claude, smaller-scale LLMs such as Llama and Mistral often struggle with generating in-depth and coherent dialogues. This paper presents a novel two-step Coarse-to-Fine Actor model to address the inherent limitations in conversational and analytical capabilities of small-sized LLMs. Our approach begins with the Policy-based Coarse Actor, employing a technique we term "Continuous Maximization". The Coarse Actor establishes an enhanced, knowledge-rich pool adept at aligning with human preference styles in analysis and reasoning. Through the RLHF process, it employs Continuous Maximization, a strategy that dynamically and adaptively extends the output length limit, enabling the generation of more detailed and analytical content. Subsequently, the Fine Actor refines this analytical content, addressing the generation of excessively redundant information from the Coarse Actor. We introduce a "Knowledge Residue Merger" approach, refining the content from the Coarse Actor and merging it with an existing Instruction model to improve quality, correctness, and reduce redundancies. We applied our methodology to the popular Mistral model, creating Mistral-C2F, which has demonstrated exceptional performance across 11 general language tasks and the MT-Bench Dialogue task, outperforming similar-scale models and even larger models with 13B and 30B parameters. Our model has significantly improved conversational and analytical reasoning abilities.
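The abstract describes "Continuous Maximization" as dynamically and adaptively extending the output length limit during RLHF. The paper's actual schedule is not given here, so the sketch below is only a minimal illustration of the idea, assuming a linear per-step growth capped at a ceiling; the parameter names and values are hypothetical.

```python
def output_length_limit(step, base=512, growth=64, ceiling=4096):
    """Illustrative length schedule (assumed, not the paper's formula):
    start from a base generation limit and extend it each RLHF step,
    capped at a fixed ceiling so generation length stays bounded."""
    return min(base + growth * step, ceiling)

# Early in training, outputs are kept short; later steps allow far
# longer, more detailed analytical generations.
early_limit = output_length_limit(0)    # 512
late_limit = output_length_limit(100)   # 4096 (hits the ceiling)
```

The point of such a schedule is that the policy is rewarded for producing progressively longer, more detailed analysis, which matches the abstract's description of enabling "more detailed and analytical content", at the cost of the redundancy the Fine Actor then has to remove.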