Multi-expert Prompting Improves Reliability, Safety, and Usefulness of Large Language Models
Do Xuan Long, Duong Ngoc Yen, Anh Tuan Luu, Kenji Kawaguchi, Min-Yen Kan, Nancy F. Chen
2024-11-05

Summary
This paper introduces Multi-expert Prompting, a method that improves how large language models (LLMs) generate responses: a single model simulates multiple experts and aggregates their answers to provide more accurate and useful information.
What's the problem?
Many existing LLMs give biased or incomplete answers, especially to complex questions. Traditional prompting methods typically elicit a single response from a single perspective, which limits the quality and reliability of the information provided and can lead to misinformation or harmful content.
What's the solution?
Multi-expert Prompting enhances LLMs by prompting a single model to simulate several experts who each answer the given question. The model then works through seven structured subtasks, derived from a decision-making framework called the Nominal Group Technique, to aggregate the experts' responses and select the best answer from among the individual and aggregated ones. This lets the model produce answers that are more truthful, informative, and less toxic, and the authors found that the method significantly outperformed previous approaches in terms of accuracy and safety.
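The pipeline is straightforward to prototype. Below is a minimal sketch, assuming the openai-python v1 chat-completions client; the `chat` helper, the `n_experts` parameter, and all prompt wording are illustrative choices made for this summary, not the paper's exact prompts (which run the aggregation as one chain of thought with seven explicit subtasks).

```python
# Minimal sketch of the generate -> aggregate -> select flow.
# Assumes the openai-python v1 client; all prompt wording here is
# illustrative, not the paper's exact subtask prompts.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def chat(prompt: str) -> str:
    """Send a single-turn prompt and return the model's text reply."""
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content


def multi_expert_answer(question: str, n_experts: int = 3) -> str:
    # 1. Ask the model for n distinct expert identities, one per line.
    experts = chat(
        f"List {n_experts} distinct experts (role: one-line description), "
        f"one per line, who are best suited to answer: {question}"
    ).strip().splitlines()[:n_experts]

    # 2. Each simulated expert answers the question independently.
    answers = [
        chat(f"You are {e}. Answer the question concisely:\n{question}")
        for e in experts
    ]

    # 3. Aggregate the expert answers and pick the best final response.
    #    (The paper performs this step as one chain of thought over seven
    #    NGT-derived subtasks; here it is collapsed into one instruction.)
    joined = "\n\n".join(f"Expert {i + 1}: {a}" for i, a in enumerate(answers))
    return chat(
        f"Question: {question}\n\nExpert answers:\n{joined}\n\n"
        "Combine the viewpoints the experts agree on, resolve any conflicts, "
        "keep useful unique viewpoints, then output the single best answer, "
        "choosing between the combined answer and the best individual one."
    )
```

A call such as `multi_expert_answer("What are the health effects of intermittent fasting?")` then returns one consolidated answer in place of a single-perspective response.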
Why does it matter?
This research is important because it makes AI systems more reliable and safer for users. By effectively simulating diverse expert insights, Multi-expert Prompting can improve applications like virtual assistants, educational tools, and customer support systems. This advancement helps ensure that AI provides high-quality information while minimizing harmful or misleading content.
Abstract
We present Multi-expert Prompting, a novel enhancement of ExpertPrompting (Xu et al., 2023), designed to improve large language model (LLM) generation. Specifically, it guides an LLM to fulfill an input instruction by simulating multiple experts, aggregating their responses, and selecting the best among individual and aggregated responses. This process is performed in a single chain of thought through our seven carefully designed subtasks derived from the Nominal Group Technique (Van de Ven and Delbecq, 1974), a well-established decision-making framework. Our evaluations demonstrate that Multi-expert Prompting significantly outperforms ExpertPrompting and comparable baselines in enhancing the truthfulness, factuality, informativeness, and usefulness of responses while reducing toxicity and hurtfulness. It further achieves state-of-the-art truthfulness by outperforming the best baseline by 8.69% with ChatGPT. Multi-expert Prompting is efficient, explainable, and highly adaptable to diverse scenarios, eliminating the need for manual prompt construction.
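Because the aggregation and selection run as a single chain of thought, they can be packaged into one prompt template. The skeleton below is an assumption about that structure: the numbered steps are generic Nominal-Group-Technique-style placeholders, not the paper's exact seven subtasks, whose precise wording is given in the paper.

```python
# Skeleton of a single-prompt, chain-of-thought aggregation stage.
# The step wording is a generic placeholder, NOT the paper's exact
# seven subtasks; consult the paper for the real formulations.
AGGREGATION_TEMPLATE = """\
Question: {question}

Answers from {n_experts} simulated experts:
{expert_answers}

Work through the steps below in order, writing out each result:
1. List the viewpoints a majority of experts agree on.
2. List the viewpoints on which the experts conflict.
3. Resolve each conflict, keeping the better-supported viewpoint.
4. List useful viewpoints raised by only one expert.
5. Draft an aggregated answer from steps 1, 3, and 4.
6. Compare the aggregated answer against each individual answer.
7. Output the best of these answers as the final response.

Final response:
"""
```

A driver would format this template with the question and the experts' answers from the generation stage and send it in a single model call; performing all seven steps in one pass is consistent with the abstract's claim that the method is efficient and needs no manual prompt construction per input.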