AyurParam: A State-of-the-Art Bilingual Language Model for Ayurveda

Mohd Nauman, Sravan Gvm, Vijay Devane, Shyam Pawar, Viraj Thakur, Kundeshwar Pundalik, Piyush Sawarkar, Rohit Saluja, Maunendra Desarkar, Ganesh Ramakrishnan

2025-11-05

Summary

This paper introduces AyurParam-2.9B, a bilingual language model designed specifically to understand and work with Ayurveda, the traditional Indian system of medicine.

What's the problem?

Current large language models, while capable at general tasks, struggle in specialized fields like Ayurveda. They lack the deep cultural, linguistic, and medical knowledge needed to accurately interpret its texts and apply its principles, because they simply aren't trained on enough high-quality Ayurvedic material to be reliable.

What's the solution?

The researchers created AyurParam-2.9B by taking an existing language model (Param-1-2.9B) and training it further on a large collection of Ayurvedic texts and clinical guidance, in both English and Hindi. This training data wasn't just raw text: it included question-and-answer pairs designed to test reasoning and factual accuracy, and it was carefully reviewed by experts for correctness and clarity. This process is called 'fine-tuning'.
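To make the fine-tuning step concrete, here is a minimal sketch of how bilingual Q&A pairs might be turned into supervised fine-tuning records. The prompt template, field names, and example pairs below are illustrative assumptions, not the authors' actual data format.

```python
# Hedged sketch: formatting bilingual Ayurveda Q&A pairs into
# instruction-tuning records. The template and field names are
# hypothetical, not the paper's actual pipeline.

def format_sft_record(question, answer, lang):
    """Wrap one Q&A pair in a simple instruction template."""
    labels = {
        "en": ("Question", "Answer"),
        "hi": ("प्रश्न", "उत्तर"),  # Hindi labels for the Hindi split
    }
    q_label, a_label = labels[lang]
    prompt = f"{q_label}: {question}\n{a_label}:"
    return {"prompt": prompt, "completion": f" {answer}", "lang": lang}

# Two illustrative pairs, one per language.
examples = [
    ("What are the three doshas in Ayurveda?",
     "Vata, Pitta, and Kapha.", "en"),
    ("आयुर्वेद में तीन दोष कौन से हैं?",
     "वात, पित्त और कफ।", "hi"),
]

records = [format_sft_record(q, a, lang) for q, a, lang in examples]
```

Records in this shape could then be fed to any standard supervised fine-tuning loop; the paper's key contribution is the expert curation and annotation of the pairs themselves, not the template.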

Why does it matter?

This work shows that simply making a language model bigger isn't enough. To get reliable AI in specialized areas like medicine, you need to specifically train it on high-quality data from that field, and make sure the training process considers the cultural context. This is important for building AI that can be trusted to provide accurate and culturally sensitive medical advice in areas like Ayurveda.

Abstract

Current large language models excel at broad, general-purpose tasks, but consistently underperform when exposed to highly specialized domains that require deep cultural, linguistic, and subject-matter expertise. In particular, traditional medical systems such as Ayurveda embody centuries of nuanced textual and clinical knowledge that mainstream LLMs fail to accurately interpret or apply. We introduce AyurParam-2.9B, a domain-specialized, bilingual language model fine-tuned from Param-1-2.9B using an extensive, expertly curated Ayurveda dataset spanning classical texts and clinical guidance. AyurParam's dataset incorporates context-aware, reasoning, and objective-style Q&A in both English and Hindi, with rigorous annotation protocols for factual precision and instructional clarity. Benchmarked on BhashaBench-Ayur, AyurParam not only surpasses all open-source instruction-tuned models in its size class (1.5--3B parameters), but also demonstrates competitive or superior performance compared to much larger models. The results from AyurParam highlight the necessity for authentic domain adaptation and high-quality supervision in delivering reliable, culturally congruent AI for specialized medical knowledge.