AMBEDKAR-A Multi-level Bias Elimination through a Decoding Approach with Knowledge Augmentation for Robust Constitutional Alignment of Language Models

Snehasis Mukhopadhyay, Aryan Kasat, Shivam Dubey, Rahul Karthikeyan, Dhruv Sood, Vinija Jain, Aman Chadha, Amitava Das

2025-09-03

Summary

This paper addresses the problem of harmful biases in large language models, specifically focusing on how these models can reflect and perpetuate societal prejudices related to caste and religion in India. The researchers introduce a new framework called AMBEDKAR to make these models fairer and more inclusive.

What's the problem?

Large language models learn from massive amounts of text data, and unfortunately, that data often contains existing societal biases. When these models are used, they can unintentionally produce outputs that are prejudiced or discriminatory, especially concerning sensitive topics like caste and religion. Current methods for reducing bias are mostly developed with Western societies in mind and don't effectively address the specific nuances of bias present in the Indian context.

What's the solution?

The researchers developed AMBEDKAR, a framework inspired by the egalitarian principles of Dr. B. R. Ambedkar, architect of the Indian Constitution. It adds a 'Constitution-Aware Decoding Layer' that operates only at inference time, while the model is generating text, and leaves the model's parameters unchanged. This layer draws on the equality provisions of the Indian Constitution (Articles 14 to 17) to steer the model toward fairer and more neutral outputs. The researchers also repurpose speculative decoding, a technique normally used to speed up generation, to detect and reduce biased language as the text is produced. In their setup, a smaller, potentially biased model proposes text, and a larger, constitutionally guided model checks it and corrects it for fairness.
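The propose-and-verify idea described above can be illustrated with a toy sketch. This is not the authors' algorithm: it is a simplified greedy accept-or-replace loop, and it does not implement the probabilistic acceptance rule of standard speculative sampling. The two "models" are hypothetical lookup functions, the `<stereotype>` token is a placeholder for biased text, and the names `draft_model`, `verifier_model`, `propose_and_verify`, and `accept_threshold` are assumptions introduced only for illustration.

```python
from typing import Callable, Dict, List, Tuple

Dist = Dict[str, float]                     # token -> probability
Model = Callable[[Tuple[str, ...]], Dist]   # context -> next-token distribution


def draft_model(context: Tuple[str, ...]) -> Dist:
    """Hypothetical small model: after 'are' it prefers a stereotyped token."""
    last = context[-1] if context else ""
    if last == "are":
        return {"<stereotype>": 0.7, "people": 0.2, "<eos>": 0.1}
    if last in ("people", "<stereotype>"):
        return {"<eos>": 1.0}
    return {"are": 0.9, "<eos>": 0.1}


def verifier_model(context: Tuple[str, ...]) -> Dist:
    """Hypothetical guided large model: gives the stereotyped token low probability."""
    last = context[-1] if context else ""
    if last == "are":
        return {"<stereotype>": 0.05, "people": 0.85, "<eos>": 0.10}
    if last in ("people", "<stereotype>"):
        return {"<eos>": 1.0}
    return {"are": 0.9, "<eos>": 0.1}


def argmax(dist: Dist) -> str:
    """Return the most probable token of a distribution."""
    return max(dist, key=dist.get)


def propose_and_verify(
    prompt: List[str],
    draft: Model,
    verifier: Model,
    accept_threshold: float = 0.2,   # assumed cutoff, chosen only for this toy
    max_new_tokens: int = 8,
) -> Tuple[List[str], List[Tuple[str, str, bool]]]:
    """Greedy loop: the draft proposes a token, the verifier accepts or replaces it.

    Returns the generated tokens and a trace of (proposal, emitted, accepted).
    """
    context = tuple(prompt)
    trace: List[Tuple[str, str, bool]] = []
    for _ in range(max_new_tokens):
        proposal = argmax(draft(context))
        verifier_dist = verifier(context)
        if verifier_dist.get(proposal, 0.0) >= accept_threshold:
            token, accepted = proposal, True
        else:
            # Rejected: fall back to the verifier's own preferred token.
            token, accepted = argmax(verifier_dist), False
        trace.append((proposal, token, accepted))
        if token == "<eos>":
            break
        context += (token,)
    return list(context[len(prompt):]), trace


if __name__ == "__main__":
    tokens, trace = propose_and_verify(["They"], draft_model, verifier_model)
    print(tokens)
    for proposal, emitted, accepted in trace:
        print(f"proposed={proposal!r} emitted={emitted!r} accepted={accepted}")
```

In this toy run the draft's `<stereotype>` proposal is rejected and replaced by the verifier's choice, while the other proposals pass through unchanged. Passing `draft_model` as both proposer and verifier shows the unmitigated behaviour for comparison.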

Why does it matter?

This work is important because it tackles a critical issue in AI: ensuring fairness and preventing the spread of harmful biases. By focusing on the Indian context and developing a solution that doesn't require expensive retraining of models, it offers a practical and culturally sensitive approach to building more responsible AI systems. Reducing bias in these models is crucial for preventing discrimination and promoting inclusivity in a society where caste and religion are sensitive issues.

Abstract

Large Language Models (LLMs) can inadvertently reflect societal biases present in their training data, leading to harmful or prejudiced outputs. In the Indian context, our empirical evaluations across a suite of models reveal that biases around caste and religion are particularly salient. Yet, most existing mitigation strategies are Western-centric and fail to address these local nuances. We propose AMBEDKAR, a framework inspired by the egalitarian vision of Dr. B. R. Ambedkar, architect of the Indian Constitution, to guide LLM outputs toward fairness, neutrality, and inclusion in line with Articles 14 to 17. Our approach introduces a Constitution-Aware Decoding Layer, guided by the AI Constitution of India and applied only at inference time, without any parameter updates to the base model. We incorporate a speculative decoding algorithm that proactively reduces casteist and communal bias during generation. This mitigation layer operates directly within the decoding process, avoiding changes to model internals and lowering the computational and infrastructural costs associated with retraining. We reinterpret speculative decoding not merely as an efficiency tool but as a mechanism for fairness. In this framework, a Small Language Model (SLM) acts as a potentially biased generator, while a constitutionally guided LLM serves as the verifier. Rather than accelerating generation, the LLM enforces bias-robust trajectories in the SLM outputs. This inversion of roles gives rise to a fairness-by-speculation paradigm. Our approach yields an absolute reduction in bias of up to 26.41 percent compared to the baseline. Our source code, datasets, and results are available at https://anonymous.4open.science/r/AMBEDKAR-983B/
