Is Multilingual LLM Watermarking Truly Multilingual? A Simple Back-Translation Solution

Asim Mohamed, Martin Gubri

2025-10-22

Summary

This paper investigates the effectiveness of current methods designed to secretly mark text generated by large language models (LLMs) so it can be traced back to its source, specifically when that text is translated into different languages.

What's the problem?

Existing watermarking techniques for LLMs claim to be robust across languages, but they have only been thoroughly evaluated on high-resource languages like English or Spanish. The researchers found that these methods become unreliable when the generated text is translated into medium- or low-resource languages – languages with less data available for translation and language modeling. The failure traces back to how text is broken into smaller pieces (tokens): tokenizer vocabularies contain too few full-word tokens for these languages, so the semantic clustering these methods rely on breaks down and the watermark is lost or becomes undetectable after translation.
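To build intuition for the tokenization problem, here is a toy sketch (not the paper's code): a greedy longest-match subword tokenizer with a hypothetical English-heavy vocabulary keeps an English word whole but shatters a non-English word into character fragments, leaving the watermark little to latch onto. The vocabulary and example words are illustrative, not from the paper.

```python
def greedy_tokenize(word: str, vocab: set) -> list:
    """Greedy longest-match subword tokenization over a fixed vocab."""
    pieces = []
    i = 0
    while i < len(word):
        # Try the longest substring starting at i that is in the vocab.
        for j in range(len(word), i, -1):
            if word[i:j] in vocab:
                pieces.append(word[i:j])
                i = j
                break
        else:
            pieces.append(word[i])  # fall back to a single character
            i += 1
    return pieces

# Hypothetical English-heavy vocabulary: the English word survives as one
# full-word token, while a non-English word breaks into many fragments.
vocab = {"water", "mark", "watermark", "ing"}
print(greedy_tokenize("watermark", vocab))  # ['watermark'] – one full-word token
print(greedy_tokenize("vandmærke", vocab))  # shatters into single characters
```

The more a language's words fragment like this, the fewer full-word tokens the watermark can cluster over, which is the failure mode the paper identifies.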

What's the solution?

To fix this, the researchers developed a new method called STEAM. At detection time, STEAM uses back-translation – translating the suspect text *back* into the language it was generated in – to restore the watermark signal weakened by the initial translation. Importantly, STEAM isn't tied to any specific watermarking scheme, so it can be layered on top of existing methods to improve their multilingual robustness. It also works across different tokenizers and is easily extendable to new languages.

Why it matters?

This research is important because it highlights a fairness issue in LLM watermarking. If watermarking only works reliably for a few popular languages, it creates a bias and could unfairly impact users or applications that rely on less common languages. STEAM offers a simple and effective way to make watermarking more consistent and reliable across a wider range of languages, promoting fairer and more trustworthy use of LLMs globally.

Abstract

Multilingual watermarking aims to make large language model (LLM) outputs traceable across languages, yet current methods still fall short. Despite claims of cross-lingual robustness, they are evaluated only on high-resource languages. We show that existing multilingual watermarking methods are not truly multilingual: they fail to remain robust under translation attacks in medium- and low-resource languages. We trace this failure to semantic clustering, which fails when the tokenizer vocabulary contains too few full-word tokens for a given language. To address this, we introduce STEAM, a back-translation-based detection method that restores watermark strength lost through translation. STEAM is compatible with any watermarking method, robust across different tokenizers and languages, non-invasive, and easily extendable to new languages. With average gains of +0.19 AUC and +40%p TPR@1% on 17 languages, STEAM provides a simple and robust path toward fairer watermarking across diverse languages.