PRvL: Quantifying the Capabilities and Risks of Large Language Models for PII Redaction

Leon Garza, Anantaa Kotal, Aritran Piplai, Lavanya Elluri, Prajit Das, Aman Chadha

2025-08-08

Summary

This paper introduces PRvL, a system that uses large language models to automatically find and redact personal information (PII) in text to protect people's privacy.

What's the problem?

The problem is that personal information such as names, emails, and locations can appear in documents and must be carefully hidden to prevent misuse, but doing this accurately across different document types and domains is very challenging.

What's the solution?

The solution was to evaluate a range of language model architectures and training methods to find the best ways to teach these models to identify and redact personal information accurately and efficiently, while keeping the meaning of the rest of the text intact.
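To make the idea of redaction concrete, here is a toy sketch of what "replacing PII while preserving the rest of the text" looks like. This is not the PRvL system: the regex patterns and the `[CATEGORY]` placeholder style are illustrative assumptions, and real LLM-based redactors handle many more PII types and context-dependent cases than simple patterns can.

```python
import re

# Toy illustration of PII redaction (NOT the PRvL system): replace
# detected spans with category placeholders so the surrounding text
# stays readable. The patterns below are assumptions for demo purposes.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "PHONE": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
}

def redact(text: str) -> str:
    """Replace each matched PII span with a [CATEGORY] placeholder."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

print(redact("Contact Jane at jane.doe@example.com or 555-123-4567."))
# -> Contact Jane at [EMAIL] or [PHONE].
```

The placeholder approach keeps sentences grammatical after redaction, which is exactly the property the paper evaluates at scale with language models instead of hand-written rules.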

Why it matters?

This matters because protecting personal information is crucial for privacy and safety, especially in healthcare, legal, and financial documents, and having trustworthy automated tools helps organizations handle sensitive data responsibly.

Abstract

A comprehensive analysis of Large Language Models for PII redaction evaluates various architectures and training strategies, providing guidance for accurate, efficient, and privacy-aware redaction systems.