Doc-PP: Document Policy Preservation Benchmark for Large Vision-Language Models
Haeun Jang, Hwan Chang, Hwanhee Lee
2026-01-07
Summary
This paper investigates how well large AI models handle sensitive information when answering questions about documents that contain both text and images, especially when there are rules about what information they *can't* reveal.
What's the problem?
Current AI safety measures are good at preventing models from blurting out obvious secrets, but they struggle when answering questions requires the AI to really *think* about the document – combining information from images and text to figure things out. When the AI has to do more reasoning, it's much more likely to accidentally leak confidential details, even if it's been told not to. Simply giving the AI the relevant text snippets to work with, while helpful for understanding, actually makes it *easier* to leak information.
What's the solution?
The researchers created a new test, called Doc-PP, specifically designed to challenge AI models with these complex document-based questions under strict privacy rules. They then developed a new method called DVA, which stands for Decompose-Verify-Aggregation. DVA breaks the question-answering process into separate steps: first, the AI breaks the question down into smaller reasoning steps (decompose), then it *checks* each intermediate result against the privacy rules before using it (verify), and finally it combines only the permitted information into an answer (aggregate), avoiding leaks along the way.
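The three-stage idea can be sketched as a toy pipeline. This is a minimal illustration of the decompose/verify/aggregate pattern only; the function names, the key-value "document", and the set-based policy format are all simplifying assumptions, not the paper's actual implementation.

```python
# Toy sketch of a Decompose-Verify-Aggregation (DVA) style pipeline.
# All names and data formats here are illustrative assumptions.

def decompose(question):
    """Split a question into independent sub-queries (toy heuristic)."""
    return [part.strip() for part in question.split(" and ")]

def answer(sub_query, document):
    """Resolve each sub-query against a toy key-value 'document'."""
    return document.get(sub_query)

def verify(sub_query, policy):
    """Check a candidate fact against the non-disclosure policy."""
    return sub_query not in policy  # blocked fields are never released

def aggregate(facts):
    """Combine only the verified, policy-compliant facts."""
    return "; ".join(f"{k}: {v}" for k, v in facts) or "Cannot disclose."

def dva(question, document, policy):
    kept = []
    for sq in decompose(question):
        val = answer(sq, document)
        if val is not None and verify(sq, policy):
            kept.append((sq, val))
    return aggregate(kept)

doc = {"patient age": "54", "diagnosis": "hypertension"}
policy = {"diagnosis"}  # the diagnosis must not be revealed
print(dva("patient age and diagnosis", doc, policy))  # → patient age: 54
```

The key design point, mirrored here, is that verification happens per intermediate fact *before* aggregation, so a forbidden detail cannot slip into the final answer through multi-step reasoning.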
Why does it matter?
This research is important because as AI gets better at understanding complex documents, like medical reports or legal files, it's crucial to ensure it can do so *safely* and respect privacy. The new test and DVA method provide a way to measure and improve the safety of these AI systems, making them more reliable for real-world applications where protecting sensitive information is essential.
Abstract
The deployment of Large Vision-Language Models (LVLMs) for real-world document question answering is often constrained by dynamic, user-defined policies that dictate information disclosure based on context. While ensuring adherence to these explicit constraints is critical, existing safety research primarily focuses on implicit social norms or text-only settings, overlooking the complexities of multimodal documents. In this paper, we introduce Doc-PP (Document Policy Preservation Benchmark), a novel benchmark constructed from real-world reports requiring reasoning across heterogeneous visual and textual elements under strict non-disclosure policies. Our evaluation highlights a systemic Reasoning-Induced Safety Gap: models frequently leak sensitive information when answers must be inferred through complex synthesis or aggregated across modalities, effectively circumventing existing safety constraints. Furthermore, we identify that providing extracted text improves perception but inadvertently facilitates leakage. To address these vulnerabilities, we propose DVA (Decompose-Verify-Aggregation), a structural inference framework that decouples reasoning from policy verification. Experimental results demonstrate that DVA significantly outperforms standard prompting defenses, offering a robust baseline for policy-compliant document understanding.