MACT, a Multi-Agent Collaboration framework with Test-Time scaling, enhances visual document understanding and VQA by using four specialized agents and mixed reward modeling, achieving superior performance with reduced parameters.

This paper talks about MACT, a system where multiple AI agents work together to understand visual documents and answer questions about them more effectively.

Visual Document Understanding and Question Answering: A Multi-Agent Collaboration Framework with Test-Time Scaling

Summary

What's the problem?

What's the solution?

Why it matters?

Abstract