CoreMatching: A Co-adaptive Sparse Inference Framework with Token and Neuron Pruning for Comprehensive Acceleration of Vision-Language Models
Qinsi Wang, Hancheng Ye, Ming-Yu Chung, Yudong Liu, Yueqian Lin, Martin Kuo, Mingyuan Ma, Jianyi Zhang, Yiran Chen
2025-05-28
Summary
This paper introduces CoreMatching, a new method for making vision-language models, which are AI systems that understand both images and text, run much faster and more efficiently.
What's the problem?
These models are usually very large and slow because they process huge amounts of information at once, which makes them hard to run on regular computers or phones and wastes a lot of energy.
What's the solution?
To solve this, the researchers designed a framework that smartly cuts down on the amount of information the model has to process by removing unnecessary tokens (pieces of text or image data) and neurons (parts of the AI's brain). This co-adaptive pruning makes the model lighter and quicker without losing accuracy.
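To make the idea of co-adaptive pruning concrete, here is a minimal toy sketch. It is an illustration of the general pattern, not the paper's actual CoreMatching algorithm: the function name, the scoring rules (mean absolute activation for neurons, summed activation over kept neurons for tokens), and the keep ratios are all assumptions chosen for clarity.

```python
import numpy as np

def prune_tokens_and_neurons(activations, token_keep=0.5, neuron_keep=0.5):
    """Toy sketch of co-adaptive sparsity (illustrative only, not the
    authors' actual method).

    activations: array of shape (num_tokens, num_neurons) holding the
    hidden activations of one layer. Returns the indices of the tokens
    and neurons that are kept.
    """
    # Score neurons by their mean absolute activation across tokens,
    # and keep the highest-scoring fraction as "core" neurons.
    neuron_scores = np.abs(activations).mean(axis=0)
    k_n = max(1, int(neuron_keep * activations.shape[1]))
    core_neurons = np.argsort(neuron_scores)[-k_n:]

    # Then score tokens using only the retained core neurons, so the
    # two pruning decisions inform each other (the "co-adaptive" part).
    token_scores = np.abs(activations[:, core_neurons]).sum(axis=1)
    k_t = max(1, int(token_keep * activations.shape[0]))
    core_tokens = np.argsort(token_scores)[-k_t:]
    return np.sort(core_tokens), np.sort(core_neurons)

# Example: 8 tokens, 16 neurons; keep half of each.
rng = np.random.default_rng(0)
acts = rng.normal(size=(8, 16))
tok_idx, neu_idx = prune_tokens_and_neurons(acts)
print(len(tok_idx), len(neu_idx))  # 4 8
```

After pruning, the model would only attend over the kept tokens and only compute the kept neurons, which is where the speed and memory savings come from.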
Why does it matter?
This is important because it allows powerful AI models to be used on more devices and in more situations, making them more accessible and practical for everyday use, from apps to smart devices.
Abstract
A core-matching framework enhances inference efficiency in vision-language models by leveraging the synergy between token and neuron sparsity, outperforming baselines across multiple tasks and devices.