CoreMatching: A Co-adaptive Sparse Inference Framework with Token and Neuron Pruning for Comprehensive Acceleration of Vision-Language Models
Qinsi Wang, Hancheng Ye, Ming-Yu Chung, Yudong Liu, Yueqian Lin, Martin Kuo, Mingyuan Ma, Jianyi Zhang, Yiran Chen
2025-05-28
Summary
This paper introduces CoreMatching, a new method for making vision-language models, which are AI systems that understand both images and text, run much faster and more efficiently.
What's the problem?
These models are usually very large and slow because they process huge amounts of information at once, which makes them hard to run on regular computers or phones and wastes a lot of energy.
What's the solution?
To solve this, the researchers designed a framework that smartly cuts down on the amount of information the model has to process by removing unnecessary tokens (pieces of text or image data) and neurons (parts of the AI's brain). This co-adaptive pruning makes the model lighter and quicker without losing accuracy.
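To make the idea of co-adaptive pruning concrete, here is a minimal toy sketch. It is an illustration of the general pattern, not the paper's actual CoreMatching algorithm: the function name, the scoring rules (mean absolute activation for neurons, summed activation over kept neurons for tokens), and the keep ratios are all assumptions chosen for clarity.

```python
import numpy as np

def prune_tokens_and_neurons(activations, token_keep=0.5, neuron_keep=0.5):
    """Toy sketch of co-adaptive sparsity (illustrative only, not the
    authors' actual method).

    activations: array of shape (num_tokens, num_neurons) holding the
    hidden activations of one layer. Returns the indices of the tokens
    and neurons that are kept.
    """
    # Score neurons by their mean absolute activation across tokens,
    # and keep the highest-scoring fraction as "core" neurons.
    neuron_scores = np.abs(activations).mean(axis=0)
    k_n = max(1, int(neuron_keep * activations.shape[1]))
    core_neurons = np.argsort(neuron_scores)[-k_n:]

    # Then score tokens using only the retained core neurons, so the
    # two pruning decisions inform each other (the "co-adaptive" part).
    token_scores = np.abs(activations[:, core_neurons]).sum(axis=1)
    k_t = max(1, int(token_keep * activations.shape[0]))
    core_tokens = np.argsort(token_scores)[-k_t:]
    return np.sort(core_tokens), np.sort(core_neurons)

# Example: 8 tokens, 16 neurons; keep half of each.
rng = np.random.default_rng(0)
acts = rng.normal(size=(8, 16))
tok_idx, neu_idx = prune_tokens_and_neurons(acts)
print(len(tok_idx), len(neu_idx))  # 4 8
```

After pruning, the model would only attend over the kept tokens and only compute the kept neurons, which is where the speed and memory savings come from.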
Why does it matter?
This is important because it allows powerful AI models to be used on more devices and in more situations, making them more accessible and practical for everyday use, from apps to smart devices.
Abstract
A core-matching framework enhances inference efficiency in vision-language models by leveraging the synergy between token and neuron sparsity, outperforming baselines across multiple tasks and devices.