
MINED: Probing and Updating with Multimodal Time-Sensitive Knowledge for Large Multimodal Models

Kailin Jiang, Ning Jiang, Yuchen Ren, Yuchen Li, Yifan Gao, Jinhe Bi, Yunpu Ma, Qingqing Liu, Xianhao Wang, Yifan Jia, Hongbo Jiang, Yaocong Hu, Bin Li, Lei Liu, Yuntao Du

2025-10-23


Summary

This paper investigates how well Large Multimodal Models (LMMs) – AI systems that understand both text and images – handle facts that change over time, like who won a sports game last year versus this year.

What's the problem?

LMMs are trained on huge amounts of information, but that information is a snapshot of the world at training time. Once facts change, the models tend to keep giving the old answers. The tests currently used to evaluate these models don't really focus on this 'time-sensitive' knowledge, so it's hard to know how well they actually keep up with facts that change. Basically, existing benchmarks aren't challenging them enough on staying current.

What's the solution?

The researchers created a new, more thorough test called MINED. It probes how LMMs handle time-sensitive information across six dimensions and eleven tasks – things like knowing when something happened, judging whether information is still trustworthy, and reasoning about how facts change over time. MINED contains 2,104 questions built from Wikipedia and covering six types of knowledge, and the researchers used it to evaluate 15 popular LMMs. They also explored whether the models could be 'updated' with new facts using a technique called knowledge editing.

Why it matters?

This work is important because as LMMs become more common, we need to be sure they're giving us accurate information, and that includes information that changes frequently. Knowing how well these models handle time is crucial for using them reliably in real-world applications, and the new MINED benchmark provides a better way to measure and improve their performance in this area. It also shows that updating these models with new facts is possible, which is a step towards keeping them current.

Abstract

Large Multimodal Models (LMMs) encode rich factual knowledge via cross-modal pre-training, yet their static representations struggle to maintain an accurate understanding of time-sensitive factual knowledge. Existing benchmarks remain constrained by static designs, inadequately evaluating LMMs' ability to understand time-sensitive knowledge. To address this gap, we propose MINED, a comprehensive benchmark that evaluates temporal awareness along 6 key dimensions and 11 challenging tasks: cognition, awareness, trustworthiness, understanding, reasoning, and robustness. MINED is constructed from Wikipedia by two professional annotators, containing 2,104 time-sensitive knowledge samples spanning six knowledge types. Evaluating 15 widely used LMMs on MINED shows that Gemini-2.5-Pro achieves the highest average CEM score of 63.07, while most open-source LMMs still lack time understanding ability. Meanwhile, LMMs perform best on organization knowledge, whereas their performance is weakest on sport. To address these challenges, we investigate the feasibility of updating time-sensitive knowledge in LMMs through knowledge editing methods and observe that LMMs can effectively update knowledge via knowledge editing methods in single editing scenarios.
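The abstract reports model quality as an average "CEM" score (e.g. 63.07 for Gemini-2.5-Pro) without defining the metric here. In similar QA benchmarks, CEM usually stands for a "contains exact match" style metric: an answer counts as correct if a gold answer string appears verbatim in the model's output, and the score is the percentage of correct answers. A minimal sketch under that assumption (the function names and normalization choices are illustrative, not from the paper):

```python
def cem_hit(prediction: str, gold_answers: list[str]) -> bool:
    """Contains-exact-match: True if any gold answer string appears
    verbatim (case-insensitively) inside the model's prediction."""
    pred = prediction.strip().lower()
    return any(ans.strip().lower() in pred for ans in gold_answers)

def average_cem(predictions: list[str], gold: list[list[str]]) -> float:
    """Percentage of questions where the prediction contains a gold answer."""
    hits = [cem_hit(p, g) for p, g in zip(predictions, gold)]
    return 100.0 * sum(hits) / len(hits)
```

For example, if a model answers one of two time-sensitive questions with the current fact, `average_cem` returns 50.0. The actual metric in the paper may normalize differently (punctuation, articles, multilingual aliases), so treat this as a sketch of the scoring idea rather than the benchmark's exact implementation.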