
Brain-IT: Image Reconstruction from fMRI via Brain-Interaction Transformer

Roman Beliy, Amit Zalcher, Jonathan Kogman, Navve Wasserman, Michal Irani

2025-11-05


Summary

This paper presents a new method, called Brain-IT, for reconstructing images people are looking at directly from their brain activity measured by fMRI. It's like reading minds, but instead of thoughts, it recreates what someone *saw*.

What's the problem?

Currently, reconstructing images from brain scans is far from perfect. Recent advances using 'diffusion models' are promising, but the recreated images often don't faithfully match what the person actually viewed: they may capture the general gist while missing important details, or come out blurry and inaccurate.

What's the solution?

Brain-IT tackles this with a 'Brain Interaction Transformer', or BIT. The system identifies clusters of brain voxels that respond similarly when processing visual information, and these functional clusters are consistent across different people. BIT then predicts two complementary kinds of image features – the overall meaning (like 'a cat') and the basic structure (like where objects are positioned) – which guide a diffusion model toward a more faithful reconstruction. Importantly, because all model components are shared across clusters and subjects, the system learns efficiently, needing far less brain-scan data than previous methods.
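The pipeline described above – group voxels into functional clusters, pool each cluster into a token, then map tokens to localized per-patch image features – can be sketched in plain Python. Everything below is illustrative: the sizes, the k-means clustering, and the random linear "heads" are assumptions standing in for BIT's learned components, not the paper's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for this sketch (not from the paper):
# 500 voxels, 40 functional clusters, a 4x4 grid of patches, 8-dim features.
n_voxels, n_clusters, n_patches, feat_dim = 500, 40, 16, 8

def functional_clusters(profiles, k, n_iter=20):
    """Toy k-means over voxel response profiles: voxels that respond
    similarly across past stimuli end up in the same functional cluster."""
    centers = profiles[rng.choice(len(profiles), size=k, replace=False)].copy()
    for _ in range(n_iter):
        dists = ((profiles[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = profiles[labels == j]
            if len(members):
                centers[j] = members.mean(axis=0)
    return labels

# Synthetic voxel "response profiles" over 30 past stimuli.
profiles = rng.normal(size=(n_voxels, 30))
labels = functional_clusters(profiles, n_clusters)

def predict_patch_features(fmri_scan, labels, weight):
    """Pool one scan into per-cluster tokens, then map the tokens to
    localized per-patch image features (a stand-in for BIT's transformer)."""
    tokens = np.array([
        fmri_scan[labels == j].mean() if np.any(labels == j) else 0.0
        for j in range(n_clusters)
    ])
    return (tokens @ weight).reshape(n_patches, feat_dim)

# Random linear "heads" stand in for the learned semantic/structural decoders.
W_sem = rng.normal(size=(n_clusters, n_patches * feat_dim)) / np.sqrt(n_clusters)
W_str = rng.normal(size=(n_clusters, n_patches * feat_dim)) / np.sqrt(n_clusters)

fmri_scan = rng.normal(size=n_voxels)
semantic = predict_patch_features(fmri_scan, labels, W_sem)
structural = predict_patch_features(fmri_scan, labels, W_str)
print(semantic.shape, structural.shape)  # (16, 8) (16, 8)
```

The key design point mirrored here is data efficiency: because the same cluster assignments and the same decoder weights are reused for every subject, a new subject contributes training signal to shared parameters rather than requiring a model of their own.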

Why it matters?

This research is important because it significantly improves our ability to understand how the brain represents visual information. Better image reconstruction means we can potentially decode what someone is experiencing visually, which could have applications in understanding consciousness, enabling communication for people who cannot speak, or even helping people with visual impairments. The fact that it works well with limited data is also a big step forward, making this technology more accessible.

Abstract

Reconstructing images seen by people from their fMRI brain recordings provides a non-invasive window into the human brain. Despite recent progress enabled by diffusion models, current methods often lack faithfulness to the actual seen images. We present "Brain-IT", a brain-inspired approach that addresses this challenge through a Brain Interaction Transformer (BIT), allowing effective interactions between clusters of functionally-similar brain-voxels. These functional-clusters are shared by all subjects, serving as building blocks for integrating information both within and across brains. All model components are shared by all clusters & subjects, allowing efficient training with a limited amount of data. To guide the image reconstruction, BIT predicts two complementary localized patch-level image features: (i) high-level semantic features which steer the diffusion model toward the correct semantic content of the image; and (ii) low-level structural features which help to initialize the diffusion process with the correct coarse layout of the image. BIT's design enables direct flow of information from brain-voxel clusters to localized image features. Through these principles, our method achieves image reconstructions from fMRI that faithfully reconstruct the seen images, and surpass current SotA approaches both visually and by standard objective metrics. Moreover, with only 1-hour of fMRI data from a new subject, we achieve results comparable to current methods trained on full 40-hour recordings.
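The abstract says the low-level structural features "initialize the diffusion process with the correct coarse layout". One common way to do this kind of initialization (an SDEdit-style partial noising; this is an assumption for illustration, not the paper's stated procedure) is to noise a predicted coarse image to an intermediate timestep and run the reverse diffusion from there, so the sample keeps the coarse layout while the remaining steps fill in detail:

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-in for a coarse image layout decoded from the predicted
# structural features (hypothetical 8x8 values, not real model output).
coarse_layout = rng.uniform(size=(8, 8))

# DDPM-style noise schedule (illustrative hyperparameters).
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alpha_bar = np.cumprod(1.0 - betas)

# Start the reverse process at an intermediate step t0 instead of pure
# noise: enough noise remains for diffusion to add fine detail, but the
# coarse structure of the layout survives in the initialization.
t0 = 400
a = alpha_bar[t0]
noisy_init = (np.sqrt(a) * coarse_layout
              + np.sqrt(1.0 - a) * rng.normal(size=coarse_layout.shape))
print(noisy_init.shape)  # (8, 8)
```

Choosing t0 trades off faithfulness to the predicted layout (smaller t0) against the diffusion model's freedom to correct it (larger t0).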