Orion-MSP: Multi-Scale Sparse Attention for Tabular In-Context Learning

Mohamed Bouadi, Pratinav Seth, Aditya Tanna, Vinay Kumar Sankarapu

2025-11-06

Summary

This paper introduces a new neural network architecture called Orion-MSP designed to better handle and learn from data organized in tables, like spreadsheets or databases.

What's the problem?

Neural networks struggle with tabular data because tables mix heterogeneous feature types and contain complex relationships among data points. Existing methods for 'in-context learning' with tables, where the model learns directly from examples without task-specific training, have three limitations: they process features at only a single level of detail, they slow down on wide tables because dense attention scales quadratically with the number of columns, and they pass information through their components in only one direction, so different parts of the model cannot refine their understanding through back-and-forth communication.

What's the solution?

The researchers developed Orion-MSP, which tackles these problems in three ways. First, it analyzes features at multiple scales to capture hierarchical relationships. Second, it uses block-sparse attention that focuses on important sections of the table instead of everything at once, making it faster and able to handle wider tables. Finally, it lets information flow back and forth between different parts of the model through a shared memory, so the model can iteratively refine its understanding of the data.
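The block-sparse attention idea can be illustrated with a small sketch. The snippet below builds a boolean mask that combines the three patterns the paper names (windowed, global, and random) and applies it in a masked softmax attention. This is a minimal illustration of the general technique, not Orion-MSP's actual implementation; the function names, the block granularity, and all parameter values here are assumptions.

```python
import numpy as np

def block_sparse_mask(n_cols, window=2, n_global=1, n_random=1, seed=0):
    """Boolean attention mask over columns: True means "may attend".

    Combines three sparse patterns (illustrative, not the paper's code):
      - windowed: each column attends to neighbors within `window`
      - global:   the first `n_global` columns attend to, and are
                  attended by, every column
      - random:   each column attends to `n_random` extra random columns
    """
    rng = np.random.default_rng(seed)
    mask = np.zeros((n_cols, n_cols), dtype=bool)
    for i in range(n_cols):
        lo, hi = max(0, i - window), min(n_cols, i + window + 1)
        mask[i, lo:hi] = True                       # local window
        extra = rng.choice(n_cols, size=n_random)   # random long-range links
        mask[i, extra] = True
    mask[:n_global, :] = True                       # global columns see all...
    mask[:, :n_global] = True                       # ...and are seen by all
    return mask

def masked_attention(q, k, v, mask):
    """Scaled dot-product attention; disallowed pairs are set to -inf
    before the softmax, so they receive zero weight."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = np.where(mask, scores, -np.inf)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v
```

Because each column attends to only a handful of others rather than all of them, the cost grows roughly linearly with table width instead of quadratically, which is the efficiency argument the paper makes for sparse attention.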

Why it matters?

Orion-MSP performs as well as, or even better than, traditional methods like gradient-boosted trees, which are commonly used for tabular data, but does so without needing task-specific training. This is important because it offers a more efficient and scalable way to analyze tabular data, setting a new benchmark for how well neural networks can perform on this type of information.

Abstract

Tabular data remain the predominant format for real-world applications. Yet, developing effective neural models for tabular data remains challenging due to heterogeneous feature types and complex interactions occurring at multiple scales. Recent advances in tabular in-context learning (ICL), such as TabPFN and TabICL, have achieved state-of-the-art performance comparable to gradient-boosted trees (GBTs) without task-specific fine-tuning. However, current architectures exhibit key limitations: (1) single-scale feature processing that overlooks hierarchical dependencies, (2) dense attention with quadratic scaling in table width, and (3) strictly sequential component processing that prevents iterative representation refinement and cross-component communication. To address these challenges, we introduce Orion-MSP, a tabular ICL architecture featuring three key innovations: (1) multi-scale processing to capture hierarchical feature interactions; (2) block-sparse attention combining windowed, global, and random patterns for scalable efficiency and long-range connectivity; and (3) a Perceiver-style memory enabling safe bidirectional information flow across components. Across diverse benchmarks, Orion-MSP matches or surpasses state-of-the-art performance while scaling effectively to high-dimensional tables, establishing a new standard for efficient tabular in-context learning. The model is publicly available at https://github.com/Lexsi-Labs/Orion-MSP.
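The "Perceiver-style memory" in innovation (3) can be sketched in a few lines: a small set of latent vectors acts as a bottleneck that each component writes to and reads from via cross-attention, so information can flow in both directions without components attending to each other directly. The sketch below uses single-head, projection-free attention for brevity; the function names and the write-then-read cycle are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attend(queries, keys_values):
    """Single-head cross-attention: each query row reads a weighted
    mixture of the key/value rows (projections omitted for brevity)."""
    scores = queries @ keys_values.T / np.sqrt(queries.shape[-1])
    return softmax(scores) @ keys_values

def memory_step(latents, component_out):
    """One write/read cycle through a latent memory (illustrative):
    the latents absorb a component's output (write), then the
    component reads the updated memory back (read)."""
    latents = latents + cross_attend(latents, component_out)            # write
    component_out = component_out + cross_attend(component_out, latents)  # read
    return latents, component_out
```

The design choice is that the latent array has a fixed, small size, so its cost does not grow with the number of rows or columns, and cycling components through it allows the iterative, bidirectional refinement that a strictly sequential pipeline rules out.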