HyTRec: A Hybrid Temporal-Aware Attention Architecture for Long Behavior Sequential Recommendation
Lei Xin, Yuhao Zheng, Ke Cheng, Changjiang Jiang, Zifan Zhang, Fanhu Zeng
2026-02-26
Summary
This paper introduces a new recommendation model, HyTRec, designed to better predict what users will like based on their past actions, even when they have a very long history of interactions.
What's the problem?
Recommending items to users based on their behavior is tricky when their interaction history is very long. Methods that attend to everything a user has ever done become too computationally expensive for real-world applications with many users and items, while efficient methods compress the past so aggressively that they lose track of what the user *currently* wants. In practice, accuracy and efficiency have been at odds.
What's the solution?
HyTRec solves this by using two different types of attention. It uses a fast, but less precise, method for looking at the user’s entire history to understand their general preferences. Then, it uses a more accurate, but slower, method to focus on the user’s recent actions to catch any quick changes in what they’re interested in. To make sure the model doesn't get stuck on old information, it also includes a system that emphasizes newer behaviors and downplays older ones, ensuring the model stays up-to-date with the user's current intent.
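The two-branch split described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the `elu + 1` feature map, the fixed recent-window size, and the scalar `gate` for combining the branches are all assumptions made for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

def feature_map(x):
    # elu(x) + 1: a common positive kernel used in linear attention
    return np.where(x > 0, x + 1.0, np.exp(x))

def linear_attention(q, K, V):
    # O(L * d^2): compresses the entire history into a d x d state,
    # fast but imprecise -- used here for the long-term branch.
    phi_K = feature_map(K)            # (L, d)
    S = phi_K.T @ V                   # (d, d) summary state
    z = phi_K.sum(axis=0)             # (d,) normalizer
    phi_q = feature_map(q)
    return (phi_q @ S) / (phi_q @ z + 1e-8)

def softmax_attention(q, K, V):
    # Exact retrieval over a short window -- the short-term branch.
    scores = (K @ q) / np.sqrt(q.shape[0])
    w = np.exp(scores - scores.max())
    w /= w.sum()
    return w @ V

def hybrid_attention(q, K, V, recent=16, gate=0.5):
    # Massive history -> linear branch; recent items -> softmax branch.
    # `gate` is a hypothetical fixed mixing weight; the real model
    # would learn how to combine the two branches.
    hist = linear_attention(q, K[:-recent], V[:-recent])
    fresh = softmax_attention(q, K[-recent:], V[-recent:])
    return gate * fresh + (1.0 - gate) * hist

d, L = 8, 10_000
K = rng.standard_normal((L, d))
V = rng.standard_normal((L, d))
q = rng.standard_normal(d)
out = hybrid_attention(q, K, V)
```

Note that the softmax branch's cost depends only on the window size, so the overall per-query cost stays linear in the sequence length.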
Why it matters?
This research is important because it allows recommendation systems to handle very large amounts of user data without sacrificing speed or accuracy. This is especially useful for popular platforms with millions of users and items, leading to better recommendations and potentially increased user engagement, particularly for users with extensive interaction histories.
Abstract
Modeling long sequences of user behaviors has emerged as a critical frontier in generative recommendation. However, existing solutions face a dilemma: linear attention mechanisms achieve efficiency at the cost of retrieval precision due to limited state capacity, while softmax attention suffers from prohibitive computational overhead. To address this challenge, we propose HyTRec, a model featuring a Hybrid Attention architecture that explicitly decouples long-term stable preferences from short-term intent spikes. By assigning massive historical sequences to a linear attention branch and reserving a specialized softmax attention branch for recent interactions, our approach restores precise retrieval capabilities within industrial-scale contexts involving ten thousand interactions. To mitigate the lag in capturing rapid interest drifts within the linear layers, we further design a Temporal-Aware Delta Network (TADN) to dynamically upweight fresh behavioral signals while effectively suppressing historical noise. Empirical results on industrial-scale datasets confirm that our model maintains linear inference speed while outperforming strong baselines, notably delivering an over-8% improvement in Hit Rate for users with ultra-long sequences.
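The recency-upweighting idea behind TADN can be illustrated with a delta-rule state update whose write strength grows with recency. This is a simplified sketch, not the paper's network: the linear `beta` schedule and key normalization are assumptions, whereas the real TADN would learn how strongly to weight each interaction.

```python
import numpy as np

rng = np.random.default_rng(1)

def tadn_state(K, V, beta_min=0.1, beta_max=1.0):
    # Delta-rule associative memory (key -> value) with a
    # recency-dependent write strength beta_t: fresh interactions
    # overwrite the state aggressively, stale ones barely touch it.
    # The linspace schedule is a hypothetical stand-in for a
    # learned temporal gate.
    L, d = K.shape
    S = np.zeros((d, d))
    betas = np.linspace(beta_min, beta_max, L)  # older -> smaller beta
    for k_t, v_t, beta in zip(K, V, betas):
        k_t = k_t / (np.linalg.norm(k_t) + 1e-8)
        pred = S @ k_t                           # what memory recalls now
        S = S + beta * np.outer(v_t - pred, k_t)  # delta-rule correction
    return S

d = 8
K = rng.standard_normal((200, d))
V = rng.standard_normal((200, d))
S = tadn_state(K, V)

# With beta_max = 1 and a unit-norm key, the most recent write is
# recalled essentially exactly, while early writes have been eroded.
q = K[-1] / np.linalg.norm(K[-1])
recalled = S @ q
```

The delta rule corrects only the error between the stored and target value, so a large `beta` on recent items lets them dominate the state without fully erasing older, uncorrelated keys.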