mHC: Manifold-Constrained Hyper-Connections

Zhenda Xie, Yixuan Wei, Huanqi Cao, Chenggang Zhao, Chengqi Deng, Jiashi Li, Damai Dai, Huazuo Gao, Jiang Chang, Liang Zhao, Shangyan Zhou, Zhean Xu, Zhengyan Zhang, Wangding Zeng, Shengding Hu, Yuqing Wang, Jingyang Yuan, Lean Wang, Wenfeng Liang

2026-01-01

Summary

This paper introduces Manifold-Constrained Hyper-Connections (mHC), a new way to improve the performance of neural networks that builds on a recent architectural advance called Hyper-Connections. It tackles the training instability and memory costs that arise when scaling these networks up.

What's the problem?

Recent improvements to neural networks, such as Hyper-Connections, widen the residual stream and diversify the connections between layers. While this boosts performance, it breaks a key property of the original residual design: the 'identity mapping', which lets each layer pass its input through unchanged and is what keeps training stable in very deep networks. Losing this property makes very large networks much harder to train reliably and also adds noticeable memory-access overhead.
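
To make the identity-mapping idea concrete, here is a minimal PyTorch sketch of a standard residual block (illustrative only; the inner layer is a placeholder for attention or an MLP):

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Standard residual connection: y = x + F(x).

    If F learns to output zeros, the block reduces to the identity map,
    which is the property that keeps gradients well-behaved in very
    deep stacks. Hyper-Connections replace this simple additive path
    with wider, learned connectivity, which is what puts the identity
    mapping at risk."""
    def __init__(self, dim: int):
        super().__init__()
        self.f = nn.Linear(dim, dim)  # stand-in for attention / MLP

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.f(x)  # identity path + learned update
```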

What's the solution?

The researchers propose a method called Manifold-Constrained Hyper-Connections, or mHC. The technique projects the connection space of Hyper-Connections onto a specific manifold, which restores the identity mapping property while keeping the wider, more flexible connectivity. They pair this with careful infrastructure optimization so the method stays memory-efficient. In short, mHC fixes both the instability and the memory overhead of the original Hyper-Connections.
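
This summary does not spell out which manifold mHC uses or how the projection is computed, so the sketch below is only one illustration of the general idea: it constrains each stream-mixing matrix to be (approximately) doubly stochastic via Sinkhorn normalization, a family that contains the identity matrix, so the constrained connections can always fall back to plain residual behaviour. The names `sinkhorn_project` and `ConstrainedStreamMixer` are our own, not from the paper.

```python
import torch
import torch.nn as nn

def sinkhorn_project(logits: torch.Tensor, n_iters: int = 10) -> torch.Tensor:
    """Map a square matrix of logits onto (approximately) the set of
    doubly stochastic matrices by alternating row/column normalization.
    This set contains the identity matrix, so the mixing step can
    always recover a pure identity mapping."""
    m = torch.exp(logits)  # ensure all entries are positive
    for _ in range(n_iters):
        m = m / m.sum(dim=-1, keepdim=True)  # normalize rows
        m = m / m.sum(dim=-2, keepdim=True)  # normalize columns
    return m

class ConstrainedStreamMixer(nn.Module):
    """Mixes n parallel residual streams with a constrained matrix,
    initialized near the identity so training starts from plain
    residual behaviour."""
    def __init__(self, n_streams: int):
        super().__init__()
        # Large diagonal logits -> mixing matrix starts near identity.
        self.logits = nn.Parameter(torch.eye(n_streams) * 4.0)

    def forward(self, streams: torch.Tensor) -> torch.Tensor:
        # streams: (batch, n_streams, dim)
        mix = sinkhorn_project(self.logits)  # (n_streams, n_streams)
        return torch.einsum('ij,bjd->bid', mix, streams)
```

The design choice the sketch is meant to highlight: rather than letting the stream-mixing weights roam freely (as in unconstrained Hyper-Connections), the parameters are always passed through a projection onto a constrained set before use, so the identity mapping remains reachable throughout training.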

Why it matters?

This work matters because it enables even larger and more powerful neural networks to be trained reliably. It also deepens our understanding of how to design network topology and points to new directions for improving the foundational models that power many AI applications.

Abstract

Recently, studies exemplified by Hyper-Connections (HC) have extended the ubiquitous residual connection paradigm established over the past decade by expanding the residual stream width and diversifying connectivity patterns. While yielding substantial performance gains, this diversification fundamentally compromises the identity mapping property intrinsic to the residual connection, which causes severe training instability and restricted scalability, and additionally incurs notable memory access overhead. To address these challenges, we propose Manifold-Constrained Hyper-Connections (mHC), a general framework that projects the residual connection space of HC onto a specific manifold to restore the identity mapping property, while incorporating rigorous infrastructure optimization to ensure efficiency. Empirical experiments demonstrate that mHC is effective for training at scale, offering tangible performance improvements and superior scalability. We anticipate that mHC, as a flexible and practical extension of HC, will contribute to a deeper understanding of topological architecture design and suggest promising directions for the evolution of foundational models.