
Frac-Connections: Fractional Extension of Hyper-Connections

Defa Zhu, Hongzhi Huang, Jundong Zhou, Zihao Huang, Yutao Zeng, Banggu Wu, Qiyang Min, Xun Zhou

2025-03-19


Summary

This paper introduces Frac-Connections, a technique that generalizes residual connections in deep networks while avoiding the memory overhead of the earlier Hyper-Connections approach.

What's the problem?

Hyper-Connections, a recent improvement over residual connections, widen the network's hidden states, which increases memory access costs and can slow down training and inference.

What's the solution?

Frac-Connections instead divide the hidden states into multiple fractions rather than widening them, keeping the hidden-state width unchanged. This retains part of the benefit of Hyper-Connections while reducing memory consumption.
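The core idea of splitting a hidden state into fractions instead of widening it can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `mix` matrix stands in for the learnable connection strengths among fractions, and `layer_fn` for the residual branch; both names and the exact parameterization are assumptions for clarity.

```python
import numpy as np

def frac_connection(h, layer_fn, n_frac, mix):
    """Sketch of a fractional connection (illustrative, not the paper's code).

    h:        hidden state of width d; d must be divisible by n_frac.
    layer_fn: the residual branch (e.g. an attention or MLP block).
    mix:      (n_frac, n_frac) mixing weights among fractions
              (hypothetical stand-in for the learned connection strengths).
    """
    d = h.shape[0]
    parts = h.reshape(n_frac, d // n_frac)  # split the state, don't widen it
    mixed = mix @ parts                     # combine fractions with learned weights
    x = mixed.reshape(d)                    # width stays d, unlike Hyper-Connections
    return x + layer_fn(x)                  # standard residual update

# With n_frac = 1 and identity mixing, this reduces to a plain residual connection.
h = np.arange(8, dtype=float)
out = frac_connection(h, lambda x: 0.1 * x, 1, np.eye(1))
```

Because the state is reshaped rather than expanded, memory traffic per layer stays close to that of an ordinary residual connection, regardless of `n_frac`.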

Why it matters?

This work matters because it enables more memory-efficient training of large models. The authors validate the approach at scale, including a 7B Mixture-of-Experts model trained on up to 3T tokens, where Frac-Connections significantly outperform standard residual connections.

Abstract

Residual connections are central to modern deep learning architectures, enabling the training of very deep networks by mitigating gradient vanishing. Hyper-Connections recently generalized residual connections by introducing multiple connection strengths at different depths, thereby addressing the seesaw effect between gradient vanishing and representation collapse. However, Hyper-Connections increase memory access costs by expanding the width of hidden states. In this paper, we propose Frac-Connections, a novel approach that divides hidden states into multiple parts rather than expanding their width. Frac-Connections retain partial benefits of Hyper-Connections while reducing memory consumption. To validate their effectiveness, we conduct large-scale experiments on language tasks, with the largest being a 7B MoE model trained on up to 3T tokens, demonstrating that Frac-Connections significantly outperform residual connections.