Neural Metamorphosis

Xingyi Yang, Xinchao Wang

2024-10-17

Summary

This paper introduces Neural Metamorphosis (NeuMeta), a new approach to creating flexible neural networks that can adapt to different sizes and architectures without needing to be retrained.

What's the problem?

Traditionally, building neural networks for specific tasks means creating separate models for different sizes or types. This can be inefficient and time-consuming, as it requires retraining each model from scratch whenever changes are needed. Additionally, existing methods often struggle to adapt to new configurations that weren't part of the original training.

What's the solution?

To solve this problem, NeuMeta learns a continuous 'weight manifold': a space that contains the weights of a whole family of networks at different sizes. The manifold is represented by a neural implicit function (a hypernetwork) that takes a coordinate in the model space as input and outputs the corresponding weight value. Once this function is trained, weights for any network size or configuration can be sampled directly from the manifold without retraining. To keep the manifold smooth, and therefore keep performance consistent across model sizes, the authors permute weight matrices and add noise to the input coordinates during training.
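To make the core idea concrete, here is a minimal sketch (not the authors' code) of what an implicit weight function could look like: a small MLP maps a normalized coordinate, assumed here to be (layer index, output-neuron index, input-neuron index), to a single weight value, so that weight matrices of arbitrary width can be materialized by querying a grid of coordinates. All names, dimensions, and the coordinate scheme are illustrative assumptions.

```python
import torch
import torch.nn as nn

class ImplicitWeightFunction(nn.Module):
    """Hypothetical implicit function: normalized weight coordinates -> weight values."""
    def __init__(self, coord_dim: int = 3, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(coord_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, coords: torch.Tensor) -> torch.Tensor:
        # coords: (N, coord_dim) in [0, 1] -> (N,) predicted weight values
        return self.net(coords).squeeze(-1)

def sample_layer_weights(f: ImplicitWeightFunction, layer_idx: float,
                         out_dim: int, in_dim: int) -> torch.Tensor:
    """Query a grid of normalized coordinates to build an (out_dim x in_dim) weight matrix."""
    rows = torch.linspace(0, 1, out_dim)
    cols = torch.linspace(0, 1, in_dim)
    r, c = torch.meshgrid(rows, cols, indexing="ij")
    layer = torch.full_like(r, layer_idx)
    coords = torch.stack([layer, r, c], dim=-1).reshape(-1, 3)
    return f(coords).reshape(out_dim, in_dim)

# Example: the same function yields weights for two different layer widths.
f = ImplicitWeightFunction()
w_small = sample_layer_weights(f, layer_idx=0.0, out_dim=32, in_dim=16)
w_large = sample_layer_weights(f, layer_idx=0.0, out_dim=128, in_dim=64)
```

In this sketch, changing the network size only changes how densely the coordinate grid is sampled; the trained function itself stays fixed, which is what lets a single model serve many configurations.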

Why it matters?

This research is significant because it allows for much more flexibility in how neural networks are built and used. By enabling a single model to adapt to various configurations efficiently, NeuMeta can save time and resources in developing AI systems. This advancement could lead to better performance in applications like image classification, segmentation, and generation, making AI technology more accessible and effective.

Abstract

This paper introduces a new learning paradigm termed Neural Metamorphosis (NeuMeta), which aims to build self-morphable neural networks. Rather than crafting separate models for different architectures or sizes, NeuMeta directly learns the continuous weight manifold of neural networks. Once trained, we can sample weights for any-sized network directly from the manifold, even for previously unseen configurations, without retraining. To achieve this ambitious goal, NeuMeta trains neural implicit functions as hypernetworks. They accept coordinates within the model space as input and generate the corresponding weight values on the manifold. In other words, the implicit function is learned such that the predicted weights perform well across various model sizes. In training these models, we observe that the final performance closely relates to the smoothness of the learned manifold. To enhance this smoothness, we employ two strategies. First, we permute weight matrices to achieve intra-model smoothness by solving the Shortest Hamiltonian Path problem. Second, we add noise to the input coordinates when training the implicit function, ensuring that models of various sizes produce consistent outputs. As such, NeuMeta shows promising results in synthesizing parameters for various network configurations. Our extensive tests in image classification, semantic segmentation, and image generation reveal that NeuMeta sustains full-size performance even at a 75% compression rate.
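The abstract's second smoothness strategy, adding noise to the input coordinates, can be pictured as jittering each coordinate by a fraction of a grid cell during training, so that nearby points on the manifold (i.e. slightly different network sizes) map to consistent weights. The sketch below is a hedged illustration of that idea under assumed (layer, row, col) coordinates; the noise scale and scheme are assumptions, not the authors' exact recipe.

```python
import torch

def jitter_coords(coords: torch.Tensor, out_dim: int, in_dim: int) -> torch.Tensor:
    """Perturb normalized (layer, row, col) coordinates by up to half a grid cell
    along the row and column axes; the layer index is left unperturbed."""
    cell = torch.tensor([0.0, 0.5 / out_dim, 0.5 / in_dim])
    noise = (torch.rand_like(coords) * 2 - 1) * cell   # uniform in [-cell, +cell]
    return (coords + noise).clamp(0.0, 1.0)

coords = torch.rand(1024, 3)                 # a batch of normalized weight coordinates
noisy = jitter_coords(coords, out_dim=64, in_dim=64)
# weights = implicit_fn(noisy)               # the implicit function is queried on the
#                                            # jittered coordinates during training,
#                                            # encouraging a locally smooth manifold
```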