Instruction-Guided Autoregressive Neural Network Parameter Generation
Soro Bedionita, Bruno Andreis, Song Chong, Sung Ju Hwang
2025-04-04
Summary
This paper introduces IGPG, a method that generates the parameters of an AI model (its parts, like puzzle pieces) from instructions, helping models adapt quickly to new tasks without training from scratch.
What's the problem?
Existing methods for generating model parameters, especially those based on diffusion models, struggle with large architectures and varying network depths, and often produce layers that don't fit well together.
What's the solution?
IGPG pairs a VQ-VAE with an autoregressive model that reads a task instruction (like 'build a cat detector') together with dataset and architecture details, then generates the model's weights as a sequence of tokens, layer by layer, so the parts stay coherent with one another.
Why it matters?
This helps AI models adapt faster to vision tasks such as image recognition by reusing knowledge from existing pretrained models instead of retraining everything, saving time and compute.
Abstract
Learning to generate neural network parameters conditioned on task descriptions and architecture specifications is pivotal for advancing model adaptability and transfer learning. Existing methods, especially those based on diffusion models, suffer from limited scalability to large architectures, rigidity in handling varying network depths, and disjointed parameter generation that undermines inter-layer coherence. In this work, we propose IGPG (Instruction Guided Parameter Generation), an autoregressive framework that unifies parameter synthesis across diverse tasks and architectures. IGPG leverages a VQ-VAE and an autoregressive model to generate neural network parameters, conditioned on task instructions, dataset, and architecture details. By autoregressively generating tokens of neural network weights, IGPG ensures inter-layer coherence and enables efficient adaptation across models and datasets. Operating at the token level, IGPG effectively captures complex parameter distributions aggregated from a broad spectrum of pretrained models. Extensive experiments on multiple vision datasets demonstrate that IGPG consolidates diverse pretrained models into a single, flexible generative framework. The synthesized parameters achieve competitive or superior performance relative to state-of-the-art methods, especially in terms of scalability and efficiency when applied to large architectures. These results underscore IGPG's potential as a powerful tool for pretrained weight retrieval, model selection, and rapid task-specific fine-tuning.
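To make the token-level idea concrete, here is a minimal toy sketch in plain Python. It is not the authors' implementation: the fixed `codebook` stands in for a learned VQ-VAE codebook, `next_token_probs` stands in for the trained autoregressive model, and every function name is hypothetical. It only illustrates the general pattern the abstract describes: weights are split into chunks, each chunk is mapped to a discrete token, and tokens are then generated one at a time, each conditioned on a prompt and on all earlier tokens.

```python
import random

def chunk(flat, size):
    """Split a flat weight list into fixed-size chunks, zero-padding the tail."""
    padded = flat + [0.0] * (-len(flat) % size)
    return [padded[i:i + size] for i in range(0, len(padded), size)]

def nearest_code(vec, codebook):
    """Index of the codebook entry closest to vec (squared Euclidean distance)."""
    def dist(code):
        return sum((a - b) ** 2 for a, b in zip(vec, code))
    return min(range(len(codebook)), key=lambda k: dist(codebook[k]))

def tokenize(flat, codebook):
    """Vector-quantize weights: one token (codebook index) per chunk."""
    return [nearest_code(c, codebook) for c in chunk(flat, len(codebook[0]))]

def detokenize(tokens, codebook, n_weights):
    """Look tokens up in the codebook and drop the padding."""
    flat = [w for t in tokens for w in codebook[t]]
    return flat[:n_weights]

def generate(next_token_probs, prompt, n_tokens, rng):
    """Autoregressive sampling: each token depends on the prompt and all earlier tokens."""
    tokens = []
    for _ in range(n_tokens):
        probs = next_token_probs(prompt, tokens)
        tokens.append(rng.choices(range(len(probs)), weights=probs, k=1)[0])
    return tokens

# Toy codebook with three 2-dimensional codes (assumed, not learned).
codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0]]
weights = [0.9, 1.1, -0.8, 0.9, 0.1]

tokens = tokenize(weights, codebook)
approx = detokenize(tokens, codebook, len(weights))

def toy_model(prompt, tokens_so_far):
    """Stand-in for a trained model: puts all mass on a token that cycles 0, 1, 2, ..."""
    probs = [0.0] * len(codebook)
    probs[len(tokens_so_far) % len(codebook)] = 1.0
    return probs

sampled = generate(toy_model, "hypothetical task prompt", 4, random.Random(0))
new_weights = detokenize(sampled, codebook, 8)
```

In this sketch the prompt is ignored by `toy_model`; in a real system the instruction, dataset, and architecture details would shape the next-token distribution. Because each token is drawn given all previous ones, tokens for later layers can depend on tokens for earlier layers, which is the property the abstract refers to as inter-layer coherence.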