NeuroAda: Activating Each Neuron's Potential for Parameter-Efficient Fine-Tuning
Zhi Zhang, Yixian Shen, Congfeng Cao, Ekaterina Shutova
2025-10-23
Summary
This paper introduces a new method called NeuroAda for efficiently updating large AI models for specific tasks, aiming to get the best of both worlds: strong performance and low memory usage.
What's the problem?
Currently, when you want to customize a big AI model for a new job, you have two main options. One is to add extra parts to the model, which saves memory but isn't very flexible. The other is to directly change a selected subset of the original model's parameters, which is more precise and effective but uses a lot more memory. This creates a trade-off between how well the model adapts and how much memory you need to do it.
What's the solution?
NeuroAda solves this by first figuring out which parts of the original model are most important for the new task. Then, instead of changing those important parts directly, it adds 'bypass connections' around them. During customization, only these new bypass connections are updated, leaving the original model untouched. This allows for precise adaptation without needing to store lots of extra information or modify the core model itself.
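The idea above can be sketched in a few lines of plain Python. This is only an illustration, not the authors' implementation: the helper names (`select_top_k`, `forward`, `sgd_step`) are hypothetical, and ranking connections by weight magnitude is an assumed stand-in for whatever importance measure the paper actually uses. Each neuron gets a small trainable offset on `k` of its input connections, while the original weights stay frozen.

```python
# Illustrative sketch only. Function names and the magnitude-based
# selection rule are assumptions, not taken from the NeuroAda code.

def select_top_k(weights, k):
    """For each neuron (row), pick the indices of its k largest-magnitude inputs."""
    return [
        sorted(range(len(row)), key=lambda j: abs(row[j]), reverse=True)[:k]
        for row in weights
    ]

def forward(weights, indices, deltas, x):
    """Frozen path (W @ x) plus the bypass path on the selected connections."""
    out = []
    for row, idx, d in zip(weights, indices, deltas):
        base = sum(w * xj for w, xj in zip(row, x))        # frozen weights
        bypass = sum(dv * x[j] for dv, j in zip(d, idx))   # trainable offsets
        out.append(base + bypass)
    return out

def sgd_step(indices, deltas, x, grad_out, lr):
    """Update only the bypass offsets; d(y_i)/d(delta_ij) = x[j]."""
    return [
        [dv - lr * g * x[j] for dv, j in zip(d, idx)]
        for idx, d, g in zip(indices, deltas, grad_out)
    ]

W = [[0.5, -2.0, 0.1], [1.5, 0.2, -0.3]]   # frozen pretrained weights (toy values)
W_before = [row[:] for row in W]
idx = select_top_k(W, k=1)                 # one bypass connection per neuron
deltas = [[0.0] * len(i) for i in idx]     # zero init: output starts unchanged
x = [1.0, 2.0, 3.0]
target = [0.0, 0.0]

y0 = forward(W, idx, deltas, x)
grad = [y - t for y, t in zip(y0, target)]  # gradient of 0.5 * squared error
deltas = sgd_step(idx, deltas, x, grad, lr=0.1)
y1 = forward(W, idx, deltas, x)
```

In this toy setup only two numbers are trained (one offset per neuron), the entries of `W` are never written to, and one gradient step already moves both outputs closer to the target. Storing the offsets as `k` values per neuron, rather than as a full-size mask over the weight matrix, is what keeps the extra memory small in this sketch.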
Why does it matter?
This is important because it allows researchers and developers to fine-tune very large AI models on a wider range of tasks, even with limited computing resources. According to the authors, NeuroAda achieves state-of-the-art results across more than 23 language generation and understanding tasks while training as little as 0.02% of the parameters or less, and it cuts CUDA memory usage by up to 60%, making advanced AI more accessible.
Abstract
Existing parameter-efficient fine-tuning (PEFT) methods primarily fall into two categories: addition-based and selective in-situ adaptation. The former, such as LoRA, introduce additional modules to adapt the model to downstream tasks, offering strong memory efficiency. However, their representational capacity is often limited, making them less suitable for fine-grained adaptation. In contrast, the latter directly fine-tunes a carefully chosen subset of the original model parameters, allowing for more precise and effective adaptation, but at the cost of significantly increased memory consumption. To reconcile this trade-off, we propose NeuroAda, a novel PEFT method that enables fine-grained model finetuning while maintaining high memory efficiency. Our approach first identifies important parameters (i.e., connections within the network) as in selective adaptation, and then introduces bypass connections for these selected parameters. During finetuning, only the bypass connections are updated, leaving the original model parameters frozen. Empirical results on 23+ tasks spanning both natural language generation and understanding demonstrate that NeuroAda achieves state-of-the-art performance with as little as ≤ 0.02% trainable parameters, while reducing CUDA memory usage by up to 60%. We release our code here: https://github.com/FightingFighting/NeuroAda.git.