Maximal Brain Damage Without Data or Optimization: Disrupting Neural Networks via Sign-Bit Flips

Ido Galil, Moshe Kimhi, Ran El-Yaniv

2026-04-20

Summary

This paper investigates a surprising weakness in deep learning models: they can be easily broken by making very small changes to the numbers that define them. Specifically, flipping just a few bits within the model's parameters can cause a huge drop in performance.

What's the problem?

Deep Neural Networks, despite their power, are surprisingly fragile. Even tiny alterations to the core numbers that make up the network – like changing a single bit representing a positive or negative sign – can completely ruin their ability to make accurate predictions. This is a security concern because someone could intentionally sabotage a model with minimal effort, and it raises questions about why these models are so sensitive.
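To make "changing a single bit representing a positive or negative sign" concrete: in the IEEE-754 32-bit float format used for most model weights, the sign is the single most significant bit, so flipping it negates the value. The helper below is an illustrative sketch (not code from the paper) showing how one bit flip turns a weight into its negative.

```python
import struct

def flip_sign_bit(x: float) -> float:
    """Flip the IEEE-754 sign bit of a 32-bit float, negating the value."""
    # Reinterpret the float's bytes as an unsigned 32-bit integer.
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # The sign bit is the most significant bit (mask 0x80000000).
    flipped = bits ^ 0x80000000
    # Reinterpret the modified bits as a float again.
    return struct.unpack("<f", struct.pack("<I", flipped))[0]

print(flip_sign_bit(0.75))  # -0.75: one bit changed, value negated
```

A single such flip in a large weight can invert the contribution of an entire feature, which is why sign bits are so much more damaging than mantissa bits.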

What's the solution?

The researchers developed a method called Deep Neural Lesion (DNL) to pinpoint exactly *which* numbers within a network are most critical. It works without needing any training data or complex optimization processes. They then improved this method with 1P-DNL, which refines the selection of these critical numbers using a single forward and backward pass on random inputs. They tested this on various types of AI – image classifiers, object detectors, and even large language models – to see where these vulnerabilities lie.
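The summary above does not spell out the exact scoring rule, so the sketch below is a hypothetical illustration of the general idea, not the paper's method: score each weight by combining its magnitude with a gradient estimate from one forward/backward pass on random inputs (the product |w|·|∂L/∂w| used here is an assumption), then flip the sign of the top-scoring weights.

```python
import numpy as np

def rank_critical_weights(weights: np.ndarray, grads: np.ndarray, k: int = 2) -> np.ndarray:
    """Return indices of the k weights with the highest criticality score.

    Assumed score: |w| * |dL/dw| -- a plausible single-pass heuristic,
    NOT necessarily the rule used by DNL / 1P-DNL.
    """
    score = np.abs(weights) * np.abs(grads)
    return np.argsort(score)[::-1][:k]

# Toy example: one large weight with a large gradient dominates.
weights = np.array([0.1, -2.0, 0.5], dtype=np.float32)
grads = np.array([1.0, 3.0, 0.2], dtype=np.float32)   # from one backward pass on random inputs
critical = rank_critical_weights(weights, grads, k=1)
weights[critical] *= -1  # the attack: flip the sign of the most critical weight
```

The key property this mirrors is cost: ranking needs no dataset and at most one gradient evaluation, which is what makes the attack practical.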

Why it matters?

This research is important because it highlights a significant security flaw in current AI systems. Knowing which parts of a model are most vulnerable allows us to develop ways to protect them, like selectively safeguarding those critical bits. This could be crucial for ensuring the reliability and safety of AI used in important applications, from self-driving cars to medical diagnosis.
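One simple way to realize "selectively safeguarding those critical bits" (a sketch under assumptions, not the paper's specific defense) is to snapshot the sign bits of the most vulnerable weights and periodically verify and repair them:

```python
import numpy as np

def sign_snapshot(weights: np.ndarray, protected_idx: np.ndarray) -> np.ndarray:
    """Record the sign bits of a small set of protected weights."""
    return np.signbit(weights[protected_idx]).copy()

def verify_and_repair(weights: np.ndarray, protected_idx: np.ndarray,
                      snapshot: np.ndarray) -> np.ndarray:
    """Detect flipped sign bits among protected weights and restore them."""
    current = np.signbit(weights[protected_idx])
    corrupted = protected_idx[current != snapshot]
    weights[corrupted] *= -1  # negate back to the recorded sign
    return corrupted

# Protect two weights, simulate an attack on one, then repair it.
weights = np.array([1.0, -2.0, 3.0], dtype=np.float32)
protected_idx = np.array([1, 2])
snap = sign_snapshot(weights, protected_idx)
weights[2] *= -1                       # simulated sign-bit flip
fixed = verify_and_repair(weights, protected_idx, snap)  # restores weights[2]
```

Because only a small fraction of weights needs protection, the memory overhead of the snapshot is negligible relative to the model.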

Abstract

Deep Neural Networks (DNNs) can be catastrophically disrupted by flipping only a handful of parameter bits. We introduce Deep Neural Lesion (DNL), a data-free and optimization-free method that locates critical parameters, and an enhanced single-pass variant, 1P-DNL, that refines this selection with one forward and backward pass on random inputs. We show that this vulnerability spans multiple domains, including image classification, object detection, instance segmentation, and reasoning large language models. In image classification, flipping just two sign bits in ResNet-50 on ImageNet reduces accuracy by 99.8%. In object detection and instance segmentation, one or two sign flips in the backbone collapse COCO detection and mask AP for Mask R-CNN and YOLOv8-seg models. In language modeling, two sign flips in different experts reduce Qwen3-30B-A3B-Thinking from 78% to 0% accuracy. We also show that selectively protecting a small fraction of vulnerable sign bits provides a practical defense against such attacks.