Learned Compression for Compressed Learning
Dan Jacobellis, Neeraja J. Yadwadkar
2024-12-13
Summary
This paper introduces WaLLoC (Wavelet Learned Lossy Compression), a codec that compresses data into a form machine learning models can use directly, making it practical to work with high-resolution information from modern sensors.
What's the problem?
As technology advances, sensors produce ever larger volumes of high-resolution data, but machine learning systems throw most of it away (typically by reducing resolution) to save resources. Existing compression methods are poorly suited to learning: some discard important details, while others reduce file size without reducing the dimensionality of the data, so downstream models see little efficiency gain.
What's the solution?
WaLLoC addresses these issues by combining linear transform coding with dimensionality-reducing autoencoders. It first applies an invertible wavelet packet transform, which organizes the data by frequency, and then a shallow learned autoencoder that shrinks the representation while preserving important details. This lets WaLLoC maintain high quality while significantly reducing the amount of data a downstream model must process. Because the encoder is lightweight, it can run on resource-limited devices such as mobile phones and remote sensors, and it supports tasks ranging from image classification to music source separation.
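To make the wavelet step concrete, here is a minimal sketch (not the WaLLoC implementation) of a one-level 2D Haar wavelet packet transform and its exact inverse. The Haar filter is chosen purely for simplicity; the actual codec uses different wavelet filters and more transform levels, but the key property shown here is the same: the transform reorganizes the signal by frequency without losing any information, so all loss is deferred to the learned autoencoder that follows it.

```python
import numpy as np

def haar_wpt_1level(x):
    """One level of a 2D Haar wavelet transform.

    Splits an (H, W) array into four (H/2, W/2) subbands
    (LL, LH, HL, HH), organizing the signal by frequency
    before any lossy compression is applied.
    """
    a = (x[0::2, :] + x[1::2, :]) / 2.0  # row averages (low-pass)
    d = (x[0::2, :] - x[1::2, :]) / 2.0  # row differences (high-pass)
    ll = (a[:, 0::2] + a[:, 1::2]) / 2.0
    lh = (a[:, 0::2] - a[:, 1::2]) / 2.0
    hl = (d[:, 0::2] + d[:, 1::2]) / 2.0
    hh = (d[:, 0::2] - d[:, 1::2]) / 2.0
    return ll, lh, hl, hh

def inverse_haar_wpt_1level(ll, lh, hl, hh):
    """Exact inverse: the transform is invertible, so no
    information is discarded before the learned encoder."""
    a = np.empty((ll.shape[0], ll.shape[1] * 2))
    d = np.empty_like(a)
    a[:, 0::2], a[:, 1::2] = ll + lh, ll - lh
    d[:, 0::2], d[:, 1::2] = hl + hh, hl - hh
    x = np.empty((a.shape[0] * 2, a.shape[1]))
    x[0::2, :], x[1::2, :] = a + d, a - d
    return x

x = np.random.default_rng(0).normal(size=(8, 8))
recon = inverse_haar_wpt_1level(*haar_wpt_1level(x))
assert np.allclose(x, recon)  # perfect reconstruction
```

Because the transform concentrates most of the signal's energy in the low-frequency subbands, a shallow autoencoder applied afterward has a much easier job than one applied to raw pixels.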
Why it matters?
This research is important because it enables more effective use of high-resolution data in machine learning applications. By improving how we compress and process this information, WaLLoC can enhance the performance of AI systems across different fields, making them faster and more efficient while maintaining high quality.
Abstract
Modern sensors produce increasingly rich streams of high-resolution data. Due to resource constraints, machine learning systems discard the vast majority of this information via resolution reduction. Compressed-domain learning allows models to operate on compact latent representations, enabling higher effective resolution for the same budget. However, existing compression systems are not ideal for compressed learning. Linear transform coding and end-to-end learned compression systems reduce bitrate, but do not uniformly reduce dimensionality; thus, they do not meaningfully increase efficiency. Generative autoencoders reduce dimensionality, but their adversarial or perceptual objectives lead to significant information loss. To address these limitations, we introduce WaLLoC (Wavelet Learned Lossy Compression), a neural codec architecture that combines linear transform coding with nonlinear dimensionality-reducing autoencoders. WaLLoC sandwiches a shallow, asymmetric autoencoder and entropy bottleneck between an invertible wavelet packet transform and its inverse. Across several key metrics, WaLLoC outperforms the autoencoders used in state-of-the-art latent diffusion models. WaLLoC does not require perceptual or adversarial losses to represent high-frequency detail, providing compatibility with modalities beyond RGB images and stereo audio. WaLLoC's encoder consists almost entirely of linear operations, making it exceptionally efficient and suitable for mobile computing, remote sensing, and learning directly from compressed data. We demonstrate WaLLoC's capability for compressed-domain learning across several tasks, including image classification, colorization, document understanding, and music source separation. Our code, experiments, and pre-trained audio and image codecs are available at https://ut-sysml.org/walloc
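The abstract's distinction between reducing bitrate and reducing dimensionality can be illustrated with a toy sketch. This is not the WaLLoC architecture: the projection below is a random matrix standing in for a trained shallow encoder, the input is a flat vector rather than 2D wavelet subbands, and the dimensions are arbitrary. The point is only that a dimensionality-reducing encoder plus scalar quantization (an entropy bottleneck) shrinks both the number of bits and the number of values a downstream model must process.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical shapes: 64 wavelet coefficients per patch, projected
# to 16 latent dimensions by a linear encoder (random here, standing
# in for a learned one).
W_enc = rng.normal(size=(16, 64)) / np.sqrt(64)

x = rng.normal(size=64)   # coefficients for one patch
z = W_enc @ x             # dimensionality reduction: 64 -> 16
z_q = np.round(z)         # scalar quantization (entropy bottleneck)

# A compressed-domain model now sees 16 integers per patch instead
# of 64 floats: fewer values to store AND fewer to compute over.
print(z_q.shape)  # (16,)
```

By contrast, a codec like JPEG reduces bitrate but still decodes to the original number of pixels, so a model running on its output does no less work.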