LoFT: Parameter-Efficient Fine-Tuning for Long-tailed Semi-Supervised Learning in Open-World Scenarios
Jiahao Chen, Zhiyuan Huang, Yurou Liu, Bing Su
2025-09-15
Summary
This paper focuses on a type of machine learning called long-tailed learning, which deals with situations where some categories have a lot of examples and others have very few. It builds on a technique called semi-supervised learning, which uses both labeled and unlabeled data to improve accuracy.
What's the problem?
Existing methods for long-tailed semi-supervised learning typically train a model from scratch. This can make the model overconfident in its predictions, even when it is wrong, and cause it to produce inaccurate 'pseudo-labels' from the unlabeled data. Also, real-world data isn't always clean; sometimes the unlabeled data contains examples that don't belong to any of the categories the model is learning, which can confuse it.
What's the solution?
The researchers propose a new framework called LoFT, which takes advantage of powerful 'foundation models' – pre-trained models that already have a lot of general knowledge. Instead of training from scratch, LoFT fine-tunes these foundation models, which results in more reliable pseudo-labels and better performance on imbalanced datasets. They also created LoFT-OW to specifically handle situations where the unlabeled data contains examples from outside the known categories, improving the model's ability to distinguish between what it knows and what it doesn't.
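To make the idea concrete, here is a minimal toy sketch of the two ingredients described above: a LoRA-style parameter-efficient adapter on a frozen classifier head, and confidence-based pseudo-labeling with an extra threshold for filtering likely out-of-distribution samples. All shapes, thresholds, and names here are illustrative assumptions, not the paper's actual architecture or hyperparameters.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "foundation model" head W0 (num_classes x dim); frozen, never updated.
# Dimensions are illustrative, not from the paper.
dim, num_classes, rank = 16, 4, 2
W0 = rng.normal(size=(num_classes, dim))

# LoRA-style adapters: only A and B would be trained (rank << dim).
A = rng.normal(scale=0.01, size=(rank, dim))
B = np.zeros((num_classes, rank))   # zero-init so the model starts at W0

def logits(x):
    # Effective weight is W0 + B @ A; the pre-trained W0 stays frozen.
    return x @ (W0 + B @ A).T

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def pseudo_label(x_unlabeled, conf_thresh=0.7, ood_thresh=0.3):
    """Assign pseudo-labels to confident samples; flag likely-OOD ones.

    Samples whose top softmax probability is below `ood_thresh` are
    treated as out-of-distribution and excluded (the open-world case);
    samples between the two thresholds are simply left unlabeled.
    """
    probs = softmax(logits(x_unlabeled))
    top = probs.max(axis=-1)
    labels = probs.argmax(axis=-1)
    keep = top >= conf_thresh   # confident, in-distribution samples
    ood = top < ood_thresh      # likely outside the known categories
    return labels[keep], keep, ood

x_u = rng.normal(size=(8, dim))   # a small batch of unlabeled samples
labels, keep, ood = pseudo_label(x_u)
print(f"pseudo-labeled {keep.sum()} of {len(x_u)}, flagged {ood.sum()} as OOD")
```

In a real setup the frozen weights would come from a large pre-trained model and only the small adapter matrices would be optimized on the confident pseudo-labeled samples, which is what makes the approach parameter-efficient.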
Why it matters?
This work is important because it shows that using pre-trained foundation models can significantly improve long-tailed learning, especially when you don't have a lot of labeled data. It also addresses a practical challenge of real-world data by handling situations where the unlabeled data isn't perfect, making the approach more useful in realistic applications. Importantly, it achieves better results using far less unlabeled data than previous methods.
Abstract
Long-tailed learning has garnered increasing attention due to its wide applicability in real-world scenarios. Among existing approaches, Long-Tailed Semi-Supervised Learning (LTSSL) has emerged as an effective solution by incorporating a large amount of unlabeled data into the imbalanced labeled dataset. However, most prior LTSSL methods are designed to train models from scratch, which often leads to issues such as overconfidence and low-quality pseudo-labels. To address these challenges, we extend LTSSL into the foundation model fine-tuning paradigm and propose a novel framework: LoFT (Long-tailed semi-supervised learning via parameter-efficient Fine-Tuning). We demonstrate that fine-tuned foundation models can generate more reliable pseudo-labels, thereby benefiting imbalanced learning. Furthermore, we explore a more practical setting by investigating semi-supervised learning under open-world conditions, where the unlabeled data may include out-of-distribution (OOD) samples. To handle this problem, we propose LoFT-OW (LoFT under Open-World scenarios) to improve the discriminative ability. Experimental results on multiple benchmarks demonstrate that our method achieves superior performance compared to previous approaches, even when utilizing only 1% of the unlabeled data used by prior works.