Towards Visual Re-Identification of Fish using Fine-Grained Classification for Electronic Monitoring in Fisheries

Samitha Nuwan Thilakarathna, Ercan Avsar, Martin Mathias Nielsen, Malte Pedersen

2025-12-16

Summary

This research focuses on automatically identifying individual fish in video footage collected by electronic monitoring systems used in fisheries, a task that is becoming increasingly important as these systems generate far more video than can be reviewed by hand.

What's the problem?

Fisheries management relies on knowing how many fish are present and tracking individual fish over time, but manually reviewing all the video footage from electronic monitoring systems is impossible due to the sheer volume of data. Identifying the same fish across different parts of a video is difficult, especially when dealing with species that look very similar to each other and when the fish are viewed from different angles or partially hidden.

What's the solution?

The researchers developed a deep learning system, specifically a Vision Transformer, to automatically re-identify fish in video. They trained and tested it on the AutoFish dataset, which simulates electronic monitoring footage of fish on a conveyor belt. They improved accuracy by carefully selecting which examples the system learns from (a technique called 'hard triplet mining') and by preprocessing the images with dataset-specific normalization to account for differences in lighting and color. They found that their Vision Transformer-based model, Swin-T, outperformed ResNet-50, a more traditional convolutional image recognition network.
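The idea behind 'hard triplet mining' is that, within each training batch, every image (the anchor) is compared against the most dissimilar image of the same fish (hardest positive) and the most similar image of a different fish (hardest negative), and the model is penalized when the negative is closer than the positive. A minimal NumPy sketch of this batch-hard loss follows; it is an illustration of the general technique, not the authors' implementation, and the function name and margin value are assumptions:

```python
import numpy as np

def batch_hard_triplet_loss(embeddings, labels, margin=0.3):
    """Batch-hard triplet loss: for each anchor, use the farthest
    same-identity embedding and the nearest different-identity one."""
    # Pairwise Euclidean distance matrix between all embeddings.
    diff = embeddings[:, None, :] - embeddings[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1) + 1e-12)
    same = labels[:, None] == labels[None, :]

    losses = []
    for i in range(len(labels)):
        pos_mask = same[i].copy()
        pos_mask[i] = False          # exclude the anchor itself
        neg_mask = ~same[i]
        if not pos_mask.any() or not neg_mask.any():
            continue                 # no valid triplet for this anchor
        hardest_pos = dist[i][pos_mask].max()  # farthest positive
        hardest_neg = dist[i][neg_mask].min()  # nearest negative
        losses.append(max(hardest_pos - hardest_neg + margin, 0.0))
    return float(np.mean(losses)) if losses else 0.0
```

When the identities are already well separated in embedding space, every hardest positive is closer than every hardest negative by more than the margin and the loss is zero; overlapping identities produce a positive loss that pushes the network to separate them.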

Why it matters?

This work is important because it provides a way to automatically analyze large amounts of fisheries video data, which can lead to better understanding of fish populations and more effective, sustainable management of marine resources. Being able to track individual fish helps scientists understand their behavior and movement patterns, ultimately contributing to healthier oceans.

Abstract

Accurate fisheries data are crucial for effective and sustainable marine resource management. With the recent adoption of Electronic Monitoring (EM) systems, more video data is now being collected than can be feasibly reviewed manually. This paper addresses this challenge by developing an optimized deep learning pipeline for automated fish re-identification (Re-ID) using the novel AutoFish dataset, which simulates conveyor-belt EM systems with six similar-looking fish species. We demonstrate that key Re-ID metrics (R1 and mAP@k) are substantially improved by using hard triplet mining in conjunction with a custom image transformation pipeline that includes dataset-specific normalization. By employing these strategies, we demonstrate that the Vision Transformer-based Swin-T architecture consistently outperforms the Convolutional Neural Network-based ResNet-50, achieving peak performance of 41.65% mAP@k and 90.43% Rank-1 accuracy. An in-depth analysis reveals that the primary challenge is distinguishing visually similar individuals of the same species (intra-species errors), where viewpoint inconsistency proves significantly more detrimental than partial occlusion. The source code and documentation are available at: https://github.com/msamdk/Fish_Re_Identification.git
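The two metrics quoted in the abstract are standard in Re-ID evaluation: Rank-1 accuracy is the fraction of query images whose nearest gallery match has the correct identity, and mAP@k averages the precision of correct matches within the top-k retrievals. A small NumPy sketch of how they can be computed from a query-to-gallery distance matrix follows; the function name and the choice of k are assumptions, not details from the paper:

```python
import numpy as np

def rank1_and_map_at_k(query_labels, gallery_labels, dist, k=10):
    """Compute Rank-1 accuracy and mAP@k from a distance matrix
    of shape (num_queries, num_gallery)."""
    query_labels = np.asarray(query_labels)
    gallery_labels = np.asarray(gallery_labels)
    r1_hits, aps = [], []
    for i, q in enumerate(query_labels):
        order = np.argsort(dist[i])          # gallery sorted by distance
        ranked = gallery_labels[order]
        r1_hits.append(ranked[0] == q)       # top-1 match correct?
        topk = ranked[:k]
        rel = (topk == q)                    # relevance of each retrieval
        n_rel = min(int((gallery_labels == q).sum()), k)
        if n_rel == 0:
            continue                         # no ground-truth match in gallery
        # Precision at each rank, accumulated only at relevant positions.
        prec = np.cumsum(rel) / (np.arange(len(topk)) + 1)
        aps.append((prec * rel).sum() / n_rel)
    return float(np.mean(r1_hits)), float(np.mean(aps))
```

A perfect retrieval (all correct matches ranked first) yields 1.0 for both metrics; the paper's 90.43% Rank-1 versus 41.65% mAP@k gap indicates the nearest match is usually right while the full ranked list of matches is much harder to get correct.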