VisText-Mosquito: A Multimodal Dataset and Benchmark for AI-Based Mosquito Breeding Site Detection and Reasoning

Md. Adnanul Islam, Md. Faiyaz Abdullah Sayeedi, Md. Asaduzzaman Shuvo, Muhammad Ziaur Rahman, Shahanur Rahman Bappy, Raiyan Rahman, Swakkhar Shatabda

2025-06-18

Summary

This paper introduces VisText-Mosquito, a dataset that pairs images with text so that AI models can detect mosquito breeding sites and explain why they are breeding sites.

What's the problem?

Locating and understanding mosquito breeding sites in the environment is difficult, yet it is essential for controlling the mosquitoes that spread disease. Previous datasets did not combine enough visual and textual information to support accurate automated detection.

What's the solution?

The researchers built a new dataset containing both images and text, then used YOLO-family models (YOLOv9s and YOLOv11n-Seg) and a fine-tuned BLIP model to train systems that automatically find, segment, and reason about mosquito breeding sites better than earlier approaches.

Why does it matter?

Better AI detection of mosquito breeding areas supports more effective monitoring and prevention, which helps fight mosquito-borne diseases and protect public health.

Abstract

VisText-Mosquito is a multimodal dataset combining visual and textual data for automated mosquito breeding site detection, segmentation, and reasoning, utilizing YOLOv9s, YOLOv11n-Seg, and a fine-tuned BLIP model.
