FruitNeRF: A Unified Neural Radiance Field based Fruit Counting Framework
Lukas Meyer, Andreas Gilson, Ute Schmid, Marc Stamminger
2024-08-13

Summary
This paper presents FruitNeRF, a framework that uses neural radiance fields to count fruits of any type directly in 3D from a set of posed images.
What's the problem?
Counting fruits in images is difficult because traditional 2D methods are often tuned to a single fruit type and can count the same fruit twice when it appears in several views. Many existing systems rely on object tracking or optical flow, which can be unreliable.
What's the solution?
FruitNeRF combines several techniques to improve fruit counting. It takes multiple posed images of a tree from different angles and segments the fruit in each image using a foundation model that works for any fruit type. From the images and segmentation masks it trains a semantic neural radiance field, samples a fruit-only 3D point cloud from it, and clusters the points to obtain the count, as shown in the sketch below. Because counting happens in 3D, the same fruit is not counted twice and irrelevant fruit (for example, from neighbouring trees) is ignored. The system was tested on both real-world and synthetic datasets and outperforms older methods.
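To make the counting stage concrete, here is a minimal sketch (not the authors' code) of the final step: given a fruit-only 3D point cloud, group nearby points into clusters and report one fruit per cluster. DBSCAN from scikit-learn stands in for the paper's cascaded clustering, and the eps / min_samples values are illustrative assumptions rather than values from the paper.

```python
# Hypothetical sketch of the counting stage: given a fruit-only point cloud
# (one row per 3D point, in metres), group points into clusters and treat
# each cluster as one fruit. DBSCAN stands in for the paper's cascaded
# clustering; eps / min_samples values are illustrative, not from the paper.
import numpy as np
from sklearn.cluster import DBSCAN

def count_fruits(points: np.ndarray, eps: float = 0.04, min_samples: int = 30) -> int:
    """Cluster fruit points and return the number of clusters (fruits)."""
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(points)
    # Label -1 marks noise points that belong to no cluster.
    return len(set(labels)) - (1 if -1 in labels else 0)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Fake data: three "apples" as small Gaussian blobs of surface points.
    centers = np.array([[0.0, 0.0, 1.5], [0.3, 0.1, 1.8], [-0.2, 0.4, 1.2]])
    points = np.concatenate([c + 0.02 * rng.standard_normal((200, 3)) for c in centers])
    print(count_fruits(points))  # expected: 3
```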
Why it matters?
This research is important because it provides a more reliable way to count fruits, which can help farmers and researchers monitor crops more effectively. Accurate fruit counting can improve agricultural practices and ensure better yields, ultimately contributing to food security.
Abstract
We introduce FruitNeRF, a unified novel fruit counting framework that leverages state-of-the-art view synthesis methods to count any fruit type directly in 3D. Our framework takes an unordered set of posed images captured by a monocular camera and segments fruit in each image. To make our system independent of the fruit type, we employ a foundation model that generates binary segmentation masks for any fruit. Utilizing both modalities, RGB and semantic, we train a semantic neural radiance field. Through uniform volume sampling of the implicit Fruit Field, we obtain fruit-only point clouds. By applying cascaded clustering on the extracted point cloud, our approach achieves precise fruit counts. The use of neural radiance fields provides significant advantages over conventional methods such as object tracking or optical flow, as the counting itself is lifted into 3D. Our method prevents double counting fruit and avoids counting irrelevant fruit. We evaluate our methodology using both real-world and synthetic datasets. The real-world dataset consists of three apple trees with manually counted ground truths and a benchmark apple dataset with one row and ground-truth fruit locations, while the synthetic dataset comprises various fruit types, including apple, plum, lemon, pear, peach, and mango. Additionally, we assess the performance of fruit counting using the foundation model compared to a U-Net.
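As a rough illustration of the uniform volume sampling mentioned in the abstract, the sketch below (an assumption-laden simplification, not the paper's implementation) queries a trained semantic "Fruit Field" on a regular 3D grid and keeps the points whose fruit probability passes a threshold. The fruit_field callable, the grid bounds, and the threshold are hypothetical placeholders; here the field is mocked as a single spherical fruit.

```python
# Minimal sketch of uniform volume sampling: query a trained semantic
# "Fruit Field" on a regular 3D grid and keep the points whose fruit
# probability exceeds a threshold. `fruit_field` is a hypothetical callable
# standing in for the trained semantic NeRF; here it is mocked.
import numpy as np

def sample_fruit_points(fruit_field, bounds=(-1.0, 1.0), resolution=64, threshold=0.5):
    """Return an (N, 3) array of grid points classified as fruit."""
    axis = np.linspace(bounds[0], bounds[1], resolution)
    grid = np.stack(np.meshgrid(axis, axis, axis, indexing="ij"), axis=-1).reshape(-1, 3)
    probs = fruit_field(grid)  # per-point fruit probability in [0, 1]
    return grid[probs > threshold]

if __name__ == "__main__":
    # Mock field: a single spherical "fruit" of radius 0.1 at the origin.
    mock_field = lambda p: (np.linalg.norm(p, axis=1) < 0.1).astype(float)
    points = sample_fruit_points(mock_field, resolution=32)
    print(points.shape)  # a handful of grid points inside the sphere
```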