SAM 3 supports a variety of prompt modalities, including concept prompts, such as simple noun phrases and image exemplars, and visual prompts, such as masks, boxes, and points. This increases the flexibility and usability of segmentation, particularly for concepts that are rare or hard to describe with text alone. SAM 3 excels at segmenting objects described by short noun phrases, reflecting common user intent in interactive and natural settings. The model can also serve as a perception tool for multimodal large language models, segmenting objects described by more complex prompts.
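To make the two prompt families concrete, the sketch below models them as plain data types. This is not the actual SAM 3 API; all class and function names here are hypothetical, chosen only to illustrate the distinction between concept prompts (which target every instance of a concept) and visual prompts (which target a single instance):

```python
from dataclasses import dataclass
from typing import Tuple, Union

# Hypothetical prompt types illustrating SAM 3's modalities.
# The real SAM 3 interface may differ; names here are illustrative only.

@dataclass
class TextPrompt:
    """Concept prompt: a short noun phrase, e.g. 'striped cat'."""
    phrase: str

@dataclass
class ExemplarPrompt:
    """Concept prompt: an image region showing one instance of the concept."""
    image: object  # e.g. a numpy array or PIL image
    box: Tuple[float, float, float, float]  # exemplar region (x0, y0, x1, y1)

@dataclass
class PointPrompt:
    """Visual prompt: a single click, positive (include) or negative (exclude)."""
    xy: Tuple[float, float]
    positive: bool = True

@dataclass
class BoxPrompt:
    """Visual prompt: a bounding box around one object instance."""
    box: Tuple[float, float, float, float]

Prompt = Union[TextPrompt, ExemplarPrompt, PointPrompt, BoxPrompt]

def is_concept_prompt(p: Prompt) -> bool:
    """Concept prompts request all instances of a concept in the image;
    visual prompts (points, boxes, masks) target a single instance."""
    return isinstance(p, (TextPrompt, ExemplarPrompt))
```

A caller could mix modalities freely, for example pairing a noun phrase with an exemplar crop when the phrase alone is ambiguous, which is the flexibility the paragraph above describes.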
SAM 3 has been applied across a variety of use cases, including scientific fields such as wildlife monitoring and ocean exploration. The model has also been integrated with wearable devices, enabling robust segmentation and tracking of objects from a first-person perspective. Additionally, SAM 3 has been used to build a novel data engine that combines AI models with human annotators, dramatically speeding up annotation. This hybrid human-and-AI system has enabled the creation of a large, diverse training set covering over 4 million unique concepts.

