A core component of Cobra is its projector module, which maps visual features into the embedding space of the language model. The projector, implemented as either a multi-layer perceptron or a lightweight downsampling module, transforms visual features into a format compatible with the Mamba backbone. Cobra then concatenates the projected visual embeddings with the text embeddings and processes the combined sequence through a stack of 64 Mamba blocks with residual connections and RMSNorm. The result is a model that generates natural language responses autoregressively, conditioned on both image and text context. This design improves the model’s efficiency and makes it applicable to a wide range of multi-modal tasks, from detailed image captioning to complex question answering.
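The data flow described above (project visual features, concatenate with text embeddings, run a residual stack with RMSNorm) can be illustrated with a minimal pure-Python sketch. It is not Cobra's implementation: all function names and toy dimensions are hypothetical, the ReLU activation and the pre-norm ordering `x + mixer(norm(x))` are assumptions, and `placeholder_mixer` is a simple causal running mean standing in for the real selective state-space computation inside a Mamba block.

```python
import math
import random


def rms_norm(x, eps=1e-6):
    """Rescale a vector so its root-mean-square is about 1 (no learned gain)."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [v / rms for v in x]


def linear(x, weight):
    """Apply a weight matrix (d_out rows of d_in values) to a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in weight]


def random_weight(d_out, d_in, rng):
    """Random weights for the demo; a trained model would load real ones."""
    scale = 1.0 / math.sqrt(d_in)
    return [[rng.uniform(-scale, scale) for _ in range(d_in)] for _ in range(d_out)]


def mlp_projector(visual_tokens, w1, w2):
    """Two-layer MLP mapping each visual feature to the backbone width."""
    projected = []
    for tok in visual_tokens:
        hidden = [max(0.0, h) for h in linear(tok, w1)]  # ReLU is an assumption
        projected.append(linear(hidden, w2))
    return projected


def placeholder_mixer(tokens):
    """Stand-in for Mamba's sequence mixing: a causal running mean.

    This is NOT a selective SSM; it only mimics the property that
    position i sees positions 0..i and nothing later.
    """
    mixed, running = [], [0.0] * len(tokens[0])
    for i, tok in enumerate(tokens, start=1):
        running = [r + v for r, v in zip(running, tok)]
        mixed.append([r / i for r in running])
    return mixed


def backbone(tokens, num_blocks):
    """Residual stack: x = x + mixer(rms_norm(x)), repeated num_blocks times."""
    for _ in range(num_blocks):
        normed = [rms_norm(t) for t in tokens]
        mixed = placeholder_mixer(normed)
        tokens = [[a + b for a, b in zip(t, m)] for t, m in zip(tokens, mixed)]
    return tokens


def build_sequence(visual_features, text_embeddings, w1, w2):
    """Projected visual tokens first, then text tokens, as one sequence."""
    return mlp_projector(visual_features, w1, w2) + text_embeddings


# Toy sizes for illustration only; they are not Cobra's real dimensions.
rng = random.Random(0)
D_VISION, D_HIDDEN, D_MODEL = 6, 8, 4
w1 = random_weight(D_HIDDEN, D_VISION, rng)
w2 = random_weight(D_MODEL, D_HIDDEN, rng)
visual_features = [[rng.gauss(0, 1) for _ in range(D_VISION)] for _ in range(3)]
text_embeddings = [[rng.gauss(0, 1) for _ in range(D_MODEL)] for _ in range(2)]

sequence = build_sequence(visual_features, text_embeddings, w1, w2)
hidden_states = backbone(sequence, num_blocks=64)  # 64 blocks, as stated above
```

The projector exists because the vision features and the language model generally have different widths (6 versus 4 in this toy). After projection, image and text tokens are indistinguishable to the backbone, which is what allows a single sequence model to handle both.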


Cobra’s capabilities have been evaluated through a series of case studies and benchmarks, where it consistently outperforms leading models such as LLaVA v1.5 and MobileVLM v2. Notably, Cobra demonstrates a better understanding of spatial relationships in images and produces fewer visual hallucinations, yielding more accurate and contextually appropriate descriptions. For example, it can correctly identify object positions and describe intricate scenes, such as robotic arms manipulating blocks in simulated environments, where other models often fail. This makes Cobra well suited to applications that demand precise visual reasoning, such as robotics, autonomous systems, and advanced content analysis.


Key features include:


  • Multi-modal architecture combining visual and textual understanding
  • Dual vision encoder using DINOv2 and SigLIP for rich feature extraction
  • Projector module for seamless alignment of visual and language tokens
  • Efficient Mamba backbone with 64 stacked blocks for scalable performance
  • Superior spatial reasoning and reduced visual hallucination in outputs
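
The list above names a dual vision encoder but does not say how the two encoders' outputs are combined. One plausible approach, shown here purely as an assumption, is to concatenate the per-patch features from DINOv2 and SigLIP along the channel dimension before they reach the projector. The function name and the toy feature sizes are hypothetical.

```python
def fuse_patch_features(dino_patches, siglip_patches):
    """Concatenate two encoders' per-patch features along the channel axis.

    Assumes both encoders produce the same number of patches for an image.
    """
    if len(dino_patches) != len(siglip_patches):
        raise ValueError("both encoders must yield the same number of patches")
    return [d + s for d, s in zip(dino_patches, siglip_patches)]


# Toy inputs: 3 patches, 2 channels from one encoder and 3 from the other.
dino_patches = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
siglip_patches = [[1.0, 1.1, 1.2], [1.3, 1.4, 1.5], [1.6, 1.7, 1.8]]
fused = fuse_patch_features(dino_patches, siglip_patches)
```

Each fused patch here has 2 + 3 = 5 channels, so a projector placed after this step would need an input width equal to the sum of the two encoders' feature widths.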
