Bloom's operational flow is organized into four sequential pipeline stages. It begins with the Understanding Agent, which interprets the target behavior and its examples to grasp the underlying scientific motivation. The Ideation Agent then generates diverse evaluation scenarios designed to elicit that behavior, using intelligent batching for efficiency. The Rollout Agent executes the generated interactions against the specified target model. Finally, the Judgment and Meta-Judgment Agents score the outcomes for the target behavior and any configured secondary qualities, with the Meta-Judgment Agent synthesizing the findings into a comprehensive report.
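The four stages above can be sketched as a simple linear pipeline. The sketch below is purely illustrative: the class and function names are hypothetical, and each stage is a stub standing in for what would be an LLM-backed agent in the real system.

```python
# Hypothetical sketch of a four-stage Understanding -> Ideation -> Rollout ->
# Judgment pipeline. Stage names and data fields are assumptions for
# illustration, not Bloom's actual internals.
from dataclasses import dataclass, field


@dataclass
class PipelineState:
    behavior: str
    understanding: str = ""
    scenarios: list = field(default_factory=list)
    transcripts: list = field(default_factory=list)
    scores: dict = field(default_factory=dict)


def understand(state: PipelineState) -> PipelineState:
    # Understanding Agent: interpret the target behavior.
    state.understanding = f"motivation behind '{state.behavior}'"
    return state


def ideate(state: PipelineState, diversity: int = 3) -> PipelineState:
    # Ideation Agent: generate diverse scenarios (stubbed).
    state.scenarios = [
        f"scenario {i} eliciting {state.behavior}" for i in range(diversity)
    ]
    return state


def rollout(state: PipelineState) -> PipelineState:
    # Rollout Agent: run each scenario against the target model (stubbed).
    state.transcripts = [f"transcript for: {s}" for s in state.scenarios]
    return state


def judge(state: PipelineState) -> PipelineState:
    # Judgment Agent: score each transcript (placeholder scores).
    state.scores = {t: 0.0 for t in state.transcripts}
    return state


def run_pipeline(behavior: str) -> PipelineState:
    state = PipelineState(behavior=behavior)
    for stage in (understand, ideate, rollout, judge):
        state = stage(state)
    return state
```

Because each stage only reads from and writes to the shared state object, stages can be swapped or instrumented independently, which mirrors how a staged agent pipeline is typically composed.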
The system offers substantial flexibility over the evaluation process, supporting everything from quick local debugging to large-scale experiments tracked with Weights & Biases. Users configure an entire run through a central `seed.yaml` file, specifying parameters such as the target model, evaluation diversity, maximum conversation length, and whether to use extended reasoning effort or web search during scenario generation. Bloom also integrates with external tools: an interactive web-based viewer for browsing results, and LiteLLM for unified API access across multiple LLM providers, which simplifies model comparisons and aids reproducibility.
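A configuration along these lines could drive such a run. The field names below are illustrative guesses based on the parameters described above, not Bloom's exact `seed.yaml` schema:

```yaml
# Illustrative seed.yaml sketch -- keys are assumptions, not the real schema.
target_model: provider/model-name   # any LiteLLM-style model identifier
behavior: target-behavior-name      # the behavior under evaluation
diversity: 0.5                      # how varied the generated scenarios are
max_turns: 10                       # maximum conversation length per rollout
reasoning_effort: high              # extended reasoning during generation
web_search: true                    # allow web search while ideating
```

Centralizing these knobs in one file is what makes runs easy to reproduce: rerunning an experiment or comparing models amounts to rerunning the pipeline with the same or a minimally edited seed file.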

