Ornith-1.0


Free Agents RL


Key Features

Optimizes both task scaffolds and solution rollouts through reinforcement learning.

Lets task-specific orchestration strategies emerge instead of hand-coding every harness.

Propagates rollout reward back into the scaffold-authoring stage.

Uses immutable outer trust boundaries to constrain self-generated scaffolds.

Adds deterministic monitoring for forbidden environment or verifier access.

Uses a frozen LLM judge as an additional veto against intent-level reward hacking.

Applies pipeline RL for long asynchronous rollouts.

Uses staleness weighting to downweight older off-policy tokens.

Ornith-1.0 treats scaffold generation as part of the policy rather than as a fixed, hand-engineered harness. Reward from downstream rollouts is propagated back to the scaffold-authoring stage, so the scaffold is optimized together with the solutions it produces. Three safeguards reduce reward hacking: immutable outer trust boundaries, deterministic monitors for forbidden environment or verifier access, and a frozen LLM judge that can veto a rollout.
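The layered safeguards can be pictured as a gate in front of the reward. The sketch below is illustrative only and is not taken from the project: the `Rollout` fields, the `forbidden_prefixes` list, the `judge_vetoes` callable, and the choice to zero out the reward on any violation are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Rollout:
    """Hypothetical record of one rollout (field names are assumptions)."""
    accessed_paths: Sequence[str]  # paths the agent touched during the rollout
    transcript: str                # text handed to the frozen judge
    raw_reward: float              # reward reported by the task verifier


def monitor_flags(paths: Sequence[str], forbidden_prefixes: Sequence[str]) -> bool:
    """Deterministic check: True if any accessed path lies under a forbidden prefix."""
    prefixes = tuple(forbidden_prefixes)
    return any(path.startswith(prefixes) for path in paths)


def gated_reward(
    rollout: Rollout,
    forbidden_prefixes: Sequence[str],
    judge_vetoes: Callable[[str], bool],
) -> float:
    """Return the raw reward only if both safeguards pass.

    Assumption: a violation zeroes the reward. The source does not say
    how a violation is actually penalized.
    """
    if monitor_flags(rollout.accessed_paths, forbidden_prefixes):
        return 0.0
    if judge_vetoes(rollout.transcript):
        return 0.0
    return rollout.raw_reward
```

Running the deterministic monitor first keeps the cheap, reproducible check ahead of the judge call, which would only be needed for rollouts that pass it.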

Ornith-1.0 is useful for researchers studying agentic reinforcement learning, scaffold search, coding agents, and long-rollout optimization. The project highlights pipeline RL for asynchronous training and staleness weighting to control off-policy token effects during long trajectories.
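As a rough illustration of staleness weighting, the sketch below downweights tokens by how many policy versions old they are. The exponential form, the `decay` parameter, and the function names are assumptions chosen for the example; the project's actual weighting scheme is not described here.

```python
from typing import List, Sequence


def staleness_weights(
    token_versions: Sequence[int],
    current_version: int,
    decay: float = 0.9,
) -> List[float]:
    """Weight each token by decay ** lag, where lag is the number of
    policy updates since the token was sampled (assumed scheme)."""
    weights = []
    for version in token_versions:
        lag = current_version - version
        if lag < 0:
            raise ValueError("token sampled by a newer policy than the current one")
        weights.append(decay ** lag)
    return weights


def weighted_mean_loss(token_losses: Sequence[float], weights: Sequence[float]) -> float:
    """Average per-token losses, normalized by the total weight."""
    if len(token_losses) != len(weights):
        raise ValueError("token_losses and weights must have the same length")
    total = sum(weights)
    if total == 0:
        return 0.0
    return sum(loss * w for loss, w in zip(token_losses, weights)) / total
```

In a long asynchronous rollout, early tokens may have been sampled several policy versions before the update that consumes them, so they would receive smaller weights than recently sampled tokens.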
