Epona's world model uses a multimodal spatiotemporal transformer to encode historical driving context, then conditions two diffusion-transformer (DiT) heads on that context: a next-frame prediction DiT that generates the frame at timestep T+1, and a trajectory planning DiT that forecasts the pose trajectory over the next N frames. By adopting a chain-of-forward strategy, Epona produces high-quality, long-horizon video in an autoregressive manner. This design supports minutes-long video generation, trajectory-controlled generation, and generalization to diverse driving scenes.
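To make the three-part layout concrete, here is a minimal PyTorch sketch of this structure. It is an illustration under stated assumptions, not the authors' implementation: the class name `EponaSketch`, the pooling step, and the linear stand-ins for the two DiT heads (a real DiT would also take a noised target and a diffusion timestep) are all hypothetical.

```python
import torch
import torch.nn as nn

class EponaSketch(nn.Module):
    """Illustrative skeleton: a spatiotemporal context encoder feeding
    two prediction heads (next frame and future trajectory)."""

    def __init__(self, dim=512, n_traj_frames=8, pose_dim=3):
        super().__init__()
        self.n_traj_frames, self.pose_dim = n_traj_frames, pose_dim
        # Multimodal spatiotemporal transformer: fuses tokens from past
        # frames and poses into a shared driving-context representation.
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True)
        self.context_encoder = nn.TransformerEncoder(layer, num_layers=6)
        # Stand-ins for the two DiT heads conditioned on the context.
        self.next_frame_head = nn.Linear(dim, dim)  # frame at T+1 (latent)
        self.traj_head = nn.Linear(dim, n_traj_frames * pose_dim)  # N poses

    def forward(self, history_tokens):
        # history_tokens: (batch, seq_len, dim) multimodal history
        ctx = self.context_encoder(history_tokens)
        summary = ctx.mean(dim=1)  # pooled conditioning signal
        next_frame_latent = self.next_frame_head(summary)
        future_poses = self.traj_head(summary).view(
            -1, self.n_traj_frames, self.pose_dim)
        return next_frame_latent, future_poses
```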
Epona's experimental results demonstrate state-of-the-art performance, with a 7.4% FVD improvement and prediction horizons minutes longer than prior work. The learned world model also serves as a real-time motion planner, outperforming strong end-to-end planners on the NAVSIM benchmark. Its ability to capture real-world traffic knowledge and predict future trajectories makes it a promising component for autonomous driving systems, and its modular architecture and chain-of-forward training strategy underpin the high-quality, long-horizon video generation that makes it a valuable tool for researchers and developers.
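A sketch of how the autoregressive, chain-of-forward rollout could look at inference time, assuming a model with the interface sketched above: each newly generated frame latent is appended to a sliding context window and fed back in, which is what lets the horizon extend to minutes. The function name `rollout` and the fixed-length-window choice are illustrative assumptions.

```python
import torch

@torch.no_grad()
def rollout(model, history_tokens, n_steps=60):
    """Illustrative chain-of-forward rollout: generate one frame at a
    time, feeding each prediction back into the context window."""
    frames, trajectories = [], []
    context = history_tokens  # (batch, seq_len, dim)
    for _ in range(n_steps):
        next_frame, future_poses = model(context)
        frames.append(next_frame)
        trajectories.append(future_poses)
        # Slide the window: drop the oldest token and append the new
        # frame latent, keeping a fixed context length for long rollouts.
        context = torch.cat([context[:, 1:], next_frame.unsqueeze(1)], dim=1)
    return torch.stack(frames, dim=1), trajectories
```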