Yume's technical framework includes several key components. Camera motion quantization translates continuous camera trajectories into intuitive directional and rotational actions mapped to keyboard input. The Masked Video Diffusion Transformer (MVDT) with frame memory enables infinite autoregressive generation while maintaining consistency across long sequences. Additionally, Yume uses a training-free Anti-Artifact Mechanism (AAM) and Time Travel Sampling based on Stochastic Differential Equations (TTS-SDE) to enhance visual quality and controllability.
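As a rough illustration of the camera motion quantization idea, the sketch below discretizes the motion between consecutive camera poses into keyboard-style actions. It is a minimal, hypothetical example: the key bindings, thresholds, function names, and pose representation are assumptions for illustration, not Yume's actual implementation.

```python
# Hypothetical sketch of camera motion quantization.
# Key bindings, thresholds, and pose format are illustrative assumptions.
from dataclasses import dataclass

import numpy as np

# Assumed mapping from discrete camera actions to keyboard keys.
KEY_MAP = {
    "forward": "W", "backward": "S",
    "left": "A", "right": "D",
    "rotate_left": "Q", "rotate_right": "E",
}


@dataclass
class CameraPose:
    position: np.ndarray  # (x, y, z) world coordinates
    yaw: float            # heading angle in radians


def quantize_step(prev: CameraPose, curr: CameraPose,
                  move_eps: float = 0.05, rot_eps: float = 0.02) -> list[str]:
    """Map the motion between two consecutive poses to discrete key actions."""
    keys = []

    # Express the translation in the previous frame's local coordinates.
    delta = curr.position - prev.position
    cos_y, sin_y = np.cos(-prev.yaw), np.sin(-prev.yaw)
    lateral = cos_y * delta[0] - sin_y * delta[2]
    forward = sin_y * delta[0] + cos_y * delta[2]

    if forward > move_eps:
        keys.append(KEY_MAP["forward"])
    elif forward < -move_eps:
        keys.append(KEY_MAP["backward"])
    if lateral > move_eps:
        keys.append(KEY_MAP["right"])
    elif lateral < -move_eps:
        keys.append(KEY_MAP["left"])

    # Quantize rotation about the vertical axis.
    dyaw = curr.yaw - prev.yaw
    if dyaw > rot_eps:
        keys.append(KEY_MAP["rotate_left"])
    elif dyaw < -rot_eps:
        keys.append(KEY_MAP["rotate_right"])

    return keys


def quantize_trajectory(poses: list[CameraPose]) -> list[list[str]]:
    """Convert a continuous camera trajectory into per-frame key actions."""
    return [quantize_step(a, b) for a, b in zip(poses, poses[1:])]
```

In this sketch, each consecutive pair of poses yields a (possibly empty) set of pressed keys, which is the kind of discrete, keyboard-like control signal the quantization step is meant to produce.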
Yume is trained on the high-quality world exploration dataset Sekai and achieves remarkable results across diverse scenes and applications. Its resources, including the data, codebase, and model weights, are available on GitHub. Yume will be updated monthly in pursuit of its original goal of creating interactive, realistic, and dynamic worlds from various inputs. Potential applications include image and video editing, virtual reality, and more.