FlexiAct's pipeline consists of two main components: RefAdapter and FAE (Frequency-aware Action Extraction). RefAdapter is trained to condition on arbitrary frames, which enables transitions across diverse spatial structures. FAE extracts the action directly during the denoising process: the attention weights from video tokens to the frequency-aware embedding are adjusted dynamically according to the denoising timestep. Together, these components let FlexiAct transfer actions to subjects with diverse layouts, skeletons, and viewpoints, making it a versatile tool for video generation.
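To make the idea of timestep-dependent attention concrete, here is a minimal sketch in plain Python. It is not FlexiAct's actual implementation: the function names, the cosine ramp, the thresholds `t_low` and `t_high`, the strength `alpha`, and the choice to boost attention at larger (noisier) timesteps are all illustrative assumptions. It only shows one way a bias toward the frequency-aware embedding could vary with the timestep.

```python
import math


def embedding_bias(t, t_low=700, t_high=800, alpha=1.0):
    """Hypothetical schedule for the extra attention logit given to the
    frequency-aware embedding at denoising timestep t.

    Assumption: large t means a noisy, early sampling step, and the
    bias is strongest there. Thresholds and alpha are made-up values.
    """
    if t >= t_high:
        return alpha
    if t <= t_low:
        return 0.0
    # Cosine ramp from 0 at t_low up to alpha at t_high.
    phase = (t - t_low) / (t_high - t_low)
    return alpha * 0.5 * (1.0 - math.cos(math.pi * phase))


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(logits, is_embedding, t):
    """Attention of one video token over its keys.

    The timestep-dependent bias is added only to keys that belong to
    the frequency-aware embedding (marked True in is_embedding).
    """
    bias = embedding_bias(t)
    biased = [x + bias if emb else x for x, emb in zip(logits, is_embedding)]
    return softmax(biased)


# Toy example: three ordinary keys and one embedding key (the last one).
logits = [0.2, 0.1, 0.3, 0.0]
is_embedding = [False, False, False, True]

for t in (900, 750, 100):
    weights = attention_weights(logits, is_embedding, t)
    print(f"t={t}: weight on embedding = {weights[-1]:.3f}")
```

In this toy setup the weight on the embedding key shrinks as `t` decreases, so the embedding influences the video tokens most in the steps where the bias is active and fades out afterwards.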


FlexiAct's results show that it transfers actions effectively to subjects with diverse layouts, skeletons, and viewpoints. The method generates high-quality videos with precise action control, spatial structure adaptation, and consistency preservation. Its ability to handle heterogeneous scenarios makes it useful for applications such as video editing, animation, and virtual reality.

Key Features

Action control in heterogeneous scenarios
Handles variations in layout, viewpoint, and skeletal structure
RefAdapter: lightweight image-conditioned adapter
FAE: Frequency-aware Action Extraction
Precise action control, spatial structure adaptation, and consistency preservation
Effective transfer of actions to subjects with diverse layouts, skeletons, and viewpoints
High-quality video generation
Versatile tool for video editing, animation, and virtual reality
