Powerful Controllable Inpainting & Generation
VACE addresses a pain point of traditional video creation: results are difficult to adjust after generation. It supports highly controllable video inpainting and generation based on human pose, optical flow, structure preservation, spatial motion paths, video recoloring, and more. It also supports video generation from reference subject images and background images, keeping visual elements consistent.
Unified & Powerful Multimodal Input System
Unlike traditional models that rely solely on text prompts, VACE provides a unified input system that integrates text, images (object reference images or video frames), videos (supporting regeneration after erasure or local extension), masks (0/1 binary signals that specify editing areas), and various control signals (depth maps, optical flow, layout, grayscale images, line art, poses, etc.).
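To make the shape of such a unified input concrete, here is a minimal sketch of a container that bundles the modalities listed above. The class name `EditRequest`, its fields, and the one-mask-per-frame rule are all assumptions made for illustration; they are not VACE's actual API.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical stand-in for image data: one H x W grid of integers per frame.
Frame = List[List[int]]


@dataclass
class EditRequest:
    """Illustrative bundle of the input types named above (not VACE's API)."""

    prompt: str
    reference_images: List[Frame] = field(default_factory=list)
    video_frames: List[Frame] = field(default_factory=list)
    masks: List[Frame] = field(default_factory=list)
    # Keyed by signal name, e.g. "depth", "pose", "optical_flow".
    control_signals: Dict[str, List[Frame]] = field(default_factory=dict)

    def validate(self) -> None:
        # Masks are 0/1 binary signals marking the editing area.
        for mask in self.masks:
            for row in mask:
                if any(value not in (0, 1) for value in row):
                    raise ValueError("mask values must be 0 or 1")
        # Assumption: when both are given, there is one mask per video frame.
        if self.masks and self.video_frames:
            if len(self.masks) != len(self.video_frames):
                raise ValueError("expected one mask per video frame")


request = EditRequest(
    prompt="replace the sky with a sunset",
    video_frames=[[[10, 20], [30, 40]]],
    masks=[[[1, 1], [0, 0]]],
)
request.validate()  # raises ValueError if the inputs are inconsistent
```

Keeping every modality optional except the prompt mirrors the idea that one request format can describe many tasks, from plain text-to-video to masked editing with control signals.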
Precise Spatio-temporal Editing Capabilities
VACE gives users fine-grained control over video content editing. In the time dimension, it can complete a full-length video from any video segment or from just the first and last frames. In the spatial dimension, it supports extended generation for image edges or background areas, such as background replacement: changing the video's background environment according to a new prompt while keeping the main subject unchanged.
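Completion in the time dimension can be pictured as a per-frame mask over the clip. The sketch below is only an illustration: the helper `temporal_mask` is hypothetical, and the convention that 1 means "generate this frame" and 0 means "keep the provided frame" is an assumption, since the text only says that masks are 0/1 signals.

```python
from typing import Iterable, List


def temporal_mask(num_frames: int, known_frames: Iterable[int]) -> List[int]:
    """Return one flag per frame: 0 = keep provided frame, 1 = generate.

    The 0/1 convention here is an assumption made for illustration.
    """
    known = set(known_frames)
    if any(index < 0 or index >= num_frames for index in known):
        raise IndexError("known frame index outside the clip")
    return [0 if index in known else 1 for index in range(num_frames)]


# Only the first and last frames are given; the middle is to be generated.
first_last = temporal_mask(5, [0, 4])

# A leading two-frame segment is given; the rest of the clip is extended.
extension = temporal_mask(4, range(2))
```

The same idea carries over to the spatial dimension, where the mask would flag pixels (for example, the background region) instead of whole frames.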
Expert-Level Task Handling
VACE easily handles complex functions that traditionally require multiple expert models, such as image-referenced generation, video inpainting, and local editing.
Free Combination of Atomic Abilities
Basic abilities such as text-to-video, pose control, and background replacement can be freely combined, with no need to train a separate model for each function.
Multiple Resolution Support
The open-sourced VACE-1.3B supports 480P, while VACE-14B supports both 480P and 720P resolutions, catering to various video quality needs.
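The resolution support stated above can be captured in a small lookup, which may be handy for validating a configuration before launching a job. The table contents come from the text; the dictionary and the helper `supports_resolution` are hypothetical names, not part of any VACE tooling.

```python
# Resolutions per model, as stated in the text above.
SUPPORTED_RESOLUTIONS = {
    "VACE-1.3B": ("480P",),
    "VACE-14B": ("480P", "720P"),
}


def supports_resolution(model: str, resolution: str) -> bool:
    """Hypothetical helper: True if the model is listed with that resolution."""
    return resolution in SUPPORTED_RESOLUTIONS.get(model, ())


can_run_720p_small = supports_resolution("VACE-1.3B", "720P")
can_run_720p_large = supports_resolution("VACE-14B", "720P")
```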