I2VControl: Disentangled and Unified Video Motion Synthesis Control

Wanquan Feng1      Tianhao Qi1,2      Jiawei Liu1      Mingzhen Sun1,3      Pengqi Tu1      Tianxiang Ma1      Fei Dai1      Songtao Zhao1      Siyu Zhou1      Qian He1

1ByteDance China     2University of Science and Technology of China     3Institute of Automation, Chinese Academy of Sciences

We propose I2VControl, an all-in-one unified framework for image-to-video motion synthesis control. The illustration shows several scenarios of disentangled control: camera movement (the camera dollies in toward the sculpture), object movement (the astronaut walks forward), and motion brush (smoke flows in the wind with a given motion strength value). Users can select control modes according to their needs, and the modes can be combined without conflict.


In the following samples, we drag the movable objects in the input image. The drags cover both translation and rotation, for single and multiple objects:

Input Image | Object Dragging | Result
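As an illustration of what a translation-plus-rotation drag means for a single point, here is a minimal sketch. It is not the I2VControl implementation: the function name `drag_trajectory`, the linear interpolation over frames, and the choice of pivot are all assumptions made for this example.

```python
import math

def drag_trajectory(point, pivot, translation, angle_deg, num_frames):
    """Per-frame 2D positions of `point` under a rigid drag.

    Hypothetical helper: the drag is a rotation about `pivot` plus a
    translation, both interpolated linearly from frame 0 to the last frame.
    In image coordinates (y pointing down) a positive angle appears
    clockwise on screen.
    """
    if num_frames < 2:
        raise ValueError("num_frames must be at least 2")
    px, py = point
    cx, cy = pivot
    tx, ty = translation
    dx, dy = px - cx, py - cy
    path = []
    for k in range(num_frames):
        t = k / (num_frames - 1)            # progress in [0, 1]
        theta = math.radians(angle_deg) * t
        cos_t, sin_t = math.cos(theta), math.sin(theta)
        x = cx + dx * cos_t - dy * sin_t + tx * t
        y = cy + dx * sin_t + dy * cos_t + ty * t
        path.append((x, y))
    return path
```

Dragging several objects would amount to calling such a function once per object, each with its own pivot, translation, and angle.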

Camera Movement

In the following samples, we move only the camera for the input image:

Input Image | Camera Movement | Result
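To give a feel for how a camera move translates into image-space motion, the sketch below computes where a pixel lands as a pinhole camera dollies forward. This is an illustrative assumption, not the method described on this page: the per-pixel depth, the pinhole model, and the name `dolly_in_trajectory` are all hypothetical.

```python
def dolly_in_trajectory(pixel, depth, center, distance, num_frames):
    """Per-frame position of `pixel` as a pinhole camera moves forward.

    Assumptions (not from the source): the scene is static, `depth` is the
    point's distance along the optical axis, and `center` is the principal
    point. Moving forward by d scales the offset from the principal point
    by depth / (depth - d); the focal length cancels out.
    """
    if num_frames < 2:
        raise ValueError("num_frames must be at least 2")
    u, v = pixel
    cx, cy = center
    path = []
    for k in range(num_frames):
        d = distance * k / (num_frames - 1)
        if d >= depth:
            raise ValueError("camera would reach or pass the point")
        scale = depth / (depth - d)
        path.append((cx + (u - cx) * scale, cy + (v - cy) * scale))
    return path
```

Under these assumptions, nearer points move outward faster than distant ones, which is the parallax a dolly-in is expected to produce.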

Camera Movement + Dragging

In the following samples, we move the camera and drag the movable objects in the input image:

Input Image | Camera & Object Movement | Result

Motion Brush for Movable Objects

In the following samples, we brush a mask over movable objects in the input image and set only a scalar motion strength:

Input Image | Brush Mask | Result

Motion Brush for Visual Effects

In the following samples, we brush a mask over fluids in the input image and set only a scalar motion strength:

Input Image | Brush Mask | Result

Comprehensive Usage for Creation

【Union】We use dragging, camera movement, and motion brush together in a single sample. The brush-unit is masked in red and the drag-unit in green.

Input Image & Mask | Controls | Result
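The red/green color coding above can be read back into per-pixel unit labels. The sketch below is a hypothetical decoder: the function name `decode_unit_mask`, the threshold, and the `"other"` label for unmasked pixels are assumptions for illustration, not part of the released method.

```python
def decode_unit_mask(rgb_rows, threshold=128):
    """Map a color-coded mask to unit labels, pixel by pixel.

    Hypothetical convention following the text: predominantly red pixels
    are labeled "brush", predominantly green pixels "drag", and everything
    else "other". `rgb_rows` is a list of rows of (r, g, b) tuples with
    values in 0..255.
    """
    labels = []
    for row in rgb_rows:
        out = []
        for r, g, b in row:
            if r >= threshold and g < threshold and b < threshold:
                out.append("brush")
            elif g >= threshold and r < threshold and b < threshold:
                out.append("drag")
            else:
                out.append("other")
        labels.append(out)
    return labels
```

Because each pixel receives exactly one label in this scheme, the brush and drag regions cannot overlap, which is one simple way to keep the combined controls from conflicting.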

【Hitchcock】We fix the foreground and dolly out the background, creating a Hitchcock-like camera movement effect.

Input Image | Controls | Result
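A minimal sketch of the Hitchcock-style control described above: foreground points stay fixed while background points follow a dolly-out. The single background depth, the pinhole model, and the name `hitchcock_trajectories` are assumptions made for this example, not details taken from the page.

```python
def hitchcock_trajectories(points, is_foreground, background_depth,
                           center, distance, num_frames):
    """Per-point trajectories with a fixed foreground and a receding background.

    Assumptions (hypothetical): all background points share one depth, and
    a backward camera move of d scales their offset from `center` by
    background_depth / (background_depth + d). Foreground points are
    simply held in place.
    """
    if num_frames < 2:
        raise ValueError("num_frames must be at least 2")
    cx, cy = center
    trajectories = []
    for (u, v), fg in zip(points, is_foreground):
        path = []
        for k in range(num_frames):
            if fg:
                path.append((u, v))       # foreground: no motion
                continue
            d = distance * k / (num_frames - 1)
            scale = background_depth / (background_depth + d)
            path.append((cx + (u - cx) * scale, cy + (v - cy) * scale))
        trajectories.append(path)
    return trajectories
```

In this sketch the background contracts toward the image center while the subject keeps its size and position, which is the visual signature of the effect.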

【Surrounding Character】We move the camera around a character while keeping the main character in place, which creates a dynamic portrait video.

Input Image | Controls | Result

【Flowing Hair】We treat the whole image as borderland, set a motion strength, and set the prompt to "flowing in the wind". This gives the artistic portraits subtle dynamics.

Input Image | Result | Input Image | Result


    author    = {Wanquan Feng and Tianhao Qi and Jiawei Liu and Mingzhen Sun and Pengqi Tu and Tianxiang Ma and Fei Dai and Songtao Zhao and Siyu Zhou and Qian He},
    title     = {I2VControl: Disentangled and Unified Video Motion Synthesis Control},
    booktitle = {arxiv},
    year      = {2024}