We propose I2VControl-Camera, a novel camera control method for image-to-video generation, offering high control precision and adjustable motion strength.
Before presenting the method and analysis, let’s first look at some visual results! For each sample, we manually set the camera movement and choose a suitable motion strength value. The first column is the input image, the second column is the camera motion trajectory, and the third column is the generated result.
*Video gallery (columns: Input | Movement | Result).*
Users can control the camera movement with high precision: we convert the camera movement into point trajectories, which then drive the control process. First, we lift the input image from 2D to 3D as an RGBD point cloud. As the camera moves, the 3D points can equivalently be viewed as moving within the camera coordinate system. We then project them onto the 2D image plane according to the current camera pose to obtain the 2D point trajectories.
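The lift-and-project step can be sketched in a few lines. This is a minimal illustration, not the authors' code: it assumes a pinhole camera with known intrinsics (`fx`, `fy`, `cx`, `cy`), a known depth value per pixel, and per-frame world-to-camera extrinsics `(R, t)` where "world" is the first frame's camera coordinate system. All names (`Intrinsics`, `unproject`, `to_camera`, `project`, `point_trajectories`) are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Pixel = Tuple[float, float]
Mat3 = Sequence[Sequence[float]]


@dataclass
class Intrinsics:
    """Hypothetical pinhole intrinsics (focal lengths and principal point, in pixels)."""
    fx: float
    fy: float
    cx: float
    cy: float


def unproject(u: float, v: float, depth: float, K: Intrinsics) -> Vec3:
    """Lift pixel (u, v) with its depth into the first frame's camera coordinates."""
    return ((u - K.cx) * depth / K.fx, (v - K.cy) * depth / K.fy, depth)


def to_camera(p: Vec3, R: Mat3, t: Vec3) -> Vec3:
    """Apply world-to-camera extrinsics: p_cam = R p + t."""
    return tuple(sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3))


def project(p: Vec3, K: Intrinsics) -> Optional[Pixel]:
    """Perspective projection onto the image plane; None if the point is behind the camera."""
    x, y, z = p
    if z <= 0:
        return None
    return (K.fx * x / z + K.cx, K.fy * y / z + K.cy)


def point_trajectories(
    pixels: Sequence[Pixel],
    depths: Sequence[float],
    K: Intrinsics,
    poses: Sequence[Tuple[Mat3, Vec3]],
) -> List[List[Optional[Pixel]]]:
    """One 2D trajectory per input pixel, with one entry per camera pose."""
    trajectories = []
    for (u, v), d in zip(pixels, depths):
        p = unproject(u, v, d, K)  # fixed 3D point in world coordinates
        trajectories.append([project(to_camera(p, R, t), K) for R, t in poses])
    return trajectories
```

For example, with `fx = fy = 100`, `cx = cy = 50` and a camera that slides 0.5 units to the left (so `t = (0.5, 0, 0)` in the second frame), a point at depth 2 shifts 25 pixels to the right while a point at depth 4 shifts only 12.5 pixels, which is the parallax the trajectories encode.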
Moreover, we use a single scalar value, the motion strength, to control how much the subjects in the video move; this control is decoupled from the camera movement.
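This page does not state how the motion strength scalar is computed. Purely as a hypothetical illustration of what "decoupled from the camera movement" can mean, the sketch below scores subject motion as the mean pixel distance between observed 2D tracks and the tracks a static scene would produce under the same camera motion (for instance, the reprojected trajectories from the lift-and-project step). This is an illustrative assumption, not the operator used by I2VControl-Camera; `residual_motion_strength` is a hypothetical name, and its scale need not match the MS values reported on this page.

```python
import math
from typing import Sequence, Tuple

Track = Sequence[Tuple[float, float]]


def residual_motion_strength(observed: Sequence[Track], camera_only: Sequence[Track]) -> float:
    """Hypothetical scalar: mean pixel distance between observed tracks and
    camera-induced tracks, so pure camera motion contributes (ideally) zero."""
    distances = [
        math.dist(o, c)  # per-frame residual displacement of one tracked point
        for obs_track, cam_track in zip(observed, camera_only)
        for o, c in zip(obs_track, cam_track)
    ]
    return sum(distances) / len(distances) if distances else 0.0
```

Because the camera-induced component is subtracted before averaging, a fully static scene scores 0 regardless of how the camera moves, which is the decoupling property the text describes.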
The following figure shows two samples: the top one demonstrates a pan-left camera movement, while the bottom one shows the camera sliding to the right. For each sample, we show a preview (obtained by directly rendering the RGBD point cloud onto the 2D image plane according to the extrinsic matrix) and our generated result. The generated result follows the control signal almost at the pixel level (see the green boxes), even when a movable object is present (the cat in the red box).
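The preview described above amounts to forward-projecting every colored 3D point with the current extrinsics and keeping the nearest point per pixel. A minimal sketch, assuming a pinhole camera, a point cloud expressed in world coordinates (here, the first frame's camera coordinates), and single-pixel splatting with a z-buffer; `render_preview` is a hypothetical name. Pixels that no point lands on stay `None`; a practical renderer may splat larger footprints or fill such holes.

```python
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float, float]
Color = Tuple[int, int, int]


def render_preview(
    points: Sequence[Point],
    colors: Sequence[Color],
    R: Sequence[Sequence[float]],
    t: Tuple[float, float, float],
    fx: float, fy: float, cx: float, cy: float,
    width: int, height: int,
) -> List[List[Optional[Color]]]:
    """Splat colored points into a width x height image, nearest point wins."""
    image: List[List[Optional[Color]]] = [[None] * width for _ in range(height)]
    zbuf = [[float("inf")] * width for _ in range(height)]
    for p, color in zip(points, colors):
        # World -> current camera coordinates: p_cam = R p + t.
        x, y, z = (sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3))
        if z <= 0:
            continue  # point is behind the camera
        u = round(fx * x / z + cx)  # nearest pixel column
        v = round(fy * y / z + cy)  # nearest pixel row
        if 0 <= u < width and 0 <= v < height and z < zbuf[v][u]:
            zbuf[v][u] = z  # keep only the closest point per pixel
            image[v][u] = color
    return image
```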
We test the same camera control signal with different motion strength values. When the motion strength is set to 0, the entire scene stays nearly static even when the image contains movable objects (polar bear, astronaut, wolf); when the motion strength is large, the main objects move noticeably.
Here we show our camera control results alongside the ground-truth preview, which demonstrates our pixel-level control capability. We also list the results of the compared methods for qualitative comparison; our control precision is clearly higher than theirs.
*Video comparison (columns: Input & GT Preview | CameraCtrl | MotionCtrl | Ours).*
The following samples contain combinations of multiple camera movements.
*Video comparison (columns: Input & GT Preview | CameraCtrl | MotionCtrl | Ours). Rows: move left + pan right; rotate + move up + tilt down; rotate + zoom in.*
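Combined movements such as "move left + pan right" can be expressed as a per-frame sequence of extrinsics. A minimal sketch under stated assumptions: OpenCV-style camera axes (x right, y down, z forward), world coordinates equal to the first frame's camera coordinates, linear interpolation over frames, "move" as a translation of the camera center and "pan" as a yaw rotation about the vertical axis. The function names are hypothetical; tilt and rotate would use rotations about the x and z axes in the same way.

```python
import math
from typing import List, Tuple

Vec3 = Tuple[float, float, float]
Mat3 = Tuple[Vec3, Vec3, Vec3]


def rot_y(a: float) -> Mat3:
    """Rotation about the vertical (y) axis; a > 0 turns the view toward +x (pan right)."""
    c, s = math.cos(a), math.sin(a)
    return ((c, 0.0, s), (0.0, 1.0, 0.0), (-s, 0.0, c))


def transpose(M: Mat3) -> Mat3:
    return tuple(tuple(M[j][i] for j in range(3)) for i in range(3))


def matvec(M: Mat3, v: Vec3) -> Vec3:
    return tuple(sum(M[i][j] * v[j] for j in range(3)) for i in range(3))


def move_left_pan_right(num_frames: int, distance: float, angle: float) -> List[Tuple[Mat3, Vec3]]:
    """World-to-camera extrinsics (R, t) for a camera that slides left while panning right."""
    poses = []
    for k in range(num_frames):
        s = k / (num_frames - 1) if num_frames > 1 else 0.0  # progress in [0, 1]
        center = (-distance * s, 0.0, 0.0)  # camera center slides toward -x (left)
        R = transpose(rot_y(angle * s))     # world-to-camera rotation = (camera-to-world)^T
        t = tuple(-x for x in matvec(R, center))  # t = -R C
        poses.append((R, t))
    return poses
```

The resulting `(R, t)` pairs are world-to-camera extrinsics, so they can be fed to the lift-and-project procedure described in the method paragraph above to obtain the corresponding 2D point trajectories.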
The following samples contain multiple dynamic objects; our method still achieves precise control and natural dynamics.
*Video comparison (columns: Input & GT Preview | CameraCtrl | MotionCtrl | Ours).*
We show the results under different motion strength (MS) values. As the motion strength increases, the amplitude of the motion grows accordingly, showing a clear positive correlation with the set motion strength.
*Video comparison (columns: Input & GT Preview | MS=0 | MS=200 | MS=600).*
We also present results on another base model, Seaweed, which indicate that our method is not tied to a particular base model.
*Video gallery on Seaweed (columns: Pan | Zoom | Tilt | Rotate).*
@inproceedings{i2vcontrolcamera, title={I2VControl-Camera: Precise Video Camera Control with Adjustable Motion Strength}, author={Feng, Wanquan and Liu, Jiawei and Tu, Pengqi and Qi, Tianhao and Sun, Mingzhen and Ma, Tianxiang and Zhao, Songtao and Zhou, Siyu and He, Qian}, booktitle={The Thirteenth International Conference on Learning Representations (ICLR)}, year={2025} }