I2VControl-Camera


🌼 Overview

We propose I2VControl-Camera, a novel camera control method for image-to-video generation, offering high control precision and adjustable motion strength.

🔥 Gallery

Before the method and analysis, let’s first look at some visual results! For each sample, we manually set the camera movement and choose a suitable motion strength value. The first column is the original input image, the second column is the camera motion trajectory, and the third column is the generated result.

Input	Movement	Result

💡 Method and Analysis

Users can control the camera movement with high precision: we convert the camera movement into point trajectories and use these trajectories as the control signal. We first lift the input image from 2D to 3D as an RGBD point cloud. When the camera moves, the 3D points can be regarded as moving in the camera coordinate system. We then project them onto the 2D image plane according to the current camera pose to obtain the 2D point trajectories.
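The lifting and projection steps can be illustrated with a minimal NumPy sketch. It is not the released implementation: the function names `unproject` and `point_trajectories`, the pinhole intrinsics `K`, and the convention that extrinsics are world-to-camera matrices with the world frame equal to the first camera frame are all assumptions made for illustration.

```python
import numpy as np

def unproject(depth, K):
    """Lift every pixel of an (H, W) depth map to a 3D point in the first camera frame."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    # Homogeneous pixel coordinates (u, v, 1), one row per pixel.
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).astype(np.float64)
    rays = pix @ np.linalg.inv(K).T          # back-project through the intrinsics
    return rays * depth.reshape(-1, 1)       # scale each ray by its depth

def point_trajectories(depth, K, extrinsics):
    """Return (T, H*W, 2): the 2D position of every lifted point in each frame.

    extrinsics: (T, 4, 4) world-to-camera matrices (assumed convention).
    Points that end up behind the camera (z <= 0) are not handled here.
    """
    pts = unproject(depth, K)
    pts_h = np.concatenate([pts, np.ones((len(pts), 1))], axis=1)
    tracks = []
    for E in extrinsics:
        cam = (pts_h @ E.T)[:, :3]           # points in the current camera frame
        proj = cam @ K.T                     # perspective projection
        tracks.append(proj[:, :2] / proj[:, 2:3])
    return np.stack(tracks)
```

With an identity extrinsic the trajectories reproduce the original pixel grid, and a sideways camera translation shifts each point by an amount inversely proportional to its depth, which is the parallax the control signal encodes.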

Moreover, we apply a scalar value to control the motion strength of the subjects in the video, which is decoupled from the camera movement.

The following figure shows two samples: the top one demonstrates a pan-left camera movement, while the bottom one shows the camera sliding to the right. For each sample, we show a preview (a direct rendering of the RGBD point cloud onto the 2D image plane according to the extrinsic matrix) and our generated result. The generated result follows the control signal almost at the pixel level (see the green boxes), even when the scene contains a movable object (the cat in the red box).
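A preview of this kind can be approximated by forward-splatting the RGBD points into the target view with a z-buffer. The sketch below is only an illustration under assumptions: `render_preview` is a hypothetical helper, the extrinsic `E` is taken to be a 4x4 world-to-camera matrix relative to the input view, each point covers a single pixel, and holes are left black rather than inpainted.

```python
import numpy as np

def render_preview(rgb, depth, K, E):
    """Splat an RGBD image into the view given by E. Returns (image, hit_mask)."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).astype(np.float64)
    pts = (pix @ np.linalg.inv(K).T) * depth.reshape(-1, 1)   # lift to 3D
    cam = pts @ E[:3, :3].T + E[:3, 3]                        # move into target camera
    colors = rgb.reshape(-1, rgb.shape[-1])

    # Keep only points in front of the target camera.
    front = cam[:, 2] > 1e-6
    cam, colors = cam[front], colors[front]
    z = cam[:, 2]
    proj = cam @ K.T
    x = np.rint(proj[:, 0] / z).astype(int)
    y = np.rint(proj[:, 1] / z).astype(int)

    # Keep only points that land inside the image.
    inside = (x >= 0) & (x < w) & (y >= 0) & (y < h)
    x, y, z, colors = x[inside], y[inside], z[inside], colors[inside]

    # Z-buffer: record the nearest depth per pixel, then draw only those points.
    zbuf = np.full((h, w), np.inf)
    np.minimum.at(zbuf, (y, x), z)
    nearest = z == zbuf[y, x]

    out = np.zeros_like(rgb)
    mask = np.zeros((h, w), dtype=bool)
    out[y[nearest], x[nearest]] = colors[nearest]
    mask[y[nearest], x[nearest]] = True
    return out, mask
```

The returned mask marks pixels that received at least one point; unmarked pixels are disocclusions that the preview cannot fill.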

We test the same camera control signal with different motion strength values. When the motion strength is set to 0, the entire scene is nearly static even though the image contains movable objects (polar bear, astronaut, wolf); when the motion strength is large, the main objects move noticeably.

🔥 Pixel-level Control & Visual Comparisons

We show our camera control results alongside the ground-truth preview, which demonstrates our pixel-level control capability. We also list the results of the compared methods for qualitative comparison. Our control precision is visibly higher than that of the compared methods.

Input & GT Preview	CameraCtrl	MotionCtrl	Ours

🔥 Combinations of multiple camera movements

The following samples contain combinations of multiple camera movements.

	Input & GT Preview	CameraCtrl	MotionCtrl	Ours
move left + pan right
rotate + move up + tilt down
rotate + zoom in
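One way to specify such combined movements is to interpolate a translation and a set of rotation angles over the frames and assemble one extrinsic matrix per frame. The sketch below is an assumption-laden illustration, not the page's actual interface: `camera_path` and its parameters are hypothetical, angles are in radians, the output is a stack of world-to-camera matrices, and the axis and sign conventions (for example, which direction counts as "left" or "pan right") depend on the coordinate system in use.

```python
import numpy as np

def rot_x(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]])

def rot_y(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])

def rot_z(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def camera_path(num_frames, move=(0.0, 0.0, 0.0), pan=0.0, tilt=0.0, roll=0.0):
    """Per-frame world-to-camera matrices of shape (num_frames, 4, 4).

    move: final camera position in the first camera frame (linearly interpolated).
    pan / tilt / roll: final rotation about the y / x / z axis (linearly interpolated).
    """
    move = np.asarray(move, dtype=np.float64)
    path = []
    for s in np.linspace(0.0, 1.0, num_frames):
        # Camera-to-world orientation and position at this fraction of the path.
        R_c2w = rot_y(s * pan) @ rot_x(s * tilt) @ rot_z(s * roll)
        center = s * move
        # Invert the pose to get the world-to-camera extrinsic.
        E = np.eye(4)
        E[:3, :3] = R_c2w.T
        E[:3, 3] = -R_c2w.T @ center
        path.append(E)
    return np.stack(path)
```

For instance, `camera_path(16, move=(-0.2, 0.0, 0.0), pan=np.deg2rad(10))` combines a sideways translation with a pan over 16 frames; the first frame is always the identity, so the input image remains the reference view.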

🔥 Multiple dynamic objects

The following samples contain multiple dynamic objects; our method still achieves precise control and natural dynamics.

Input & GT Preview	CameraCtrl	MotionCtrl	Ours

🔥 Multiple motion strength

We show the results under different motion strength values. As the motion strength increases, the amplitude of the motion grows, showing a positive correlation with the motion strength setting.

Input & GT Preview	MS=0	MS=200	MS=600

🔥 Experiment on DiT base model (Seaweed)

We present some results on another base model, Seaweed. These results demonstrate the applicability of our method to other base models.

Pan	Zoom	Tilt	Rotate

Citation

@inproceedings{i2vcontrolcamera,
    title={I2VControl-Camera: Precise Video Camera Control with Adjustable Motion Strength},
    author={Feng, Wanquan and Liu, Jiawei and Tu, Pengqi and Qi, Tianhao and Sun, Mingzhen and Ma, Tianxiang and Zhao, Songtao and Zhou, Siyu and He, Qian},
    booktitle={The Thirteenth International Conference on Learning Representations (ICLR)},
    year={2025}
}
