We propose RMA-Net for non-rigid registration. With a recurrent unit, the network iteratively deforms the source surface stage by stage until it converges to the target. RMA-Net is trained in a fully unsupervised manner by aligning the source and target shapes via our proposed multi-view 2D projection loss.
Learning non-rigid registration in an end-to-end manner is challenging due to the inherent high degrees of freedom and the lack of labeled training data. In this paper, we resolve these two challenges simultaneously. First, we propose to represent the non-rigid transformation with a point-wise combination of several rigid transformations. This representation not only makes the solution space well-constrained but also enables our method to be solved iteratively with a recurrent framework, which greatly reduces the difficulty of learning. Second, we introduce a differentiable loss function that measures the 3D shape similarity on the projected multi-view 2D depth images so that our full framework can be trained end-to-end without ground truth supervision. Extensive experiments on several different datasets demonstrate that our proposed method outperforms the previous state-of-the-art by a large margin.
We represent the non-rigid deformation as a point-wise combination of rigid transformations, which are predicted in iterative recurrent stages. We employ a GRU-based framework to predict the rigid transformation and the point-wise skinning weights at each stage; its architecture is shown in the figure above.
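To make the stage-wise update concrete, below is a minimal sketch of how per-point skinning weights can blend a predicted rigid transformation with the identity at one recurrent stage. The function name, tensor shapes, and the linear-blend form are our own illustrative assumptions; the actual network may combine the transformations differently (e.g., on the rigid-motion manifold).

```python
import torch

def deform_one_stage(points, R, t, w):
    """Hypothetical sketch of one recurrent stage: blend the predicted
    rigid transformation with the identity via per-point skinning weights.

    points: (N, 3) current source points
    R:      (3, 3) predicted rotation    t: (3,) predicted translation
    w:      (N, 1) per-point skinning weights in [0, 1]
    """
    moved = points @ R.T + t               # rigidly transformed points
    return w * moved + (1.0 - w) * points  # point-wise combination

# Iterating the stages accumulates a non-rigid deformation:
# for R, t, w in gru_predictions:  # one (R, t, w) triple per stage
#     points = deform_one_stage(points, R, t, w)
```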
We render the surface to multi-view 2D mask and depth images and compute the loss on these 2D images. The differentiable rendering process from the 3D point cloud to the 2D depths and masks is illustrated above. (a): the input point cloud. (b): given the point cloud and a camera, we project all points to the front view and set the depth-map values to the z-values of the projected points. (c): for each pixel, we remove the invisible points based on the depth values of the points projected into a window centered at that pixel. (d): the depth value of each pixel is computed as a weighted average of the z-values of the visible points projected into the window, and the mask of the object can be recovered accordingly.
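As a concrete illustration of steps (b) through (d), here is a deliberately naive, self-contained sketch of the soft depth/mask rendering. The function name, the normalized-coordinate convention, the window size, and the softmax visibility weighting are our own assumptions, not the authors' implementation.

```python
import torch

def render_depth_mask(points, H, W, window=3, sigma=1e-2):
    """Naive sketch of soft depth/mask rendering from a point cloud.

    points: (N, 3) camera-space points with x, y already projected into
            normalized image coordinates in [-1, 1]; larger z is farther.
    Returns an (H, W) depth map and an (H, W) mask.
    """
    px = ((points[:, 0] + 1) * 0.5 * (W - 1)).round().long().clamp(0, W - 1)
    py = ((points[:, 1] + 1) * 0.5 * (H - 1)).round().long().clamp(0, H - 1)
    z = points[:, 2]
    depth = torch.zeros(H, W)
    mask = torch.zeros(H, W)
    r = window // 2
    for v in range(H):
        for u in range(W):
            # (b) points whose projection lands in the window centered at (u, v)
            near = ((px - u).abs() <= r) & ((py - v).abs() <= r)
            if not near.any():
                continue
            z_near = z[near]
            # (c) visibility: keep points close to the nearest depth in the window
            z_vis = z_near[z_near <= z_near.min() + sigma]
            # (d) depth = weighted average of visible z-values; nearer points get
            # larger weights, keeping the map differentiable w.r.t. the points
            weights = torch.softmax(-z_vis / sigma, dim=0)
            depth[v, u] = (weights * z_vis).sum()
            mask[v, u] = 1.0  # hard mask for this sketch
    return depth, mask
```

The multi-view projection loss can then be formed by rendering both the deformed and the target shapes from several cameras and comparing the resulting depth and mask images, e.g., with an L1 distance.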
The figure above shows the iterative results: the first and last columns show the source and target surfaces, and the second through eighth columns show how the deformed shape moves closer and closer to the target over the iterative stages. The video animates the dynamic deformation, giving a more vivid view of the registration process.
Comparisons between the results of our method and those of previous methods, including CPD, BCPD, CPD-Net, and PR-Net.
@inproceedings{feng2021recurrent,
author = {Wanquan Feng and Juyong Zhang and Hongrui Cai and Haofei Xu and Junhui Hou and Hujun Bao},
title = {Recurrent Multi-view Alignment Network for Unsupervised Surface Registration},
booktitle = {{IEEE/CVF} Conference on Computer Vision and Pattern Recognition (CVPR)},
year = {2021}
}