ReFit: Recurrent Fitting Network for 3D Human Recovery

Yufu Wang Kostas Daniilidis

ICCV 2023

We present ReFit, a neural network that iteratively estimates the 3D human body from a given image. Here we show reconstructions from three movie scenes. ReFit performs well under challenging poses, lighting, motion blur and occlusion. Video for academic research only.

[Paper] [Supplementary] [Code] [Bibtex]

Abstract

We present Recurrent Fitting (ReFit), a neural network architecture for single-image, parametric 3D human reconstruction. ReFit learns a feedback-update loop that mirrors the strategy of solving an inverse problem through optimization. At each iterative step, it reprojects keypoints from the human model to feature maps to query feedback, and uses a recurrent updater to adjust the model to fit the image better. Because ReFit encodes strong knowledge of the inverse problem, it is faster to train than previous regression models. At the same time, ReFit improves state-of-the-art performance on standard benchmarks. Moreover, ReFit applies to other optimization settings, such as multi-view fitting and single-view shape fitting. Training and inference code will be available.

Approach

ReFit extracts one feature map per keypoint with a backbone network (Sec. 3.1). It then reprojects keypoints from the 3D human mesh onto the corresponding feature maps using the full-frame adjusted camera model (Sec. 3.2) and queries feedback at the reprojected locations. During training, the feedback is randomly dropped; it is then concatenated with the current estimate Θt and the bounding-box information to form the final feature vector. This feature vector is sent to N parallel GRUs, which predict updates for the N parameters (Sec. 3.3). The updated mesh is reprojected onto the feature maps again, and the feedback-update loop repeats until a good reconstruction is achieved.
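The feedback-update loop described above can be sketched in a few lines of NumPy. This is only an illustrative toy under stated assumptions, not the released ReFit code: the body model is replaced by a fixed set of 3D keypoints translated by a camera vector, feedback is a single nearest-pixel value per keypoint, the GRU weights are random and untrained, and all names (`GRUHead`, `reproject`, `query_feedback`, `refit_loop`, the parameter groups and their sizes) are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class GRUHead:
    """One GRU cell plus a linear readout predicting an update for one parameter.
    Weights are random (untrained); this only illustrates the data flow."""
    def __init__(self, in_dim, hid_dim, out_dim, rng):
        self.W = rng.normal(0, 0.1, (3, hid_dim, in_dim))    # input weights (z, r, n)
        self.U = rng.normal(0, 0.1, (3, hid_dim, hid_dim))   # hidden weights (z, r, n)
        self.out = rng.normal(0, 0.01, (out_dim, hid_dim))   # readout to a parameter update
        self.h = np.zeros(hid_dim)                           # hidden state kept across steps

    def step(self, x):
        z = sigmoid(self.W[0] @ x + self.U[0] @ self.h)           # update gate
        r = sigmoid(self.W[1] @ x + self.U[1] @ self.h)           # reset gate
        n = np.tanh(self.W[2] @ x + r * (self.U[2] @ self.h))     # candidate state
        self.h = (1 - z) * n + z * self.h
        return self.out @ self.h

def reproject(theta, template, focal, center):
    """Toy stand-in for body model + camera: translate fixed keypoints, then project."""
    pts = template + theta["camera"]
    return focal * pts[:, :2] / pts[:, 2:3] + center

def query_feedback(feature_maps, pts2d):
    """Read one value per keypoint from that keypoint's own map (nearest pixel)."""
    k, h, w = feature_maps.shape
    cols = np.clip(np.rint(pts2d[:, 0]).astype(int), 0, w - 1)
    rows = np.clip(np.rint(pts2d[:, 1]).astype(int), 0, h - 1)
    return feature_maps[np.arange(k), rows, cols]

def refit_loop(feature_maps, theta, bbox, heads, template, focal, center,
               steps=5, drop_rate=0.0, rng=None):
    rng = rng or np.random.default_rng()
    theta = {k: v.copy() for k, v in theta.items()}   # leave the caller's estimate untouched
    for _ in range(steps):
        feedback = query_feedback(feature_maps, reproject(theta, template, focal, center))
        if drop_rate > 0.0:                            # training-time feedback dropout
            feedback = feedback * (rng.random(feedback.shape) >= drop_rate)
        # Final feature: feedback + current estimate + bounding-box info.
        x = np.concatenate([feedback] + [theta[k] for k in sorted(theta)] + [bbox])
        for name, head in heads.items():               # one GRU per parameter, same input
            theta[name] = theta[name] + head.step(x)
    return theta

rng = np.random.default_rng(0)
K, H, W = 4, 32, 32
feature_maps = rng.normal(size=(K, H, W))              # one map per keypoint
template = rng.normal(0, 0.3, (K, 3))
theta0 = {"pose": np.zeros(6), "shape": np.zeros(10), "camera": np.array([0.0, 0.0, 5.0])}
bbox = np.array([0.5, 0.5, 1.0])
in_dim = K + sum(v.size for v in theta0.values()) + bbox.size
heads = {k: GRUHead(in_dim, 16, v.size, rng) for k, v in theta0.items()}
theta = refit_loop(feature_maps, theta0, bbox, heads, template,
                   focal=50.0, center=np.array([16.0, 16.0]))
```

Because every head receives the same feature vector and only owns its own parameter, the updates within one iteration are independent of each other, which is what makes the N updaters parallel.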

Applications

Single-view reconstruction.

Shape registration with ReFit.
We can use ReFit to register a pre-fitted or pre-scanned shape to images of the same subject. At each iteration, we render SMPL with the ground truth shape; ReFit then reprojects and adjusts the pose of the model to fit the image.
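One way to picture this registration setting is a loop that resets the shape to the known value at every iteration and applies only the non-shape updates. The sketch below assumes this reading; `register_pose`, `predict_updates` and the toy updater are hypothetical stand-ins, and the toy updater simply moves the pose halfway to a fixed target rather than querying image features.

```python
import numpy as np

def register_pose(theta, known_shape, predict_updates, steps=5):
    """Fit pose while keeping the shape clamped to a pre-fitted or pre-scanned value."""
    theta = {k: np.array(v, dtype=float) for k, v in theta.items()}
    for _ in range(steps):
        # Use the known shape at every iteration, so the model is always
        # evaluated with the subject's true body shape.
        theta["shape"] = np.array(known_shape, dtype=float)
        deltas = predict_updates(theta)
        for name in theta:
            if name != "shape":          # any proposed shape update is discarded
                theta[name] = theta[name] + deltas[name]
    return theta

# Toy updater (assumption): pulls the pose halfway toward a fixed target and
# also proposes a shape change, which the loop above ignores.
target_pose = np.full(3, 0.4)

def toy_updater(theta):
    return {"pose": 0.5 * (target_pose - theta["pose"]), "shape": np.ones(10)}

known_shape = np.linspace(-1.0, 1.0, 10)
fitted = register_pose({"pose": np.zeros(3), "shape": np.zeros(10)},
                       known_shape, toy_updater)
```

After five iterations the toy pose has closed all but 1/32 of the initial gap to its target, while the shape is exactly the supplied one.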

Multi-view reconstruction with ReFit.
We can use ReFit for multi-view model fitting. ReFit first produces updates for each view separately. We then average them to get a single update direction for the model. Naturally, this update step explains evidence from multiple views. This process is repeated until the model is well aligned with the images.
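The averaging step can be illustrated with a small NumPy sketch. It is a simplification under stated assumptions: every parameter is treated as shared across views (how view-specific quantities such as cameras are handled is not covered here), and each per-view updater is a hypothetical toy that pulls the pose toward a view-specific target instead of using image feedback.

```python
import numpy as np

def average_updates(deltas_per_view):
    """Average per-view update dicts into one update direction per parameter."""
    names = deltas_per_view[0].keys()
    return {n: np.mean([d[n] for d in deltas_per_view], axis=0) for n in names}

def multiview_fit(theta, view_updaters, steps=10):
    theta = {k: np.array(v, dtype=float) for k, v in theta.items()}
    for _ in range(steps):
        deltas = [update(theta) for update in view_updaters]   # one update per view
        mean_delta = average_updates(deltas)                   # single shared direction
        theta = {k: v + mean_delta[k] for k, v in theta.items()}
    return theta

def make_view_updater(target):
    """Toy per-view updater (assumption): moves the pose halfway to this view's target."""
    target = np.asarray(target, dtype=float)
    return lambda theta: {"pose": 0.5 * (target - theta["pose"])}

views = [make_view_updater(t) for t in ([1.0, 0.0], [0.0, 1.0], [0.5, 0.5])]
fitted = multiview_fit({"pose": np.zeros(2)}, views)
```

In this toy the views disagree, and the averaged update settles on the pose that balances all three targets, which mirrors the idea that a single averaged step accounts for evidence from every view.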