Fast deformable model-based human performance capture and FVV using consumer-grade RGB-D sensors

Authors
D. Alexiadis
N. Zioulis
D. Zarpalas
P. Daras
Year
2018
Venue
Pattern Recognition (2018), 79, 260-278.
Abstract

In this paper, a novel end-to-end system for the fast reconstruction of human actor performances into 3D mesh sequences is proposed, using the input from a small set of consumer-grade RGB-Depth sensors. By pre-reconstructing a deformable 3D model of the actor offline and employing it to constrain the online reconstruction process, the proposed framework implicitly tracks the human motion. Handling non-rigid deformation of the 3D surface and applying appropriate texture mapping, it finally produces a dynamic sequence of temporally-coherent textured meshes, enabling realistic Free Viewpoint Video (FVV). Given the noisy input from a small set of low-cost sensors, the focus is on fast ("quick-post"), robust and fully-automatic performance reconstruction. Apart from integrating existing ideas into a complete end-to-end system, which is itself a challenging task, several novel technical advances contribute to the speed, robustness and fidelity of the system, including a layered approach for model-based pose tracking, the definition and use of sophisticated energy functions that are parallelizable on the GPU, as well as a new texture mapping scheme. The experimental results on a large number of challenging sequences, and comparisons with model-based and model-free approaches, demonstrate the efficiency of the proposed approach.
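
To give a rough idea of the kind of data term that model-based RGB-D pose tracking typically minimizes, the sketch below implements a generic point-to-plane energy between deformed model vertices and matched depth points. This is only an illustrative assumption: the paper's actual energy functions are more elaborate (layered, GPU-parallelized), and all names here (point_to_plane_energy, the toy correspondences) are hypothetical.

    # Minimal sketch (assumption): a generic point-to-plane data term,
    # not the paper's actual energy formulation.
    import numpy as np

    def point_to_plane_energy(model_pts, target_pts, target_normals):
        """Sum of squared point-to-plane residuals between deformed model
        vertices and their matched depth points (correspondences assumed
        precomputed, e.g. by projective association)."""
        residuals = np.einsum('ij,ij->i', target_pts - model_pts, target_normals)
        return float(np.sum(residuals ** 2))

    # Toy usage with synthetic correspondences.
    rng = np.random.default_rng(0)
    model = rng.standard_normal((1000, 3))
    target = model + 0.01 * rng.standard_normal((1000, 3))
    normals = rng.standard_normal((1000, 3))
    normals /= np.linalg.norm(normals, axis=1, keepdims=True)
    print(point_to_plane_energy(model, target, normals))

In a tracking loop, an energy of this form would be evaluated (and its gradient computed) per candidate pose or deformation, which is why per-correspondence terms that can be summed independently map well to GPU parallelization.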