Real-time 3D Motion Capturing based on volumetric data and body-worn inertial sensors



On this page, you can find information and supplementary videos for the paper entitled “Real-time 3D Motion Capturing based on volumetric data and body-worn inertial sensors”.


The dataset used in the experiments, along with the necessary documents, can be found here. Due to their size, the volume data or the Kinect RGB-D data can be provided on request via private contact.



In this paper, a novel framework for human motion capturing is presented, which utilizes volumetric and joint rotational information extracted from multi-view depth streams and inertial data, respectively. By fusing this information through kinematics and skeleton fitting techniques, the proposed framework estimates the human pose over time. More specifically, graph-based spatio-temporal information extraction is employed to estimate the articulated body structure and extract its extreme joint positions, while joint rotational information is calculated by applying optimized gradient descent algorithms to the inertial data. Furthermore, an articulated humanoid biped model is constructed based on the human body structure and its constraints, and is then fitted into the volumetric data. To fit the biped into the volume, the biped model is initially transformed using the extracted spatial information, while an articulated Iterative Closest Point algorithm, based on a solution to the Inverse Kinematic problem (ICP-IK), is introduced to spatially align the biped model and the volume. The proposed fusion method results in high-precision motion capturing, overcoming the variability in the quality of the incoming information over time. Moreover, experiments with a varying number of inertial sensors are presented to investigate the effectiveness of the method in relation to the equipment complexity. Experimental comparison against VICON verifies the effectiveness of the proposed framework.
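As a rough illustration of the alignment idea mentioned above, the following is a minimal sketch of plain rigid point-to-point ICP in Python with NumPy. It is not the articulated ICP-IK algorithm of the paper: it aligns a single rigid point set rather than a jointed biped, uses brute-force nearest-neighbour search, and all names (`best_rigid_transform`, `icp`, `model`, `volume`) are hypothetical and chosen only for this example.

```python
import numpy as np


def best_rigid_transform(src, dst):
    """Least-squares rotation R and translation t mapping src onto dst.

    src and dst are (N, 3) arrays with row-wise correspondences
    (Kabsch / SVD solution).
    """
    src_c = src.mean(axis=0)
    dst_c = dst.mean(axis=0)
    # Cross-covariance of the centred point sets.
    H = (src - src_c).T @ (dst - dst_c)
    U, _, Vt = np.linalg.svd(H)
    # Guard against a reflection so that R is a proper rotation.
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_c - R @ src_c
    return R, t


def icp(model, volume, iters=20):
    """Rigidly align `model` points (N, 3) to `volume` points (M, 3).

    Returns the accumulated rotation, translation and the moved model.
    Assumption: a single rigid body, unlike the articulated model in the paper.
    """
    current = model.copy()
    R_total = np.eye(3)
    t_total = np.zeros(3)
    for _ in range(iters):
        # Brute-force closest volume point for every model point.
        d2 = ((current[:, None, :] - volume[None, :, :]) ** 2).sum(axis=2)
        nearest = volume[d2.argmin(axis=1)]
        # Best rigid motion towards the current correspondences.
        R, t = best_rigid_transform(current, nearest)
        current = current @ R.T + t
        # Compose with the motion accumulated so far.
        R_total = R @ R_total
        t_total = R @ t_total + t
    return R_total, t_total, current
```

In an articulated setting, each iteration would instead update the joint angles of the skeleton (for example via an inverse kinematics solver) rather than a single rigid transform; the closest-point correspondence step stays conceptually the same.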



Experimental Results

Kinect Skeleton tracking (red) cannot function properly once self-occlusion is present. In A50, the left arm as well as the right upper arm of the subject are occluded, so the estimates are erroneous. In P29, the left arm is occluded, so the tracking confuses the ball with the left wrist. In contrast, the proposed method (blue) functions appropriately.


Green, White, Cyan and Blue colours indicate the results of SfV, the proposed method with 1 WIMU (C0), with 5 WIMUs (C1) and with 9 WIMUs (C2), respectively. The results indicate the effectiveness of the skeleton fitting procedure and the use of WIMUs.


Red, Green, White, Cyan and Blue colours indicate the skeletons extracted with Kinect, SfV, the proposed method with 1 WIMU (C0), with 5 WIMUs (C1) and with 9 WIMUs (C2), respectively. The extracted skeletons are projected onto the RGB streams of the frontal Kinect sensor.


The Kinect Skeleton tracking confuses the racket with the athlete’s hand, while the proposed method succeeds in tracking the hand based on the determined body structure.


Supplementary Videos (Qualitative results)

Gaelic Football Punt Kick 1


Gaelic Football Punt Kick 2


Gaelic Football Punt Kick 3


Gaelic Football Fist Pass 1


Gaelic Football Fist Pass 2

Extra Videos