Human Action Recognition Using 3D Reconstruction Data

Authors	G. Papadopoulos
	P. Daras
Year	2017
Venue	IEEE Transactions on Circuits and Systems for Video Technology (2018), 28(8), 1807-1823.

Abstract

In this paper, the problem of human action recognition using 3D reconstruction data is investigated in depth. 3D reconstruction techniques are employed to address two of the most challenging issues in general-case human action recognition, namely view-variance (i.e. when the same action is observed from different viewpoints) and the presence of (self-)occlusions (i.e. when, for a given point of view, a body-part of an individual conceals another body-part of the same or another subject). The main contributions of this work are summarized as follows: i) A detailed examination of the use of 3D reconstruction data for performing human action recognition, which includes: a) the introduction of appropriate local/global, flow/shape descriptors, and b) extensive experiments on challenging publicly available datasets and exhaustive comparisons with state-of-the-art approaches. ii) A new local-level 3D flow descriptor, which incorporates spatial and surface information in the flow representation and efficiently handles the problem of defining 3D orientation at every local neighborhood. iii) A new global-level 3D flow descriptor that efficiently encodes the global motion characteristics in a compact way. iv) A novel global temporal-shape descriptor that extends the notion of 3D shape descriptions for action recognition by incorporating the temporal dimension. The proposed descriptor efficiently addresses the inherent problems of temporal alignment and compact representation, while also being robust in the presence of noise (compared with similar tracking-based methods in the literature). Overall, this work significantly improves the state-of-the-art performance and introduces new research directions in the field of 3D action recognition, following the recent development and widespread use of portable, affordable, high-quality and accurate motion-capturing devices (e.g. Microsoft Kinect).