HUMAN4D: A Human-centric Multimodal Dataset for Motions & Immersive Media

* The paper describing this dataset is currently under review. The dataset will be released when the paper is published.

HUMAN4D is a large, multimodal 4D dataset that contains a variety of human activities captured simultaneously by a professional marker-based MoCap system, a volumetric capture system, and an audio recording system.

Pictures taken during the preparation and capturing of the HUMAN4D dataset.

The room was equipped with 24 Vicon MXT40S cameras rigidly mounted on the walls, while a portable volumetric capturing system with 4 Intel RealSense D415 depth sensors was temporarily set up to capture the RGBD data cues.

HW-SYNCed multi-view RGBD samples (4 RGBD frames each) from the “stretching_n_talking” (top) and “basket-ball_dribbling” (bottom) activities.


Using a custom photogrammetry rig with 96 cameras, photos were taken of the actor (left) and reconstructed into a 3D textured mesh using Agisoft Metashape (right).


Reconstructed mesh-based volumetric data: (left) per-vertex color visualization at 3 voxel-grid resolutions, i.e. r = 5, r = 6 and r = 7, and (right) a textured 3D mesh sample at voxel-grid resolution r = 6.

Merged reconstructed point cloud from a single multi-view RGBD (mRGBD) frame, shown from various views.
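
To illustrate how a merged point cloud like the one above can be obtained from one mRGBD frame, here is a minimal sketch, not the dataset's own tooling. It assumes each depth image is already loaded as a NumPy array in metres (0 meaning no measurement), that pinhole intrinsics (fx, fy, cx, cy) are known per sensor, and that each sensor has a 4x4 camera-to-world extrinsic matrix. The function names `backproject` and `merge_views`, and the calibration layout, are hypothetical; the released dataset may store calibration and depth differently.

```python
import numpy as np


def backproject(depth_m, fx, fy, cx, cy):
    """Convert a depth image (metres, 0 = invalid) to (N, 3) camera-space points."""
    h, w = depth_m.shape
    # u holds the column index and v the row index of every pixel.
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    valid = depth_m > 0
    z = depth_m[valid]
    # Standard pinhole back-projection.
    x = (u[valid] - cx) * z / fx
    y = (v[valid] - cy) * z / fy
    return np.stack([x, y, z], axis=1)


def merge_views(depths, intrinsics, extrinsics):
    """Merge the points of all sensors into one cloud in a shared frame.

    extrinsics[i] is assumed to be a 4x4 camera-to-world matrix
    (hypothetical convention, not taken from the dataset).
    """
    clouds = []
    for depth, (fx, fy, cx, cy), cam_to_world in zip(depths, intrinsics, extrinsics):
        pts = backproject(depth, fx, fy, cx, cy)
        rotation = cam_to_world[:3, :3]
        translation = cam_to_world[:3, 3]
        # Apply the rigid transform to row-vector points.
        clouds.append(pts @ rotation.T + translation)
    return np.concatenate(clouds, axis=0)
```

With 4 sensors, `depths`, `intrinsics` and `extrinsics` would each hold 4 entries, one per D415. Color can be attached in the same way by indexing the aligned RGB image with the same `valid` mask.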