The CERTH-SOR3D dataset consists of over 20k instances of human-object interactions. The capturing process involved three synchronized Microsoft Kinect v2 sensors (three capturing views) under controlled environmental conditions. The available data contain 1) RGB videos (1920×1080 pixels ), 2) depth map sequences (512×424 pixels). and c) 3D optical flow fields.
More details about the dataset: http://sor3d.vcl.iti.gr/