[ECCV 2022] Faster VoxelPose: Real-time 3D Human Pose Estimation by Orthographic Projection

While voxel-based methods have achieved promising results for multi-person 3D pose estimation from multiple cameras, they suffer from a heavy computational burden, especially in large scenes. We present Faster VoxelPose to address this challenge by re-projecting the feature volume onto the three two-dimensional coordinate planes and estimating the X, Y, and Z coordinates from them separately. To that end, we first localize each person with a 3D bounding box by estimating a 2D box and its height from the volume features projected onto the xy-plane and the z-axis, respectively. Then, for each person, we estimate partial joint coordinates from the three coordinate planes separately and fuse them to obtain the final 3D pose. The method is free from costly 3D-CNNs, speeds up VoxelPose tenfold, and achieves accuracy competitive with state-of-the-art methods, demonstrating its potential for real-time applications.
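
To make the re-projection idea concrete, here is a minimal NumPy sketch, not the authors' implementation. It assumes the feature volume is a single-channel array of shape (X, Y, Z), that projection onto each plane is done by max-pooling along the remaining axis, and that the partial estimates are fused by simple averaging (a stand-in for whatever fusion the paper actually uses). The function names `project_to_planes`, `peak_2d`, `fuse_coordinates`, and `locate_peak` are hypothetical.

```python
import numpy as np

def project_to_planes(volume):
    """Collapse an (X, Y, Z) volume onto the xy-, xz- and yz-planes.

    Assumption: max-pooling along the dropped axis is used as the projection.
    """
    return volume.max(axis=2), volume.max(axis=1), volume.max(axis=0)

def peak_2d(plane):
    """Return the (row, col) index of the largest value in a 2D map."""
    return np.unravel_index(np.argmax(plane), plane.shape)

def fuse_coordinates(xy, xz, yz):
    """Average the two estimates available for each axis.

    Assumption: plain averaging stands in for a learned fusion step.
    """
    (x1, y1), (x2, z1), (y2, z2) = xy, xz, yz
    return np.array([(x1 + x2) / 2, (y1 + y2) / 2, (z1 + z2) / 2])

def locate_peak(volume):
    """Estimate a 3D peak location using only 2D projections of the volume."""
    xy, xz, yz = project_to_planes(volume)
    return fuse_coordinates(peak_2d(xy), peak_2d(xz), peak_2d(yz))

# Toy volume with a single response at voxel (5, 2, 3).
volume = np.zeros((8, 6, 4))
volume[5, 2, 3] = 1.0
print(locate_peak(volume))  # [5. 2. 3.]
```

Each plane supplies two of the three coordinates, so every axis is estimated twice. The sketch only ever searches 2D maps, which is the source of the saving over processing the full 3D volume.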

The European Conference on Computer Vision (ECCV) is the top European conference in the image analysis area. Together with CVPR and ICCV, it is regarded as one of the top conferences in the field of computer vision. ECCV is held biennially. Due to concerns about COVID-19, ECCV 2022 will be hosted online from October 23rd to October 27th, 2022.