We present MoVNect, a lightweight deep neural network that captures 3D human pose from a single RGB camera.
To improve the overall performance of the model, we apply knowledge distillation based on teacher-student learning to 3D human pose estimation. A real-time post-processing step makes the CNN output yield temporally stable 3D skeletal information, which can be used directly in applications. We implement a 3D avatar application that runs on mobile devices in real time to demonstrate that our network achieves both high accuracy and fast inference. Extensive evaluations show the advantages of our lightweight model trained with the proposed method over previous 3D pose estimation methods on the Human3.6M dataset and on mobile devices.
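The teacher-student training mentioned above can be sketched as a loss that blends ground-truth supervision with mimicry of a larger teacher network's predictions. This is a minimal illustration, not the paper's exact formulation: the MSE form, the mixing weight `alpha`, and the function names are assumptions.

```python
import numpy as np

def mse(a, b):
    """Mean squared error over all joint coordinates."""
    return float(np.mean((a - b) ** 2))

def distillation_loss(student_pred, teacher_pred, ground_truth, alpha=0.5):
    """Hypothetical teacher-student loss for 3D pose estimation.

    student_pred, teacher_pred, ground_truth: (J, 3) arrays of 3D joints.
    alpha: assumed hyperparameter weighting the ground-truth term.
    """
    gt_term = mse(student_pred, ground_truth)   # supervised term
    kd_term = mse(student_pred, teacher_pred)   # distillation term
    return alpha * gt_term + (1.0 - alpha) * kd_term
```

With `alpha = 1.0` this reduces to ordinary supervised training; lowering `alpha` shifts weight toward matching the teacher, which is what lets the small student benefit from the teacher's capacity.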
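The real-time post-processing that stabilizes the per-frame CNN output can be illustrated with a simple temporal filter over the predicted joints. The exponential-smoothing scheme below is a stand-in sketch; the paper's actual filtering may differ (low-latency filters such as the 1-Euro filter are a common choice in this setting).

```python
import numpy as np

class JointSmoother:
    """Exponentially smooths per-frame 3D joint predictions.

    A minimal, assumed stand-in for the temporal post-processing step;
    beta in [0, 1) trades responsiveness for stability.
    """

    def __init__(self, beta=0.7):
        self.beta = beta    # higher = smoother but laggier
        self.state = None   # last smoothed pose, shape (J, 3)

    def update(self, joints):
        """Blend the new frame's joints with the running estimate."""
        joints = np.asarray(joints, dtype=float)
        if self.state is None:
            self.state = joints
        else:
            self.state = self.beta * self.state + (1.0 - self.beta) * joints
        return self.state
```

Because each update is a single weighted average per joint, the filter adds negligible latency, which is what makes this kind of post-processing viable on mobile devices.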
Published on arXiv.org