Decoupling Full-Body Motion: Introducing a Stratified Approach to Solve the Sparse Observation Challenge

Table of Links

2. Related Work

2.1. Motion Reconstruction from Sparse Input

2.2. Human Motion Generation

3. SAGE: Stratified Avatar Generation

3.1. Problem Statement and Notation

3.2. Disentangled Motion Representation

3.3. Stratified Motion Diffusion

3.4. Implementation Details

4. Experiments and Evaluation Metrics

4.1. Dataset and Evaluation Metrics

4.2. Quantitative and Qualitative Results

4.3. Ablation Study

5. Conclusion and References

Supplementary Material

2.1. Motion Reconstruction from Sparse Input

The task of reconstructing full-body human motion from sparse observations has attracted significant attention from the research community in recent years [1, 3, 5, 7, 10, 11, 16, 18, 19, 46, 47, 49–51, 54]. For instance, several works [16, 19, 46, 50, 51] reconstruct full-body motion from six inertial measurement units (IMUs). SIP [46] relies on heuristic methods, while DIP [16] pioneered the use of deep neural networks for this task. PIP [51] and TIP [19] further improve performance by incorporating physics constraints.

With the rise of VR/AR applications, researchers have turned their attention to reconstructing full-body motion from VR/AR devices such as head-mounted devices (HMDs), which only provide information about the user's head and hands and therefore pose additional challenges. LoBSTr [49], AvatarPoser [18], and AvatarJLM [54] treat this task as a regression problem, using a GRU [49] or Transformer networks [18, 54] to predict the full-body pose from the sparse HMD observations. Another line of methods employs generative models [5, 7, 10, 11]. For example, VAEHMD [10] and FLAG [5] use a variational autoencoder (VAE) [20] and a normalizing flow [35], respectively. More recent works [7, 11] leverage more powerful diffusion models [15, 38] for motion generation and achieve promising results, owing to the strong ability of diffusion models to capture the conditional probability distribution of full-body motion.
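To make the distinction between the two families above concrete, the following block contrasts them in generic notation that is assumed here for illustration and is not the paper's own: c is the sparse observation (e.g. head and hand signals), x (or x_0) is the full-body motion, f_θ is a regression network, and ε_θ is a denoising network. The squared-error regression loss is an illustrative assumption, and the diffusion line shows the standard ε-prediction form of DDPM training; individual methods may parameterize the objective differently.

```latex
\begin{aligned}
&\text{Regression:} && \hat{\mathbf{x}} = f_\theta(\mathbf{c}), \qquad
  \min_\theta \; \mathbb{E}_{(\mathbf{x},\mathbf{c})}\,
  \big\lVert \mathbf{x} - f_\theta(\mathbf{c}) \big\rVert_2^2 \\
&\text{Conditional diffusion:} && \mathbf{x}_t = \sqrt{\bar\alpha_t}\,\mathbf{x}_0
  + \sqrt{1-\bar\alpha_t}\,\boldsymbol\epsilon, \qquad
  \boldsymbol\epsilon \sim \mathcal{N}(\mathbf{0}, \mathbf{I}), \\
& && \min_\theta \; \mathbb{E}_{\mathbf{x}_0,\mathbf{c},t,\boldsymbol\epsilon}\,
  \big\lVert \boldsymbol\epsilon - \boldsymbol\epsilon_\theta(\mathbf{x}_t, t, \mathbf{c}) \big\rVert_2^2
\end{aligned}
```

A regression model commits to a single output for each observation, whereas the diffusion model learns to sample from the conditional distribution p(x | c), which is what allows it to represent the many plausible full-body poses consistent with the same head and hand signals.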

In contrast to previous methods that model full-body motion within a single unified framework, our approach recognizes the difficulty this imposes on deep learning models, particularly in capturing the intricate kinematics of human motion. We therefore propose a stratified approach that decouples the conventional full-body avatar reconstruction pipeline: the upper body is reconstructed first, and the lower body is then reconstructed conditioned on the upper body.
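The stratified idea corresponds to factorizing the conditional distribution as p(x_upper, x_lower | c) = p(x_upper | c) · p(x_lower | x_upper, c). The sketch below shows only this two-stage ordering under stated assumptions. The stage models are hypothetical placeholder callables standing in for the paper's learned (diffusion-based) stages, poses are flat lists of floats, and we assume the lower-body stage sees the sparse observation as well as the upper-body output. It is not the authors' implementation.

```python
from dataclasses import dataclass
from typing import Callable, List

Vector = List[float]


@dataclass
class StratifiedPipeline:
    """Two-stage inference: upper body from the sparse signal, then lower body
    conditioned on the predicted upper body (and, by assumption, the signal)."""

    # Hypothetical stage models; any callables with these signatures work.
    upper_model: Callable[[Vector], Vector]
    lower_model: Callable[[Vector, Vector], Vector]

    def __call__(self, sparse_obs: Vector) -> Vector:
        upper = self.upper_model(sparse_obs)         # stage 1: x_upper from c
        lower = self.lower_model(sparse_obs, upper)  # stage 2: x_lower from (x_upper, c)
        return upper + lower                         # concatenate into one full-body vector


def toy_upper(obs: Vector) -> Vector:
    # Hypothetical stand-in for the upper-body stage: scales each input value.
    return [0.5 * v for v in obs]


def toy_lower(obs: Vector, upper: Vector) -> Vector:
    # Hypothetical stand-in for the lower-body stage: depends only on the
    # upper-body output here, to make the conditioning explicit.
    mean_upper = sum(upper) / len(upper)
    return [mean_upper, mean_upper]


pipeline = StratifiedPipeline(toy_upper, toy_lower)
pose = pipeline([1.0, 2.0, 3.0])  # three upper-body values, then two lower-body values
```

Because the second stage receives the first stage's output as an explicit input, the lower body is always generated consistently with an already-fixed upper body, rather than both halves being predicted jointly by one model.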

Authors:

(1) Han Feng, Wuhan University (equal contribution; authors listed alphabetically);

(2) Wenchao Ma, Pennsylvania State University (equal contribution; authors listed alphabetically);

(3) Quankai Gao, University of Southern California;

(4) Xianwei Zheng, Wuhan University;

(5) Nan Xue, Ant Group;

(6) Huijuan Xu, Pennsylvania State University.

This paper is available on arXiv under the CC BY 4.0 DEED license.
