Generalising Sparse Pose Estimation using Masked Autoencoders
Making full-body human pose and motion estimation reliable in real-world settings with sparse, incomplete, and changing observations: robust to dropout, occlusion, and variable sensor configurations.
Overview
This project aims to make full-body human pose and motion estimation reliable in real-world settings where only sparse, incomplete, and changing observations are available. Instead of assuming a fixed capture setup, we target motion reconstruction that remains usable when sensors are limited, joints are intermittently missing due to occlusion or tracking loss, and input configurations vary across devices and users.
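The title's masked-autoencoder framing can be sketched in miniature: hide a random subset of joints, reconstruct the full pose, and score the reconstruction only on the joints the model never saw. The sketch below is illustrative only — the joint count, masking ratio, and mean-imputation "decoder" are stand-in assumptions, not the project's actual model or data format.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy full-body pose: J joints with 3-D positions (shapes are illustrative
# assumptions, not the project's actual data format).
J = 22
pose = rng.normal(size=(J, 3))

def random_joint_mask(num_joints, keep_ratio, rng):
    """Keep a random subset of joints and mask the rest (True = observed)."""
    mask = np.zeros(num_joints, dtype=bool)
    keep = rng.choice(num_joints, size=int(keep_ratio * num_joints), replace=False)
    mask[keep] = True
    return mask

mask = random_joint_mask(J, keep_ratio=0.25, rng=rng)  # sparse observation

# Stand-in "reconstruction": impute masked joints with the mean of the
# observed joints. A real masked autoencoder would use a learned
# encoder/decoder here.
recon = pose.copy()
recon[~mask] = pose[mask].mean(axis=0)

# Masked-autoencoder-style objective: the loss is computed ONLY on joints
# the model did not observe, which forces it to infer missing structure.
loss = np.mean((recon[~mask] - pose[~mask]) ** 2)
print(f"observed joints: {mask.sum()}/{J}, masked-joint MSE: {loss:.3f}")
```

Training on random masks of varying size is what lets a single model handle many sensor configurations at test time: any subset of observed joints is just another mask it has already seen.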
Vision
Full-body motion understanding should work outside of controlled labs. Our vision is to enable robust, general-purpose pose estimation that supports realistic deployment constraints: few sensors, variable tracking quality, and heterogeneous hardware setups. We want systems that "just work" across many configurations and degrade gracefully when observations drop out.
Core Research Questions
- What does it take to build pose estimation systems that are robust to sparse and variable inputs in the wild?
- How can we reduce the burden of engineering and maintaining solutions across many sensor layouts and tracking conditions?
- What evaluation practices best reflect real deployment realities like occlusion, dropout, and configuration changes?
- How can we ensure the resulting systems are practical for downstream products that need stability and reliability?
Applications
VR/AR full-body avatars from limited tracking
Enable stable full-body animation and embodiment experiences when only a few devices are tracked and when tracking can drop intermittently.
Wearable-based motion inference
Support motion reconstruction from heterogeneous wearable configurations (e.g., phone/watch/headset) where the available signals differ across users and situations.
Motion understanding for robotics and HRI
Provide reliable pose estimates for interaction, safety, or imitation scenarios where sensing is partial or occluded.
Resource-constrained motion streaming
Make it feasible to work with sparse motion signals in settings where bandwidth, compute, or sensor coverage is limited.
Evaluation & Impact
Reliability under real conditions
The primary impact is improving robustness when reality deviates from ideal assumptions:
- occlusions and out-of-view motion,
- tracking loss and intermittent availability,
- differences in sensor configurations across users and devices.
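Each of the three conditions above can be simulated directly on an input stream when building stress tests. The sketch below is a minimal, assumption-laden example — the stream shape, corruption rates, and helper names are all hypothetical, chosen only to show one way such perturbations could be scripted for a benchmark.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy input stream: T frames, S sensors, a 3-D signal per sensor
# (all shapes and rates below are illustrative assumptions).
T, S = 100, 4
stream = rng.normal(size=(T, S, 3))

def apply_dropout(stream, p, rng):
    """Zero individual sensor readings i.i.d. with probability p
    (models occlusion / out-of-view motion at the frame level)."""
    keep = rng.random(stream.shape[:2]) >= p
    return stream * keep[..., None], keep

def apply_tracking_loss(stream, sensor, start, length):
    """Simulate a contiguous tracking outage for one sensor;
    NaN marks 'no observation available'."""
    out = stream.copy()
    out[start:start + length, sensor] = np.nan
    return out

def drop_sensor(stream, sensor):
    """Simulate a configuration change: a sensor absent for the whole run."""
    return np.delete(stream, sensor, axis=1)

noisy, keep = apply_dropout(stream, p=0.2, rng=rng)       # occlusion-like
lost = apply_tracking_loss(stream, sensor=2, start=30, length=20)  # intermittent loss
reduced = drop_sensor(stream, sensor=3)                   # different device setup
```

Sweeping the dropout probability, outage length, and removed-sensor set yields a grid of test conditions under which robustness can be measured systematically rather than anecdotally.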
Lower deployment and maintenance cost
By targeting solutions that generalize across variability, this project aims to reduce the need for bespoke pipelines tuned to specific hardware configurations or controlled environments.
Broader access to full-body motion capabilities
More reliable sparse-input pose estimation can expand where full-body motion is feasible, unlocking new interaction and product experiences without specialized capture rigs.
Future Directions
- Establish stronger benchmarks and test suites that reflect deployment realities (dropout, occlusion, changing inputs).
- Improve robustness to real sensor artifacts like noise and drift.
- Explore personalization and adaptation strategies that maintain broad generalization.
- Extend to additional downstream tasks that benefit from reliable motion understanding.

