Learning Dexterous In-Hand Manipulation with Multifingered Hands via Visuomotor Diffusion

Piotr Koczy, Michael C. Welle, Danica Kragic

We present a framework for learning dexterous in-hand manipulation with multifingered hands using visuomotor diffusion policies. Our system enables complex in-hand manipulation tasks, such as unscrewing a bottle lid with one hand, by leveraging a fast and responsive teleoperation setup for the four-fingered Allegro Hand. We collect high-quality expert demonstrations using an augmented reality (AR) interface that tracks hand movements and applies inverse kinematics and motion retargeting for precise control. The AR headset provides real-time visualization, while gesture controls streamline teleoperation. To enhance policy learning, we introduce a novel demonstration outlier removal approach based on HDBSCAN clustering and the Global-Local Outlier Score from Hierarchies (GLOSH) algorithm, effectively filtering out low-quality demonstrations that could degrade performance. We evaluate our approach extensively in real-world settings and provide all experimental videos on the project website.
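The outlier removal step described above can be illustrated with a minimal sketch. It assumes each demonstration has already been summarized as a fixed-length feature vector; that featurization, the function names `glosh_scores` and `keep_inlier_indices`, and the default `min_cluster_size` and `threshold` values are hypothetical choices for illustration, not the authors' pipeline. The sketch relies on the third-party `hdbscan` package, whose fitted clusterer exposes GLOSH scores through its `outlier_scores_` attribute.

```python
from typing import List, Sequence


def glosh_scores(features, min_cluster_size: int = 5) -> List[float]:
    """Return one GLOSH outlier score per demonstration feature vector.

    `features` is assumed to be an (n_demos, n_features) array-like; how a
    demonstration is turned into such a vector is not specified in the text.
    """
    # Third-party imports are local so the filtering helper below can be
    # used on precomputed scores without these packages installed.
    import numpy as np
    import hdbscan

    clusterer = hdbscan.HDBSCAN(min_cluster_size=min_cluster_size)
    clusterer.fit(np.asarray(features, dtype=float))
    # Scores lie in [0, 1]; higher values indicate more outlying points.
    return clusterer.outlier_scores_.tolist()


def keep_inlier_indices(scores: Sequence[float], threshold: float = 0.9) -> List[int]:
    """Indices of demonstrations whose outlier score is at or below `threshold`."""
    # The threshold is an illustrative assumption, not a value from the paper.
    return [i for i, s in enumerate(scores) if s <= threshold]
```

A possible use would be `kept = keep_inlier_indices(glosh_scores(features))`, followed by training the diffusion policy only on the demonstrations indexed by `kept`. A fixed score threshold is used here for simplicity; a quantile-based cutoff would be an equally plausible choice.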

Expert Demonstrations

Lid unscrewing session using the teleoperation system

Experiments

Best policy evaluation (\(\pi_{nt}\))

All experiments:

Contact

  • Michael C. Welle; mwelle(at)kth.se; KTH Royal Institute of Technology, Sweden