OmniH2O: Universal and Dexterous Human-to-Humanoid Whole-Body Teleoperation and Learning

Rebuttal: OmniH2O dynamic tracking without sim2real regularization

Rebuttal: OmniH2O vs ExBody

Rebuttal: Comparison between OmniH2O and H2O

Rebuttal: Comparison of History Utilization

Rebuttal: Comparison of Linear Velocity

Rebuttal: Comparison of Motion Augmentation

Abstract

We present OmniH2O (Omni Human-to-Humanoid), a learning-based system for whole-body humanoid teleoperation and autonomy. Using kinematic pose as a universal control interface, OmniH2O enables a human to control a full-sized humanoid with dexterous hands in several ways, including real-time teleoperation through a VR headset, verbal instructions, and an RGB camera. OmniH2O also enables full autonomy by learning from teleoperated demonstrations or by integrating with frontier models such as GPT-4. OmniH2O demonstrates versatility and dexterity in various real-world whole-body tasks through teleoperation or autonomy, such as playing multiple sports, moving and manipulating objects, and interacting with humans. We develop an RL-based sim-to-real pipeline, which involves large-scale retargeting and augmentation of human motion datasets, learning a real-world deployable policy with sparse sensor input by imitating a privileged teacher policy, and reward design to enhance robustness and stability. We release the first humanoid whole-body control dataset, OmniH2O-6, containing six everyday tasks, and demonstrate humanoid whole-body skill learning from teleoperated datasets.
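The teacher-student step above (a deployable policy with sparse input imitating a privileged teacher) can be illustrated with a toy, DAgger-style loop: the student acts in the environment, the teacher relabels the visited states, and the student is refit by supervised regression. This is a minimal sketch under loudly stated assumptions, not the OmniH2O implementation: the linear dynamics, the linear teacher and student, the sizes, and the choice of "sparse" observation as the first few state entries are all hypothetical stand-ins for the neural policies trained in simulation.

```python
import numpy as np

# Hypothetical sizes: the teacher sees the full simulator state, the student only
# a subset of it (a stand-in for sparse real-world sensor input).
STATE_DIM, SPARSE_DIM, ACT_DIM = 8, 6, 4

rng = np.random.default_rng(0)
A = 0.95 * np.eye(STATE_DIM)               # toy linear dynamics (assumption)
B = rng.normal(size=(STATE_DIM, ACT_DIM))
K = 0.02 * B.T                             # small stabilizing gain for the teacher


def teacher_action(state):
    """Privileged teacher: acts on the full state (stand-in for the RL teacher)."""
    return -K @ state


def sparse_obs(state):
    """What the deployable student can observe: a subset of the state."""
    return state[:SPARSE_DIM]


def rollout(student_w, steps=50):
    """Run the *student* in the toy environment and record the states it visits."""
    state = rng.normal(size=STATE_DIM)
    visited = []
    for _ in range(steps):
        visited.append(state)
        # Clipped actions keep the toy dynamics bounded.
        action = np.clip(student_w @ sparse_obs(state), -1.0, 1.0)
        state = A @ state + B @ action + 0.1 * rng.normal(size=STATE_DIM)
    return np.array(visited)


def distill(iterations=5):
    """DAgger-style loop: student rolls out, teacher relabels, student refits."""
    student_w = np.zeros((ACT_DIM, SPARSE_DIM))
    obs_buf, act_buf = [], []
    for _ in range(iterations):
        states = rollout(student_w)
        obs_buf.extend(sparse_obs(s) for s in states)
        act_buf.extend(teacher_action(s) for s in states)
        X, Y = np.array(obs_buf), np.array(act_buf)
        # Supervised step: least-squares fit of teacher actions from student observations.
        student_w = np.linalg.lstsq(X, Y, rcond=None)[0].T
    return student_w
```

Rolling out the student rather than the teacher means the regression data covers the states the student itself reaches, which is the usual motivation for DAgger-style distillation. Because the student sees only part of the state, it cannot match the teacher exactly; the residual error reflects the information lost by the sparse observation.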

Dexterous Human-to-Humanoid Whole-Body Teleoperation

Verbal Instructions (MDM)

Autonomous Agent (GPT-4o)

Autonomous Agent (Diffusion Policy Learned from Teleoperation Demonstrations)

Robustness Test (the same motion tracking policy)

Outdoor Locomotion (the same motion tracking policy)

Method

OmniH2O

OmniH2O retargets large-scale human motion data to the humanoid and filters out motions that are infeasible for it.
Our sim-to-real policy is distilled through supervised learning from an RL-trained teacher policy that has access to privileged information.
The universal design of OmniH2O supports versatile human control interfaces, such as a VR headset and an RGB camera. The same sim-to-real policy can also be driven by autonomous agents that generate motion goals, such as GPT-4 or a diffusion policy trained on the teleoperation dataset.
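The idea of kinematic pose as a universal interface can be sketched as a single goal type that every input source produces, so the tracking policy never needs to know where a goal came from. This is a hypothetical illustration: the `KinematicGoal` layout (head and two hand positions), the source classes, and the `next_goal` method are assumptions for exposition, not OmniH2O's actual data format or API. An autonomous agent, such as a language-model planner or a diffusion policy, would simply be one more class implementing `next_goal`.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Protocol, Tuple

import numpy as np


@dataclass
class KinematicGoal:
    """Hypothetical sparse goal: 3D positions of the head and both hands."""
    head: np.ndarray
    left_hand: np.ndarray
    right_hand: np.ndarray

    def as_vector(self) -> np.ndarray:
        return np.concatenate([self.head, self.left_hand, self.right_hand])


class GoalSource(Protocol):
    """Anything that can emit kinematic goals: a human interface or an agent."""
    def next_goal(self) -> KinematicGoal: ...


class VRHeadsetSource:
    """Hypothetical: replays (head, left controller, right controller) positions."""
    def __init__(self, frames: Iterable[Tuple[list, list, list]]):
        self._frames = iter(frames)

    def next_goal(self) -> KinematicGoal:
        head, left, right = (np.asarray(p, dtype=float) for p in next(self._frames))
        return KinematicGoal(head, left, right)


class RGBPoseSource:
    """Hypothetical: maps keypoints from an RGB pose estimator to the same goal."""
    def __init__(self, frames: Iterable[Dict[str, list]]):
        self._frames = iter(frames)

    def next_goal(self) -> KinematicGoal:
        kp = next(self._frames)
        return KinematicGoal(
            np.asarray(kp["head"], dtype=float),
            np.asarray(kp["left_wrist"], dtype=float),
            np.asarray(kp["right_wrist"], dtype=float),
        )


def policy_inputs(source: GoalSource, proprio: np.ndarray, steps: int) -> List[np.ndarray]:
    """Build tracking-policy inputs: goal vector followed by proprioception."""
    return [np.concatenate([source.next_goal().as_vector(), proprio]) for _ in range(steps)]
```

For example, a VR frame and an RGB keypoint frame both become a 9-dimensional goal vector, and with a 10-dimensional proprioceptive state the policy input has the same shape regardless of the source. Keeping the interface at the level of kinematic targets is what lets one tracking policy serve teleoperation and autonomy alike.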