MV-UMI: A Scalable Multi-View Interface for Cross-Embodiment Learning

Human Aligned

Embodiment Agnostic

3rd-Person View

1st-Person View

Processed View

MV-UMI turns human-aligned data into cross-embodiment data for robot policy learning.

Cross Embodiment Illustration

Cross-Embodiment Deployment

What if we naively trained a policy on the human-aligned unsegmented data? ☹️

In our ablation study, we found that the policy trained on human-aligned data failed to complete any tasks, despite often moving in sensible directions. This failure is likely due to a distribution shift in the observation space between training and deployment.

Interestingly, the policy trained on segmented data performed well even when the robot was left unsegmented at inference. We hypothesize that removing the human helped the model avoid overfitting to strong correlations between human and gripper actions and instead draw its learning signal from the task-relevant parts of the scene.

This is also evident from the attention maps of our vision encoders:
a) Unsegmented model focusing on the embodiment
b) MV-UMI model focusing on the manipulated objects
c) MV-UMI model focusing on the manipulated objects (even when the robot is unsegmented!)
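
To make the discussion above concrete, here is a minimal sketch of the two ideas: removing embodiment pixels from an observation, and measuring how much of an attention map falls on the embodiment. It is an illustration under stated assumptions, not the MV-UMI implementation. It assumes a binary embodiment mask is already available from some segmentation step, which the text above does not specify. The function names `mask_embodiment` and `attention_fraction_on_mask` and the constant fill value are hypothetical.

```python
import numpy as np


def mask_embodiment(image: np.ndarray, mask: np.ndarray, fill: int = 0) -> np.ndarray:
    """Return a copy of `image` (H, W, 3) with masked pixels set to `fill`.

    `mask` is an (H, W) array that is truthy wherever the human or robot
    appears. How the mask is produced is outside this sketch (assumption).
    """
    if image.shape[:2] != mask.shape:
        raise ValueError("image and mask must share height and width")
    out = image.copy()  # leave the original observation untouched
    out[mask.astype(bool)] = fill
    return out


def attention_fraction_on_mask(attention: np.ndarray, mask: np.ndarray) -> float:
    """Fraction of the total attention mass that lies inside `mask`.

    `attention` is a non-negative (H, W) map already resized to the mask's
    resolution (assumption). Returns 0.0 for an all-zero map.
    """
    if attention.shape != mask.shape:
        raise ValueError("attention and mask must have the same shape")
    total = float(attention.sum())
    if total == 0.0:
        return 0.0
    return float(attention[mask.astype(bool)].sum()) / total


if __name__ == "__main__":
    # Toy 4x4 white image whose top half is marked as embodiment.
    image = np.full((4, 4, 3), 255, dtype=np.uint8)
    mask = np.zeros((4, 4), dtype=bool)
    mask[:2, :] = True

    segmented = mask_embodiment(image, mask)
    uniform_attention = np.ones((4, 4))
    print(segmented[0, 0], segmented[3, 3])
    print(attention_fraction_on_mask(uniform_attention, mask))
```

A low attention fraction on the embodiment mask would be consistent with panels b) and c), while a high fraction would match panel a). This metric is only one possible way to summarize such maps and is not taken from the study itself.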

Why not stick to the egocentric camera alone?

Unable to find the cup
Unable to find the empty bottles rack

MV-UMI Pipeline


Results

Bottles-Rack-Placer
Cans-Shelf-Placer
Markers-Placements
Task Bottles
Task Cans
Task Markers
Results Bottles
Ablation Study

Mechanical Design

MV-UMI is designed to be modular. The gripper has dedicated mounts, so it can be used handheld or attached to a robot.

Hardware Repository

(coming soon!)

The repository includes the CAD models needed to print the handheld and robot-operated configurations of our three-jaw gripper.

Handheld Configuration
Motorized Configuration

We also include a modified version of the two-gripper UMI, which allows it to be controlled with affordable linear motors.

Modified UMI with Linear Motor Control