GenAug: Retargeting behaviors to unseen situations via Generative Augmentation

¹University of Washington, ²Meta


We show that GenAug policies can achieve widespread real-world generalization for tabletop manipulation, even when they are only provided with a few demonstrations in a simple training environment.

GenAug: Generative Augmentation for Real-World Data Collection


Given an observation of the demonstration environment, GenAug automatically generates “augmented” RGBD images of entirely different yet realistic environments, capturing the visual realism and complexity of scenes a robot might encounter in the real world.

Abstract

Robot learning methods have the potential for widespread generalization across tasks, environments, and objects. However, these methods require large, diverse datasets that are expensive to collect in real-world robotics settings. For robot learning to generalize, we must be able to leverage sources of data or priors beyond the robot’s own experience. In this work, we posit that image-text generative models, which are pre-trained on large corpora of web-scraped data, can serve as such a data source. We show that despite these generative models being trained on largely non-robotics data, they can serve as an effective means of imparting priors into the robot learning process in a way that enables widespread generalization. In particular, we show how pre-trained generative models can serve as effective tools for semantically meaningful data augmentation. By leveraging these pre-trained models to generate appropriate “semantic” data augmentations, we propose GenAug, a system that significantly improves policy generalization. We apply GenAug to tabletop manipulation tasks, showing the ability to retarget behaviors to novel scenarios while requiring only marginal amounts of real-world data. We demonstrate the efficacy of this system on a number of object manipulation problems in the real world, showing a 40% improvement in generalization to novel scenes and objects.



Video

Real-World Experiments


By training on a dataset extrapolated from only 10 demonstrations collected in the simple environment on the left, the robot is able
to solve the task in entirely different environments and with entirely different objects.



Simulation Experiments


To study the effectiveness of GenAug in greater depth, we conduct large-scale experiments against other baselines in simulation.
In particular, we organize the baselines into (1) in-domain augmentation methods and (2) methods that learn from out-of-domain priors.



Table-top Manipulation Tasks




Behavior Cloning Tasks



In addition, we show that GenAug applies to behavior cloning tasks such as “close the top drawer”, using a different robot, the Fetch.
In particular, we collected 100 demonstrations and trained a CNN-MLP behavior cloning policy finetuned with R3M embeddings.
The input is the RGB observation and the output is an 8-dimensional action vector.
We tested on 100 unseen backgrounds using iGibson rooms and observed that GenAug achieves a 60% success rate,
while the policy without GenAug reaches only 1%, an improvement of nearly 60 percentage points.
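For concreteness, below is a minimal sketch of this kind of CNN-MLP behavior cloning policy in PyTorch. The torchvision ResNet backbone here is only a stand-in for the R3M visual encoder, and all layer sizes are illustrative assumptions rather than the exact architecture used in our experiments.

# Minimal CNN-MLP behavior cloning policy sketch (PyTorch).
# The torchvision ResNet stands in for the R3M encoder; layer sizes are assumptions.
import torch
import torch.nn as nn
from torchvision import models

class BCPolicy(nn.Module):
    def __init__(self, action_dim: int = 8):
        super().__init__()
        backbone = models.resnet18(weights=None)   # stand-in for an R3M visual encoder
        backbone.fc = nn.Identity()                # expose the 512-dim feature vector
        self.encoder = backbone
        self.head = nn.Sequential(                 # MLP mapping features to actions
            nn.Linear(512, 256), nn.ReLU(),
            nn.Linear(256, action_dim),
        )

    def forward(self, rgb: torch.Tensor) -> torch.Tensor:
        # rgb: (B, 3, H, W) observation; returns an 8-dim action vector per sample
        return self.head(self.encoder(rgb))

# Training is standard behavior cloning: regress the demonstrated actions.
policy = BCPolicy()
obs = torch.randn(4, 3, 224, 224)
loss = nn.functional.mse_loss(policy(obs), torch.randn(4, 8))
loss.backward()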



Approach



1. GenAug:

Given a demonstration in a single simple environment, GenAug can automatically add distractor objects,
change object textures, change object classes, and change the table and background.
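As an illustration, one of these modes (changing the table and background) can be sketched with an off-the-shelf text-guided inpainting diffusion model. The checkpoint, file names, prompt, and masking strategy below are assumptions for illustration, not necessarily GenAug's exact generation pipeline.

# Sketch: "change table and background" augmentation via text-guided inpainting.
# Checkpoint, prompts, file paths, and masking strategy are illustrative placeholders.
import torch
from PIL import Image
from diffusers import StableDiffusionInpaintPipeline

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16
).to("cuda")

rgb = Image.open("demo_rgb.png").convert("RGB")           # original demonstration frame
bg_mask = Image.open("background_mask.png").convert("L")  # white where the scene may be regenerated

augmented = pipe(
    prompt="a wooden kitchen table in a bright living room",
    image=rgb,
    mask_image=bg_mask,   # object and gripper pixels stay fixed (black in the mask)
).images[0]
augmented.save("demo_rgb_aug.png")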


2. Data Generation
We randomly select an augmentation mode and expand the initial dataset into a large and diverse dataset.
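A sketch of this expansion loop follows. The augment_fn hook is a hypothetical wrapper around the generative augmentations (such as the inpainting call sketched above), and the mode names and per-demonstration augmentation count are illustrative assumptions.

import random

# The four augmentation modes described above; names are illustrative.
MODES = ["add_distractors", "change_texture", "change_object_class", "change_table_background"]

def expand_dataset(demos, augment_fn, augmentations_per_demo=50):
    """Expand a handful of demonstrations into a large, diverse dataset.

    augment_fn(rgbd, mode) is a caller-supplied hook wrapping the generative
    augmentations; the count per demonstration is an illustrative assumption.
    """
    augmented = []
    for demo in demos:
        for _ in range(augmentations_per_demo):
            mode = random.choice(MODES)               # randomly select an augmentation mode
            new_obs = augment_fn(demo["rgbd"], mode)
            # The augmentation preserves scene geometry, so the demonstrated
            # pick-and-place action remains valid and is reused unchanged.
            augmented.append({"rgbd": new_obs, "action": demo["action"]})
    return demos + augmented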


3. System
We generate a large and diverse augmented dataset from a small number of human demonstrations collected in a simple environment, then use this augmented dataset to train a language-conditioned robot policy and deploy it in the real world.
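The policy architecture is not pinned down in this overview, so the following is only a generic sketch of a language-conditioned policy in which a frozen CLIP text encoder conditions a small visual network; the checkpoints, dimensions, and fusion scheme are illustrative assumptions rather than GenAug's actual policy.

# Generic sketch of a language-conditioned policy (not GenAug's exact architecture).
import torch
import torch.nn as nn
from torchvision import models
from transformers import CLIPTokenizer, CLIPTextModel

class LanguageConditionedPolicy(nn.Module):
    """Condition a visual policy on a frozen CLIP text embedding of the instruction."""
    def __init__(self, action_dim: int = 8):
        super().__init__()
        self.tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-base-patch32")
        self.text_encoder = CLIPTextModel.from_pretrained("openai/clip-vit-base-patch32").eval()
        vision = models.resnet18(weights=None)
        vision.fc = nn.Identity()                       # 512-dim image features
        self.vision = vision
        self.head = nn.Sequential(nn.Linear(512 + 512, 256), nn.ReLU(),
                                  nn.Linear(256, action_dim))

    def forward(self, rgb: torch.Tensor, instruction: str) -> torch.Tensor:
        tokens = self.tokenizer([instruction], return_tensors="pt", padding=True)
        with torch.no_grad():                           # keep the language prior frozen
            text = self.text_encoder(**tokens).pooler_output     # (1, 512)
        img = self.vision(rgb)                          # (B, 512)
        fused = torch.cat([img, text.expand(img.size(0), -1)], dim=-1)
        return self.head(fused)

policy = LanguageConditionedPolicy()
actions = policy(torch.randn(2, 3, 224, 224), "close the top drawer")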



4. Data Collection
To collect demonstrations in the real world, a user builds a small dataset by specifying pick and place locations. These 2D locations are projected to 3D points in the robot's coordinate frame using calibrated depth maps.
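This back-projection is the standard pinhole-camera computation; a sketch follows, where the intrinsic matrix and the camera-to-robot transform are placeholders for the actual calibration values.

import numpy as np

def pixel_to_robot_point(u, v, depth_m, K, T_robot_cam):
    """Back-project a specified pixel (u, v) with metric depth into robot coordinates.

    K is the 3x3 camera intrinsic matrix and T_robot_cam the 4x4 transform from
    the camera frame to the robot base frame, both obtained from calibration.
    """
    fx, fy = K[0, 0], K[1, 1]
    cx, cy = K[0, 2], K[1, 2]
    # Pinhole back-projection into the camera frame.
    p_cam = np.array([(u - cx) * depth_m / fx,
                      (v - cy) * depth_m / fy,
                      depth_m, 1.0])
    return (T_robot_cam @ p_cam)[:3]   # express the point in the robot base frame

# Example with placeholder calibration values (not the actual calibration).
K = np.array([[615.0, 0.0, 320.0],
              [0.0, 615.0, 240.0],
              [0.0, 0.0, 1.0]])
T_robot_cam = np.eye(4)
print(pixel_to_robot_point(350, 260, depth_m=0.72, K=K, T_robot_cam=T_robot_cam))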

Concurrent Work


Check out other awesome work that also leverages pretrained generative models for robot learning!
Scaling Robot Learning with Semantically Imagined Experience by Yu et al. also shows that applying text-to-image diffusion models to existing robotic manipulation datasets can improve policies in unseen scenes, such as those with new objects and distractors.
CACTI: A Framework for Scalable Multi-Task Multi-Scene Visual Imitation Learning by Mandi et al. shows that using generative models such as Stable Diffusion to add distractors to the scene can make multi-task policy learning more robust.
DALL-E-Bot: Introducing Web-Scale Diffusion Models to Robotics by Kapelyukh et al. leverages DALL-E to generate goal images for rearrangement tasks.

BibTeX

@article{chen2023genaug,
  title={GenAug: Retargeting behaviors to unseen situations via Generative Augmentation},
  author={Chen, Zoey and Kiami, Sho and Gupta, Abhishek and Kumar, Vikash},
  journal={arXiv preprint arXiv:2302.06671},
  year={2023}
}