
TRANSIC: Sim-to-Real Policy Transfer by Learning from Online Correction: Experiment Settings


Table of Links

Abstract and 1 Introduction

2 Preliminaries

3 TRANSIC: Sim-to-Real Policy Transfer by Learning from Online Correction and 3.1 Learning Base Policies in Simulation with RL

3.2 Learning Residual Policies from Online Correction

3.3 An Integrated Deployment Framework and 3.4 Implementation Details

4 Experiments

4.1 Experiment Settings

4.2 Quantitative Comparison on Four Assembly Tasks

4.3 Effectiveness in Addressing Different Sim-to-Real Gaps (Q4)

4.4 Scalability with Human Effort (Q5) and 4.5 Intriguing Properties and Emergent Behaviors (Q6)

5 Related Work

6 Conclusion and Limitations, Acknowledgments, and References

A. Simulation Training Details

B. Real-World Learning Details

C. Experiment Settings and Evaluation Details

D. Additional Experiment Results

4.1 Experiment Settings

Tasks. We consider complex, contact-rich manipulation tasks from FurnitureBench [90] that require high precision. These tasks are challenging and well suited to testing sim-to-real transfer, since perception, embodiment, controller, and dynamics gaps must all be addressed to complete them. Specifically, we divide the assembly of a square table into four independent tasks (Fig. 3): Stabilize, Reach and Grasp, Insert, and Screw. For these tasks we collect 20, 100, 90, and 17 real-robot trajectories with human correction, respectively. To further test generalization to unseen objects from a new category, we also experiment with a lamp (Fig. 6b). All experiments are conducted in a tabletop setting with a mounted Franka Emika 3 robot. See Appendix Sec. B.1 for the detailed system setup.
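The task split and per-task data budget described above can be written down as a small configuration. This is only an illustrative sketch: the task names and trajectory counts come from the text, while `TaskSpec`, `TASKS`, and `total_trajectories` are hypothetical names and not part of the authors' code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskSpec:
    """One of the four square-table assembly tasks (hypothetical container)."""
    name: str
    num_correction_trajectories: int  # real-robot trajectories with human correction


# Task names and counts as stated in the text; the structure is an assumption.
TASKS = (
    TaskSpec("Stabilize", 20),
    TaskSpec("Reach and Grasp", 100),
    TaskSpec("Insert", 90),
    TaskSpec("Screw", 17),
)


def total_trajectories(tasks):
    """Sum the human-correction trajectories collected across tasks."""
    return sum(t.num_correction_trajectories for t in tasks)


if __name__ == "__main__":
    for task in TASKS:
        print(f"{task.name}: {task.num_correction_trajectories} trajectories")
    print("Total:", total_trajectories(TASKS))
```

Summing the four counts gives 227 trajectories in total, which makes the uneven budget visible: Reach and Grasp and Insert account for most of the collected data.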


Baselines and Evaluation Protocol. We compare against three groups of baselines. 1) Traditional sim-to-real methods: This group includes direct deployment of simulation policies trained with domain randomization and data augmentation [53], denoted as “DR. & Data Aug.”. It also covers the real-world fine-tuning paradigm, where simulation policies are further fine-tuned on real-robot data through BC (denoted as “BC Fine-Tune”) or through a state-of-the-art offline RL method, implicit Q-learning (IQL) [69] (denoted as “IQL Fine-Tune”). To estimate the performance lower bound, we also include a baseline without any data augmentation or real-world fine-tuning, denoted as “Direct Transfer”. 2) Interactive IL: This group covers state-of-the-art interactive imitation learning methods, namely HG-DAgger [66] and IWR [67]. 3) Learning from real-robot data only: This group includes BC [72], BC-RNN [68], and IQL [69], all trained on real-robot demonstrations only. We follow Liu et al. [70] to label rewards for IQL. Every evaluation consists of 20 trials starting with different objects and robot poses. We make our best effort to ensure the same initial settings when evaluating different methods. See Appendix Sec. C for the detailed evaluation protocol.
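The reported metric can be sketched as follows: a per-task success rate over 20 trials, then an unweighted average over the four tasks. This is a minimal sketch assuming each trial is scored as a binary success or failure; the function names are hypothetical, and the outcomes below are invented placeholders rather than results from the paper.

```python
from statistics import mean

NUM_TRIALS = 20  # each evaluation consists of 20 trials, per the text


def success_rate(outcomes):
    """Fraction of successful trials; expects exactly NUM_TRIALS booleans."""
    if len(outcomes) != NUM_TRIALS:
        raise ValueError(f"expected {NUM_TRIALS} trials, got {len(outcomes)}")
    return sum(outcomes) / len(outcomes)


def average_success_rate(per_task_outcomes):
    """Unweighted mean of per-task success rates (assumed aggregation)."""
    return mean(success_rate(o) for o in per_task_outcomes.values())


# Placeholder outcomes for illustration only; NOT results from the paper.
placeholder_outcomes = {
    "Stabilize": [True] * 20,
    "Reach and Grasp": [True] * 15 + [False] * 5,
    "Insert": [True] * 10 + [False] * 10,
    "Screw": [True] * 5 + [False] * 15,
}

if __name__ == "__main__":
    for task, outcomes in placeholder_outcomes.items():
        print(f"{task}: {success_rate(outcomes):.0%}")
    print(f"Average: {average_success_rate(placeholder_outcomes):.1%}")
```

Averaging per-task rates rather than pooling all trials weights each task equally, which matches the description of success rates averaged over four tasks when every task has the same number of trials.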


Figure 4: Success rates averaged over the four benchmarked tasks. TRANSIC significantly outperforms all three baseline groups.


Table 1: Success rates per task. TRANSIC outperforms all baseline methods on all four tasks.


Authors:

(1) Yunfan Jiang, Department of Computer Science;

(2) Chen Wang, Department of Computer Science;

(3) Ruohan Zhang, Department of Computer Science and Institute for Human-Centered AI (HAI);

(4) Jiajun Wu, Department of Computer Science and Institute for Human-Centered AI (HAI);

(5) Li Fei-Fei, Department of Computer Science and Institute for Human-Centered AI (HAI).


This paper is available on arXiv under the CC BY 4.0 DEED license.

