4.3 Effectiveness in Addressing Different Sim-to-Real Gaps (Q4)
While TRANSIC is a holistic approach that addresses multiple sim-to-real gaps simultaneously, we also examine its ability to close each gap individually. To do so, we create five simulation-reality pairs, and for each pair we intentionally introduce a large gap between the simulation and the real world. The gaps are applied on the real-world side and comprise perception error, an underactuated controller, embodiment mismatch, dynamics difference, and object asset mismatch. Note that these are artificial settings designed for a controlled study. See Appendix Sec. C.2 for the detailed setups.
As shown in Fig. 5, TRANSIC achieves an average success rate of 77% across the five simulation-reality pairs with deliberately exacerbated sim-to-real gaps, which indicates that it can close each of these gaps individually. In contrast, the best baseline method, IWR, achieves an average success rate of only 18%. We attribute this effectiveness to the residual policy design. Zeng et al. [83] echo our finding that residual learning is an effective tool for compensating for domain factors that cannot be explicitly modeled. Furthermore, training with data collected specifically from a particular setting generally increases TRANSIC’s performance. This is not the case for IWR, where fine-tuning on new data can even lead to worse performance. These results show that TRANSIC is better not only at addressing multiple sim-to-real gaps as a whole, but also at handling individual gaps of very different natures.
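To make the residual idea concrete, the following is a minimal, generic sketch of residual action composition. It is not the TRANSIC implementation. The function names (`compose_residual_policy`, `sim_base_policy`, `learned_residual`), the clipping bound `max_residual`, and the toy offset values are all hypothetical and chosen only for illustration. It shows a frozen base policy whose output is adjusted by a small correction term, here standing in for a constant perception offset that the base policy never saw in simulation.

```python
from typing import Callable, List, Sequence

Action = List[float]
Observation = Sequence[float]


def compose_residual_policy(
    base_policy: Callable[[Observation], Action],
    residual_policy: Callable[[Observation, Action], Action],
    max_residual: float = 0.1,  # hypothetical bound on the correction size
) -> Callable[[Observation], Action]:
    """Return a policy whose action is base action + clipped residual."""

    def policy(obs: Observation) -> Action:
        base_action = list(base_policy(obs))
        correction = residual_policy(obs, base_action)
        # Clip each correction component so the residual can only nudge,
        # never override, the base action (an assumption of this sketch).
        clipped = [max(-max_residual, min(max_residual, c)) for c in correction]
        return [a + c for a, c in zip(base_action, clipped)]

    return policy


# Toy stand-ins: the base policy moves straight to the observed target,
# which is correct in simulation but biased in the (toy) real world.
def sim_base_policy(obs: Observation) -> Action:
    return [obs[0], obs[1]]


# Hypothetical residual that compensates a constant perception offset.
def learned_residual(obs: Observation, base_action: Action) -> Action:
    return [0.05, -0.05]


policy = compose_residual_policy(sim_base_policy, learned_residual)
action = policy([1.0, 2.0])
print(action)
```

The base policy stays untouched in this arrangement, so only the small correction term has to be fit from real-world data, which is one plausible reading of why a residual design copes with gaps that are hard to model explicitly.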
Authors:
(1) Yunfan Jiang, Department of Computer Science;
(2) Chen Wang, Department of Computer Science;
(3) Ruohan Zhang, Department of Computer Science and Institute for Human-Centered AI (HAI);
(4) Jiajun Wu, Department of Computer Science and Institute for Human-Centered AI (HAI);
(5) Li Fei-Fei, Department of Computer Science and Institute for Human-Centered AI (HAI).