
Experiment Settings and Evaluation Details


Table of Links

Abstract and 1 Introduction

2 Preliminaries

3 TRANSIC: Sim-to-Real Policy Transfer by Learning from Online Correction and 3.1 Learning Base Policies in Simulation with RL

3.2 Learning Residual Policies from Online Correction

3.3 An Integrated Deployment Framework and 3.4 Implementation Details

4 Experiments

4.1 Experiment Settings

4.2 Quantitative Comparison on Four Assembly Tasks

4.3 Effectiveness in Addressing Different Sim-to-Real Gaps (Q4)

4.4 Scalability with Human Effort (Q5) and 4.5 Intriguing Properties and Emergent Behaviors (Q6)

5 Related Work

6 Conclusion and Limitations, Acknowledgments, and References

A. Simulation Training Details

B. Real-World Learning Details

C. Experiment Settings and Evaluation Details

D. Additional Experiment Results

C Experiment Settings and Evaluation Details

In this section, we provide details about our experiment settings and evaluation protocols.

C.1 Main Experiments (Sec. 4.2)

We evaluate all methods on four tasks for 20 trials each. Each trial starts with different object and robot poses. We make our best effort to ensure the same initial settings when evaluating different methods. Specifically, we take pictures of these 20 different initial configurations and refer to them when resetting each new trial. See Figs. A.14, A.15, A.16, and A.17 for the initial configurations of the tasks Stabilize, Reach and Grasp, Insert, and Screw, respectively.
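
To make the protocol concrete, the sketch below shows the trial loop under our assumptions; `load_initial_configurations`, `reset_scene`, and `run_trial` are hypothetical helpers standing in for the photograph-guided manual reset and the policy rollout, not functions from the paper's codebase.

```python
# Minimal sketch of the 20-trial evaluation protocol, assuming hypothetical
# helpers for the manual, photo-guided scene reset and the policy rollout.

NUM_TRIALS = 20

def evaluate(policy, task_name: str) -> float:
    """Return the success rate of `policy` over 20 fixed initial configurations."""
    configs = load_initial_configurations(task_name)  # the 20 photographed setups
    assert len(configs) == NUM_TRIALS

    successes = 0
    for config in configs:
        reset_scene(config)                   # match objects and robot pose to the reference photo
        successes += int(run_trial(policy))   # roll out until success or timeout

    return successes / NUM_TRIALS
```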

C.2 Experiments with Different Sim-to-Real Gaps (Sec. 4.3)

C.2.1 Experiment Setup


We elaborate on how different sim-to-real gaps are created.


Perception Error This gap corresponds to adding noise in the observation space O. We test it on the task Reach and Grasp. As visualized in Fig. A.9, with probability p = 0.6 we apply random jitter to 25% of the points in the point-cloud observation. The jittering noise is sampled independently from the distribution N(0, 0.03) and clipped to the ±0.03 range.
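
For concreteness, here is a minimal NumPy sketch of how such a perturbation could be implemented. The function name is our own, and we read 0.03 in N(0, 0.03) as the standard deviation; both are assumptions, not details taken from the TRANSIC codebase.

```python
import numpy as np

def apply_perception_error(points, p=0.6, frac=0.25, sigma=0.03, clip=0.03, rng=None):
    """With probability p, jitter a fraction of the points in an (N, 3) point cloud."""
    rng = rng if rng is not None else np.random.default_rng()
    if rng.random() >= p:                                    # with probability 1 - p, leave untouched
        return points
    n = points.shape[0]
    idx = rng.choice(n, size=int(frac * n), replace=False)   # 25% of the points
    noise = rng.normal(0.0, sigma, size=(idx.size, 3))       # N(0, 0.03); 0.03 read as std
    noise = np.clip(noise, -clip, clip)                      # clip to the +/- 0.03 range
    out = points.copy()
    out[idx] += noise
    return out
```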





Figure A.9: Visualization of introduced perception error. a) The original point-cloud observation. b) The erroneous point-cloud observation with random jitter.


Embodiment Mismatch This is done by swapping the robot gripper fingers for shorter ones, as demonstrated in Fig. A.11, which corresponds to a discrepancy in the state space S and the transition function T. We test this gap on the task Screw. We find that the 9 cm length difference incurs a significant sim-to-real gap.


Dynamics Difference This is done by changing object surfaces and increasing friction, which corresponds to a different transition function T. We test this gap on the task Stabilize. Concretely, we attach friction tapes to the square tabletop's surface to increase friction, hence changing the dynamics (Fig. A.12).


Figure A.10: Visualization of the trajectory realized by an underactuated controller. The plot displays the end-effector's position in the XY plane. It shows a reference circular movement, a trajectory tracked by the normal controller, and a trajectory tracked by the underactuated controller.


Figure A.11: Two different gripper fingers used to create the embodiment mismatch. Policies are trained with the longer finger and tested with the shorter finger.


Figure A.12: Two square tabletops used to create the dynamics difference. a) The original surface is smooth. b) We attach friction tapes to change the dynamics.


Object Asset Mismatch As shown in Fig. A.13, this is done by replacing the table leg with a light bulb, which corresponds to a change in the emission function Ω. We test this gap on the task Reach and Grasp.


Figure A.13: Two objects used to create asset mismatch. a) Policies are trained with the table leg. b) We test policies with an unseen light bulb.


Table A.XI: Quantitative results for scalability with human correction dataset size on four tasks.


C.2.2 Evaluation


We conduct 20 trials with different initial configurations. The initial conditions for the first four experiments are the same as in the main experiments (Figs. A.14, A.15, A.16, A.17). Fig. A.18 shows the initial configurations for the Object Asset Mismatch experiment.

C.3 Data Scalability Experiments (Sec. 4.4)

In Table A.XI, we show quantitative results for scalability with human correction dataset size on four tasks.

C.4 Policy Robustness Experiments (Sec. 4.5)

C.4.1 Removing Cameras


We remove two cameras and keep only three. Note that this is the same number of cameras as in FurnitureBench [90]. For tasks other than Insert, we keep the wrist camera, the right front camera, and the left rear camera. For the task Insert, we keep the two front cameras and the left rear camera.
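
For reference, this per-task camera selection can be written as a small config. This is a sketch, and the camera labels below are our own illustrative names, not identifiers from the TRANSIC setup.

```python
# Hypothetical per-task camera-selection config mirroring the setup above;
# camera labels are illustrative, not identifiers from the TRANSIC codebase.
CAMERAS_KEPT = {
    "stabilize":       ["wrist", "front_right", "rear_left"],
    "reach_and_grasp": ["wrist", "front_right", "rear_left"],
    "screw":           ["wrist", "front_right", "rear_left"],
    "insert":          ["front_left", "front_right", "rear_left"],  # two front + left rear
}
```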


C.4.2 Suboptimal Correction Data



Authors:

(1) Yunfan Jiang, Department of Computer Science, Stanford University;

(2) Chen Wang, Department of Computer Science, Stanford University;

(3) Ruohan Zhang, Department of Computer Science and Institute for Human-Centered AI (HAI), Stanford University;

(4) Jiajun Wu, Department of Computer Science and Institute for Human-Centered AI (HAI), Stanford University;

(5) Li Fei-Fei, Department of Computer Science and Institute for Human-Centered AI (HAI), Stanford University.


This paper is available on arXiv under the CC BY 4.0 DEED license.

