Table of Links
3.2 Learning Residual Policies from Online Correction
3.3 An Integrated Deployment Framework and 3.4 Implementation Details
4.2 Quantitative Comparison on Four Assembly Tasks
4.3 Effectiveness in Addressing Different Sim-to-Real Gaps (Q4)
4.4 Scalability with Human Effort (Q5) and 4.5 Intriguing Properties and Emergent Behaviors (Q6)
6 Conclusion and Limitations, Acknowledgments, and References
A. Simulation Training Details
B. Real-World Learning Details
C. Experiment Settings and Evaluation Details
D. Additional Experiment Results
C Experiment Settings and Evaluation Details
In this section, we provide details about our experiment settings and evaluation protocols.
C.1 Main Experiments (Sec. 4.2)
We evaluate all methods on each of the four tasks for 20 trials. Each trial starts with different object and robot poses. We make our best effort to ensure identical initial settings when evaluating different methods. Specifically, we photograph the 20 different initial configurations and refer to the pictures when resetting each new trial. See Figs. A.14, A.15, A.16, and A.17 for the initial configurations of the tasks Stabilize, Reach and Grasp, Insert, and Screw, respectively.
C.2 Experiments with Different Sim-to-Real Gaps (Sec. 4.3)
C.2.1 Experiment Setup
We elaborate on how different sim-to-real gaps are created.
Perception Error This gap is created by applying random jitter to points in the point cloud, which corresponds to adding noise in the observation space O. We test this sim-to-real gap on the task Reach and Grasp. As visualized in Fig. A.9, with probability P = 0.6 we apply random jitter to 25% of the points in the point-cloud observation. The jitter noise is sampled independently from N(0, 0.03) and clipped to the ±0.03 range.
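For concreteness, below is a minimal sketch of this jitter procedure, assuming the observation is an (N, 3) NumPy point cloud; the function name and signature are illustrative, not taken from the paper's released code.

```python
import numpy as np

def jitter_point_cloud(points, apply_prob=0.6, point_frac=0.25,
                       noise_std=0.03, clip_range=0.03, rng=None):
    """With probability `apply_prob`, jitter `point_frac` of the points."""
    rng = rng or np.random.default_rng()
    if rng.random() >= apply_prob:
        # With probability 1 - P, return the observation unchanged.
        return points
    noisy = points.copy()
    num_jittered = int(point_frac * len(points))
    idx = rng.choice(len(points), size=num_jittered, replace=False)
    # Sample i.i.d. Gaussian noise and clip it to the +/- 0.03 range.
    noise = rng.normal(0.0, noise_std, size=(num_jittered, 3))
    noisy[idx] += np.clip(noise, -clip_range, clip_range)
    return noisy
```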
Embodiment Mismatch This gap is created by replacing the robot gripper with a shorter one, as demonstrated in Fig. A.11, which corresponds to a discrepancy in the state space S and the transition function T. We test this gap on the task Screw. We find that the 9 cm length difference incurs a significant gap.
Dynamics Difference This gap is created by changing object surfaces to increase friction, which corresponds to a different transition function T. We test this gap on the task Stabilize. Concretely, we attach friction tape to the square tabletop's surface to increase friction, hence changing the dynamics (Fig. A.12).
Object Asset Mismatch As shown in Fig. A.13, this gap is created by replacing the table leg with a light bulb, which corresponds to a change in the emission function Ω. We test this gap on the task Reach and Grasp.
C.2.2 Evaluation
We conduct 20 trials with different initial configurations. Initial conditions for the first four experiments are the same as in the main experiments (Figs. A.14, A.15, A.16, A.17). Fig. A.18 shows the initial configurations for the Object Asset Mismatch experiment.
C.3 Data Scalability Experiments (Sec. 4.4)
In Table A.XI, we show quantitative results for scalability with respect to the human correction dataset size on all four tasks.
C.4 Policy Robustness Experiments (Sec. 4.5)
C.4.1 Removing Cameras
We remove two cameras and keep only three. Note that this is the same number of cameras as in FurnitureBench [90]. For tasks other than Insert, we keep the wrist camera, the right front camera, and the left rear camera. For the task Insert, we keep the two front cameras and the left rear camera.
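A hypothetical per-task camera-selection config for this ablation might look as follows; the camera identifiers are illustrative placeholders, not names from the paper's codebase.

```python
# Hypothetical camera subsets kept per task in the camera-removal ablation.
CAMERAS_KEPT = {
    "Stabilize":       ["wrist", "front_right", "rear_left"],
    "Reach and Grasp": ["wrist", "front_right", "rear_left"],
    "Screw":           ["wrist", "front_right", "rear_left"],
    "Insert":          ["front_left", "front_right", "rear_left"],
}
```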
C.4.2 Suboptimal Correction Data
Authors:
(1) Yunfan Jiang, Department of Computer Science;
(2) Chen Wang, Department of Computer Science;
(3) Ruohan Zhang, Department of Computer Science and Institute for Human-Centered AI (HAI);
(4) Jiajun Wu, Department of Computer Science and Institute for Human-Centered AI (HAI);
(5) Li Fei-Fei, Department of Computer Science and Institute for Human-Centered AI (HAI).