Table of Links

- Related Works
- 2.1. Vision-and-Language Navigation
- 3.2. Open-set Semantic Information from Images
- Conclusion and Future Work, Disclosure statement, and References
3.1. Data Collection
Creating the O3D-SIM begins by capturing a sequence of RGB-D images with a posed camera, using estimates of the camera's intrinsic and extrinsic parameters for the environment to be mapped. The pose associated with each image is used to transform its point cloud into a world coordinate frame. In simulation, we use the ground-truth pose associated with each image, whereas in the real world we leverage RTAB-Map [30] with g2o optimization [31] to generate these poses.
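To make the transformation step concrete, below is a minimal sketch of lifting one RGB-D frame into the world frame: depth pixels are back-projected through pinhole intrinsics into camera-frame points, then mapped to world coordinates with the per-frame pose. This is an illustration only, not the paper's implementation; the function name, the pinhole-intrinsics assumption, and the 4x4 camera-to-world pose convention are assumptions for this example.

```python
import numpy as np

def depth_to_world_points(depth, K, T_world_cam):
    """Back-project one depth image into a world-frame point cloud.

    depth:        (H, W) array of metric depths in meters.
    K:            (3, 3) pinhole intrinsic matrix.
    T_world_cam:  (4, 4) camera-to-world pose (extrinsics) for this frame.
    Returns an (N, 3) array of world-frame 3D points.
    """
    H, W = depth.shape
    fx, fy = K[0, 0], K[1, 1]
    cx, cy = K[0, 2], K[1, 2]

    # Pixel grid -> camera-frame points, scaling each ray by its depth.
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    z = depth.reshape(-1)
    valid = z > 0                      # drop pixels with no depth return
    x = (u.reshape(-1)[valid] - cx) * z[valid] / fx
    y = (v.reshape(-1)[valid] - cy) * z[valid] / fy
    pts_cam = np.stack([x, y, z[valid]], axis=1)

    # Homogeneous transform into the world coordinate frame using the pose.
    pts_h = np.concatenate([pts_cam, np.ones((pts_cam.shape[0], 1))], axis=1)
    pts_world = (T_world_cam @ pts_h.T).T[:, :3]
    return pts_world
```

In simulation the pose passed as `T_world_cam` would be the ground-truth pose; on a real robot it would come from the RTAB-Map/g2o pipeline described above.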
Authors:
(1) Laksh Nanwani, International Institute of Information Technology, Hyderabad, India; this author contributed equally to this work;
(2) Kumaraditya Gupta, International Institute of Information Technology, Hyderabad, India;
(3) Aditya Mathur, International Institute of Information Technology, Hyderabad, India; this author contributed equally to this work;
(4) Swayam Agrawal, International Institute of Information Technology, Hyderabad, India;
(5) A.H. Abdul Hafez, Hasan Kalyoncu University, Sahinbey, Gaziantep, Turkey;
(6) K. Madhava Krishna, International Institute of Information Technology, Hyderabad, India.
