Dark Roads, Cloudy Skies, Textureless Walls: ReLoc-PDR Still Finds the Way

Written by reckoning | Published 2025/09/02
Tech Story Tags: pedestrian-dead-reckoning | outdoor-positioning | vins-mono-comparison | inertial-sensor-fusion | robust-pedestrian-tracking | reloc-pdr | pose-graph-optimization | ai-navigation-systems

TL;DR: For smartphone-based navigation in visually challenging environments, this study presents ReLoc-PDR, a robust pedestrian positioning framework. The system fuses pedestrian dead reckoning (PDR) with deep feature-based visual relocalization through pose-graph optimization with incremental smoothing. A Tukey robust kernel suppresses abnormal relocalization results and effectively reduces trajectory drift. Tests in three challenging scenarios (textureless corridors, overcast outdoor conditions, and nighttime streets) show that ReLoc-PDR consistently outperforms inertial-only PDR, VINS-Mono, and dynamic-weighted PDR/vision fusion. With decimeter-level accuracy and smooth, drift-free trajectories, the proposed framework enables dependable pedestrian navigation both indoors and outdoors without GPS.

Table of Links

Abstract and 1. Introduction

II. Related Work

III. Visual Relocalization Enhanced Pedestrian Dead Reckoning

IV. Experiments

V. Conclusion and References

IV. EXPERIMENTS

In this section, we first present the experimental setup, including the equipment and implementation details. We then conduct comprehensive experiments to assess the robustness and accuracy of the proposed fusion positioning system. Our method is evaluated under three distinct environmental conditions, namely a textureless corridor, overcast weather, and a dark roadway, each presenting its own visual challenges.

A. Experimental Setup

In the experiments, a Xiaomi 10 smartphone was used for both offline map construction and online testing. To reconstruct the 3D map model, video sequences of the scene were captured with the smartphone at 30 Hz and a resolution of 1920x1080, then downsampled to obtain discrete database images. COLMAP [34], a Structure-from-Motion (SfM) tool, was used to generate the sparse SfM models, with several modifications to adapt it to pedestrian navigation. Specifically, we employed the NetVLAD [21] descriptor to retrieve the top 50 matching image pairs for each database image, which were then fed into the COLMAP pipeline to guide image matching. We also added a scale estimation module to COLMAP that converts the 3D point cloud to real-world scale: pre-placed artificial markers [35] with known lengths are used to restore the metric scale of the reconstruction. Finally, a new 3D SfM model was built from keypoints detected by SuperPoint [14], using the hloc toolbox [8]. The resulting 3D SfM models of the indoor and outdoor environments are shown in Fig. 4.
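To make the scale-restoration step concrete, below is a minimal sketch (not the authors' code) of how markers with known physical lengths can be used to recover the metric scale of an SfM point cloud. The helper names and the 0.30 m marker length are illustrative assumptions.

```python
import numpy as np

def estimate_metric_scale(marker_pairs_sfm, marker_lengths_m):
    """Estimate the factor that maps SfM units to meters.

    marker_pairs_sfm: list of (p0, p1) 3D endpoints of each marker edge,
                      triangulated in the (scale-ambiguous) SfM frame.
    marker_lengths_m: known real-world length of each marker edge in meters.
    """
    ratios = []
    for (p0, p1), length_m in zip(marker_pairs_sfm, marker_lengths_m):
        sfm_len = np.linalg.norm(np.asarray(p1) - np.asarray(p0))
        ratios.append(length_m / sfm_len)
    # Average over all markers to suppress triangulation noise.
    return float(np.mean(ratios))

def apply_scale(points_sfm, scale):
    """Scale every 3D point of the sparse model into metric units."""
    return np.asarray(points_sfm) * scale

# Usage: two marker edges of 0.30 m, each measured at ~0.42 SfM units.
pairs = [((0, 0, 0), (0.42, 0, 0)), ((1.0, 2.0, 0.5), (1.0, 2.42, 0.5))]
scale = estimate_metric_scale(pairs, [0.30, 0.30])
print(scale)  # ~0.714: multiply the point cloud (and camera poses) by this
```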

During the testing phase, our system records IMU data at 100 Hz and captures image frames at 30 Hz. A query image, however, is triggered only at the nodes where a gait event is detected during walking. All query images are standardized to a resolution of 600x800 pixels. To assess the localization performance and robustness of the proposed method, we conducted experiments in three distinct environments with different visual challenges; in each, the volunteer walked while holding the smartphone.
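As an illustration of this gait-triggered querying, the following sketch (our own simplification, not the paper's implementation) detects step events from the 100 Hz acceleration magnitude with a simple peak test and picks the 30 Hz camera frame nearest in time to each step. The threshold and minimum step interval are assumed values.

```python
import numpy as np

def detect_steps(acc_norm, fs=100.0, thresh=10.8, min_interval=0.3):
    """Return sample indices of detected steps from the acceleration magnitude.

    acc_norm: |a| in m/s^2 sampled at fs Hz (100 Hz IMU rate here).
    A sample counts as a step if it is a local maximum above `thresh`
    and at least `min_interval` seconds after the previous step.
    """
    min_gap = int(min_interval * fs)
    steps, last = [], -min_gap
    for i in range(1, len(acc_norm) - 1):
        if (acc_norm[i] > thresh
                and acc_norm[i] >= acc_norm[i - 1]
                and acc_norm[i] >= acc_norm[i + 1]
                and i - last >= min_gap):
            steps.append(i)
            last = i
    return steps

def query_frames_for_steps(step_indices, imu_times, frame_times):
    """For each detected step, pick the camera frame closest in time."""
    frame_times = np.asarray(frame_times)
    return [int(np.argmin(np.abs(frame_times - imu_times[i])))
            for i in step_indices]
```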

B. Indoor Experiment in Textureless Corridor

As illustrated in Fig. 5, the initial experiment took place in an indoor corridor and office environment, known for its visually challenging characteristics such as walls with limited texture and the presence of moving pedestrians. During this experiment, the participant navigated the corridor while holding the smartphone and encountered multiple sharp turns along a pre-determined path. These sharp turns have the potential to induce significant heading drift. The experiment had a duration of approximately 355 seconds, covering a total distance of approximately 240 meters.

To highlight the advantages of our method in indoor environments, we conducted a comparative analysis with three other approaches. The first is a pure inertial pedestrian dead reckoning (PDR) approach. The second is VINS-Mono [32], a state-of-the-art visual-inertial SLAM method known for its strong tracking performance and competitive positioning accuracy. The third combines PDR with visual localization through a dynamic weighting strategy [12], [13], referred to as DW-PDR/vision.

Figure 6 illustrates the trajectory comparison among the different methods in the indoor environment. The PDR algorithm produces a relatively smooth overall trajectory; however, it drifts due to accumulated heading errors. The positioning accuracy of VINS-Mono is severely compromised indoors because of the weakly textured scene and the low-quality sensors built into mobile devices, leading to performance inferior even to the pure inertial PDR. Leveraging stronger geometric constraints from a prior 3D map and the robustness of learned features, the visual relocalization method achieves accurate positioning in most cases. By combining PDR with the visual relocalization results, the cumulative errors of PDR can be effectively corrected using the visual measurements. However, the trajectory of DW-PDR/vision lacks robustness and smoothness, often exhibiting significant discontinuities caused by abnormal visual relocalization observations in visually similar scenes. In contrast, our proposed method is robust against abnormal visual relocalization observations: it dynamically assesses the reliability of visual relocalization results through the Tukey robust kernel, enabling adaptive decision-making on whether to rely on PDR or on global visual observations. Additionally, our method leverages the incremental smoothing iSAM2 algorithm to provide a smoother and more continuous trajectory than the other approaches, as depicted in the locally enlarged region in Figure 6.
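The fusion strategy described above can be illustrated with GTSAM's Python bindings: each detected step adds a PDR between-factor, each successful relocalization adds an absolute pose factor wrapped in a Tukey M-estimator, and iSAM2 re-optimizes incrementally. The sketch below is a simplified planar (Pose2) approximation with assumed noise sigmas, not the authors' implementation.

```python
import numpy as np
import gtsam

# One pose key per detected step: x0, x1, ...
X = lambda i: gtsam.symbol('x', i)

isam = gtsam.ISAM2()
graph = gtsam.NonlinearFactorGraph()
values = gtsam.Values()

# PDR step factor: relative motion from step length and heading change.
pdr_noise = gtsam.noiseModel.Diagonal.Sigmas(np.array([0.1, 0.1, 0.05]))

# Visual relocalization factor: an absolute pose wrapped in a Tukey
# M-estimator so outlier relocalizations are smoothly down-weighted.
reloc_base = gtsam.noiseModel.Diagonal.Sigmas(np.array([0.3, 0.3, 0.1]))
reloc_noise = gtsam.noiseModel.Robust.Create(
    gtsam.noiseModel.mEstimator.Tukey.Create(4.685), reloc_base)

# Anchor the first node at the origin.
prior_noise = gtsam.noiseModel.Diagonal.Sigmas(np.array([1e-3, 1e-3, 1e-3]))
graph.add(gtsam.PriorFactorPose2(X(0), gtsam.Pose2(0, 0, 0), prior_noise))
values.insert(X(0), gtsam.Pose2(0, 0, 0))
isam.update(graph, values)

def add_step(i, step_length, d_heading, reloc_pose=None):
    """Add one PDR step (and an optional relocalization) and re-optimize."""
    graph = gtsam.NonlinearFactorGraph()
    values = gtsam.Values()
    step = gtsam.Pose2(step_length, 0.0, d_heading)
    graph.add(gtsam.BetweenFactorPose2(X(i - 1), X(i), step, pdr_noise))
    prev = isam.calculateEstimate().atPose2(X(i - 1))
    values.insert(X(i), prev.compose(step))  # initial guess from PDR
    if reloc_pose is not None:  # (x, y, yaw) from the visual relocalization
        graph.add(gtsam.PriorFactorPose2(X(i), gtsam.Pose2(*reloc_pose),
                                         reloc_noise))
    isam.update(graph, values)
    return isam.calculateEstimate().atPose2(X(i))
```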

Table I provides the statistical analysis of horizontal positioning errors for the different methods. As obtaining a ground-truth pedestrian trajectory is not feasible, we adopt artificial marker points with known positions as the reference benchmark. The results in Table I indicate that our proposed method achieves superior positioning accuracy in complex indoor environments, reducing the root mean square error (RMSE) of the pure inertial PDR by 96.3%. VINS-Mono exhibits the lowest accuracy due to its degraded tracking performance in the weakly textured indoor environment. The DW-PDR/vision method experiences a maximum error of 12.4825 m, attributed to the influence of abnormal visual observations. In comparison, our method surpasses DW-PDR/vision, improving the RMSE by 91.9% and reducing the maximum error to 0.4435 m. These results demonstrate the robustness of our method in challenging indoor environments.
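For reference, the horizontal RMSE and maximum error reported in Table I can be computed from the estimated and surveyed marker positions as in this short sketch (the array shapes are assumptions):

```python
import numpy as np

def horizontal_error_stats(est_xy, ref_xy):
    """RMSE and max of horizontal errors at the marker check-points.

    est_xy, ref_xy: (N, 2) arrays of estimated / surveyed marker positions.
    """
    err = np.linalg.norm(np.asarray(est_xy) - np.asarray(ref_xy), axis=1)
    rmse = float(np.sqrt(np.mean(err ** 2)))
    return rmse, float(err.max())
```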

C. Outdoor Experiment in Overcast Weather Condition

To evaluate the positioning performance of our proposed method in challenging outdoor environments, a second experiment was conducted along a route encircling a hill. This test involved dynamic vehicle movements, overcast weather, and changes in scene structure, all of which can impede visual tracking. The walk lasted approximately 230 seconds and covered a path of approximately 225 meters. As the satellite signal was obstructed by tall trees and buildings, a reference trajectory could not be obtained from the RTK receiver. Instead, we used the trajectory estimated by FAST-LIO2 [36], one of the state-of-the-art LiDAR-inertial odometry systems, as the reference. To synchronize the timestamps between the smartphone-based results and the LiDAR-based output, the volunteer raised the experimental device (Fig. 7) before walking to excite the accelerometer and produce a spike. By aligning the first peak of the smartphone acceleration data with the first peak of the LiDAR's built-in IMU acceleration data, we obtained the time offset between the two and achieved synchronization.
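A minimal sketch of this spike-based synchronization (not the authors' code): find the time of the first acceleration spike in each IMU stream and take the difference as the clock offset. The 15 m/s^2 threshold is an assumed value.

```python
import numpy as np

def first_spike_time(times, acc_norm, thresh=15.0):
    """Time of the first sample whose acceleration magnitude exceeds `thresh`."""
    idx = int(np.argmax(np.asarray(acc_norm) > thresh))
    return times[idx]

def sync_offset(phone_t, phone_acc, lidar_t, lidar_acc, thresh=15.0):
    """Offset to add to smartphone timestamps so both clocks agree,
    obtained by aligning the deliberate spike recorded on both IMUs."""
    return (first_spike_time(lidar_t, lidar_acc, thresh)
            - first_spike_time(phone_t, phone_acc, thresh))
```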

Figure 8 displays the trajectory results of the various algorithms in the outdoor overcast environment. Our proposed method closely follows the reference trajectory and produces a smooth trajectory without sudden jumps. The PDR algorithm, owing to the inherent noise of the inertial sensors, deviates significantly from the ground truth. While VINS-Mono performs reasonably well outdoors, its positioning accuracy is limited by the lower quality of the smartphone's built-in sensors. Compared with visual-inertial SLAM, the visual relocalization-aided PDR methods achieve superior trajectory estimation. However, the DW-PDR/vision approach, based on a dynamic weighting strategy, despite achieving remarkable positioning results, shows noticeable trajectory jumps under abnormal visual relocalization observations, a phenomenon not observed with our method. The proposed optimization-based fusion method effectively mitigates the impact of erroneous visual observations through the robust kernel function, resulting in smoother and more robust positioning. Furthermore, the distribution of horizontal positioning errors over pedestrian steps is depicted in Figure 9. These results highlight the superior performance of our method, which consistently provides accurate positioning with errors below 1 m. By incorporating the visual relocalization results, our method significantly reduces the cumulative errors of PDR. In contrast, the accuracy of the DW-PDR/vision method degrades significantly under abnormal visual relocalization observations, underscoring the robustness of our method in challenging environments.

Table II presents the positioning error statistics of the different algorithms. Our method achieves the highest positioning accuracy and reduces the cumulative errors of PDR by 86.9%. While VINS-Mono achieves an impressive loop-error reduction through loop closing, its positioning accuracy is still not competitive with ours. Compared with DW-PDR/vision, our method improves the RMSE by 14.9%. Moreover, our method exhibits a maximum error of only 1.0604 m, whereas DW-PDR/vision reaches 4.7975 m. These results illustrate the robustness of our proposed method against abnormal disturbances.

D. Outdoor Experiment in Low-Light Condition

To further assess the robustness of our method in visually challenging environments, we conducted a third experiment under outdoor nighttime conditions. Figure 10 showcases an example of the nighttime images captured using a mobile phone. The low lighting conditions during nighttime result in poor image quality, presenting a significant challenge for visual tracking methods. Traditional handcrafted feature descriptors exhibit limited robustness in such challenging environments due to their poor invariance. In this study, we introduced learned features [14] into our image-based pipeline. These learned features generate denser and more accurate matches compared to traditional methods like SIFT, as depicted in Figure 11. This integration effectively enhances the reliability and continuity of conventional visual methods.
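To make the relocalization step concrete, the following sketch (our own simplification, using OpenCV rather than the paper's hloc-based pipeline) matches query descriptors against the 3D map descriptors with mutual nearest neighbours and estimates the camera pose with PnP + RANSAC, rejecting the result when too few inliers survive. The inlier threshold and reprojection error are assumed values.

```python
import numpy as np
import cv2

def relocalize(kpts_2d, desc_2d, pts_3d, desc_3d, K, min_inliers=30):
    """Estimate the query camera pose from 2D-3D matches with PnP + RANSAC.

    kpts_2d/desc_2d: query keypoints and descriptors (e.g. learned features).
    pts_3d/desc_3d:  map points of the prior SfM model and their descriptors.
    Returns (rvec, tvec, num_inliers), or None on a relocalization failure.
    """
    # Mutual-nearest-neighbour matching on L2 descriptor distance.
    matcher = cv2.BFMatcher(cv2.NORM_L2, crossCheck=True)
    matches = matcher.match(desc_2d.astype(np.float32),
                            desc_3d.astype(np.float32))
    if len(matches) < 4:
        return None
    img_pts = np.float32([kpts_2d[m.queryIdx] for m in matches])
    obj_pts = np.float32([pts_3d[m.trainIdx] for m in matches])
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        obj_pts, img_pts, K, None, reprojectionError=4.0)
    if not ok or inliers is None or len(inliers) < min_inliers:
        return None  # treat as a relocalization failure
    return rvec, tvec, len(inliers)
```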

Figure 12 displays the trajectory results of the different approaches in the outdoor nighttime environment. Our method consistently provides continuous and accurate trajectory estimates, demonstrating robustness in severely degraded conditions. The pure inertial PDR approach yields acceptable results because inertial sensors are independent of visual cues; however, its positioning error accumulates over time. In the nighttime environment, VINS-Mono's estimates deviate significantly from the reference trajectory due to the lack of sufficient features for reliable tracking. Although the learned features improve the robustness of relocalization for nighttime query images to some extent, visual relocalization still fails frequently because of the limited representation power of local features in such low light. Figure 13 shows the horizontal positioning errors, with significant fluctuations observed for the DW-PDR/vision method. This is because it relies solely on the number of inliers as its reliability criterion, which can admit abnormal visual results; the dynamic weighting fusion strategy then fails to eliminate the impact of these incorrect observations, and the estimated trajectory deviates significantly from the reference path. In contrast, our method addresses the risk of relocalization failures and mitigates the impact of abnormal observations on positioning accuracy through the optimization-based fusion strategy, and it achieves globally smooth and drift-free trajectory estimates through incremental smoothing optimization.

Figure 14 displays the cumulative distribution of horizontal positioning errors, demonstrating the superior performance of our method compared with the other algorithms. Table III provides the statistics of horizontal positioning errors under outdoor nighttime conditions. Our method improves the positioning accuracy of PDR by 77.0% and reduces the maximum error from 4.3309 m to 1.3854 m. Even in the dark environment, our method outperforms VINS-Mono in positioning accuracy, as the latter's visual tracking degrades severely in low light. The positioning accuracy of DW-PDR/vision is significantly affected by abnormal visual observations. In contrast, our method consistently achieves accurate positioning, with a loop error of only 0.5152 m, underscoring its robustness in visually challenging environments.

V. CONCLUSION

To achieve self-reliant and robust pedestrian navigation using a smartphone in visually challenging environments, we propose ReLoc-PDR, a robust pedestrian positioning framework that integrates pedestrian dead reckoning (PDR) and visual relocalization based on incremental smoothing optimization.

Considering the visual degradation that occurs in environments with weak textures and varying illumination, we introduce a visual relocalization pipeline that uses features learned by a deep neural network instead of traditional handcrafted features. This establishes 2D-3D correspondences with higher inlier rates and enhances the robustness of the pedestrian localization system. Furthermore, we propose an optimization-based fusion strategy that couples the PDR and visual relocalization poses in a graph model, accompanied by the Tukey robust kernel to suppress abnormal visual observations. Experimental results demonstrate the effectiveness of ReLoc-PDR in various challenging environments, including corridors with limited texture, overcast weather, and dark nighttime scenes. The proposed method achieves accurate and smooth trajectory estimation, continuously providing pedestrian positions with decimeter-level accuracy at high frequency.

REFERENCES

[1] Qu Wang, Meixia Fu, Jianquan Wang, Haiyong Luo, Lei Sun, Zhangchao Ma, Wei Li, Chaoyi Zhang, Rong Huang, Xianda Li, Zhuqing Jiang, and Qilian Liang. Recent advances in pedestrian inertial navigation based on smartphone: A review. IEEE Sensors Journal, 22(23):22319–22343, 2022.

[2] Francesca De Cillis, Luca Faramondi, Federica Inderst, Stefano Marsella, Marcello Marzoli, Federica Pascucci, and Roberto Setola. Hybrid indoor positioning system for first responders. IEEE Transactions on Systems, Man, and Cybernetics: Systems, 50(2):468–479, 2020.

[3] Shuli Guo, Yitong Zhang, Xinzhe Gui, and Lina Han. An improved PDR/UWB integrated system for indoor navigation applications. IEEE Sensors Journal, 20(14):8046–8061, 2020.

[4] Yuan Zhuang and Naser El-Sheimy. Tightly-coupled integration of WiFi and MEMS sensors on handheld devices for indoor pedestrian navigation. IEEE Sensors Journal, 16(1):224–234, 2016.

[5] Thai-Mai Thi Dinh, Ngoc-Son Duong, and Kumbesan Sandrasegaran. Smartphone-based indoor positioning using BLE iBeacon and reliable lightweight fingerprint map. IEEE Sensors Journal, 20(17):10283–10294, 2020.

[6] Torsten Sattler, Bastian Leibe, and Leif Kobbelt. Efficient & effective prioritized matching for large-scale image-based localization. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39(9):1744–1756, 2017.

[7] Linus Svarm, Olof Enqvist, Fredrik Kahl, and Magnus Oskarsson. City-scale localization for cameras with known vertical direction. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39(7):1455–1461, 2017.

[8] Paul-Edouard Sarlin, Cesar Cadena, Roland Siegwart, and Marcin Dymczyk. From coarse to fine: Robust hierarchical localization at large scale. In 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 12708–12717, 2019.

[9] Hajime Taira, Masatoshi Okutomi, Torsten Sattler, Mircea Cimpoi, Marc Pollefeys, Josef Sivic, Tomas Pajdla, and Akihiko Torii. InLoc: Indoor visual localization with dense matching and view synthesis. IEEE Transactions on Pattern Analysis and Machine Intelligence, 43(4):1293–1307, 2021.

[10] Yan Zhou, Xianwei Zheng, Ruizhi Chen, Hanjiang Xiong, and Sheng Guo. Image-based localization aided indoor pedestrian trajectory estimation using smartphones. Sensors, 18(1), 2018.

[11] Jiuchao Qian, Yuhao Cheng, Rendong Ying, and Peilin Liu. A novel indoor localization method based on image retrieval and dead reckoning. Applied Sciences, 10(11), 2020.

[12] Mingcong Shu, Guoliang Chen, Zhenghua Zhang, and Lei Xu. Accurate indoor 3D location based on MEMS/vision by using a smartphone. In 2022 IEEE 12th International Conference on Indoor Positioning and Indoor Navigation (IPIN), pages 1–8, 2022.

[13] Mingcong Shu, Guoliang Chen, and Zhenghua Zhang. Efficient image-based indoor localization with MEMS aid on the mobile device. ISPRS Journal of Photogrammetry and Remote Sensing, 185:85–110, 2022.

[14] Daniel DeTone, Tomasz Malisiewicz, and Andrew Rabinovich. SuperPoint: Self-supervised interest point detection and description. In 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pages 337–33712, 2018.

[15] Michael Kaess, Hordur Johannsson, Richard Roberts, Viorela Ila, John Leonard, and Frank Dellaert. iSAM2: Incremental smoothing and mapping with fluid relinearization and incremental variable reordering. In 2011 IEEE International Conference on Robotics and Automation, pages 3281–3288, 2011.

[16] Yingbiao Yao, Lei Pan, Wei Fen, Xiaorong Xu, Xuesong Liang, and Xin Xu. A robust step detection and stride length estimation for pedestrian dead reckoning using a smartphone. IEEE Sensors Journal, 20(17):9685–9697, 2020.

[17] Wael Elloumi, Abdelhakim Latoui, Raphaël Canals, Aladine Chetouani, and Sylvie Treuillet. Indoor pedestrian localization with a smartphone: A comparison of inertial and vision-based methods. IEEE Sensors Journal, 16(13):5376–5388, 2016.

[18] Zhouyang Wang, Erwu Liu, and Rui Wang. A vision-aided PDR localization system. In 2020 Information Communication Technologies Conference (ICTC), pages 98–102, 2020.

[19] Dorian Galvez-Lopez and J. D. Tardos. Bags of binary words for fast place recognition in image sequences. IEEE Transactions on Robotics, 28(5):1188–1197, October 2012.

[20] Herve Jegou, Matthijs Douze, Cordelia Schmid, and Patrick Perez. Aggregating local descriptors into a compact image representation. In 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pages 3304–3311, 2010.

[21] Relja Arandjelovic, Petr Gronat, Akihiko Torii, Tomas Pajdla, and Josef Sivic. NetVLAD: CNN architecture for weakly supervised place recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 40(6):1437–1451, 2018.

[22] Xiwu Zhang, Lei Wang, and Yan Su. Visual place recognition: A survey from deep learning perspective. Pattern Recognition, 113:107760, 2021.

[23] Yuki Ono, Eduard Trulls, Pascal Fua, and Kwang Moo Yi. LF-Net: Learning local features from images. In Proceedings of the 32nd International Conference on Neural Information Processing Systems, NIPS'18, pages 6237–6247, Red Hook, NY, USA, 2018. Curran Associates Inc.

[24] Mihai Dusmanu, Ignacio Rocco, Tomas Pajdla, Marc Pollefeys, Josef Sivic, Akihiko Torii, and Torsten Sattler. D2-Net: A Trainable CNN for Joint Detection and Description of Local Features. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019.

[25] Harvey Weinberg. Using the ADXL202 in pedometer and personal navigation applications. Analog Devices AN-602 application note, 2(2):1–6, 2002.

[26] David G Lowe. Distinctive image features from scale-invariant keypoints. International journal of computer vision, 60:91–110, 2004.

[27] Ethan Rublee, Vincent Rabaud, Kurt Konolige, and Gary Bradski. ORB: An efficient alternative to SIFT or SURF. In 2011 International Conference on Computer Vision, pages 2564–2571, 2011.

[28] Laurent Kneip, Davide Scaramuzza, and Roland Siegwart. A novel parametrization of the perspective-three-point problem for a direct computation of absolute camera position and orientation. In CVPR 2011, pages 2969–2976, 2011.

[29] Martin A. Fischler and Robert C. Bolles. Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography. Commun. ACM, 24(6):381–395, jun 1981.

[30] David G Lowe. Distinctive image features from scale-invariant keypoints. International journal of computer vision, 60:91–110, 2004.

[31] Zhengyou Zhang. Parameter estimation techniques: A tutorial with application to conic fitting. Image and Vision Computing, pages 59–76, 1997.

[32] Tong Qin, Peiliang Li, and Shaojie Shen. VINS-Mono: A robust and versatile monocular visual-inertial state estimator. IEEE Transactions on Robotics, 34(4):1004–1020, 2018.

[33] Frank Dellaert and GTSAM Contributors. borglab/gtsam, May 2022.

[34] Johannes L. Schönberger and Jan-Michael Frahm. Structure-from-motion revisited. In 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 4104–4113, 2016.

[35] John Wang and Edwin Olson. AprilTag 2: Efficient and robust fiducial detection. In 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 4193–4198, 2016.

[36] Wei Xu, Yixi Cai, Dongjiao He, Jiarong Lin, and Fu Zhang. FAST-LIO2: Fast direct LiDAR-inertial odometry. IEEE Transactions on Robotics, 38(4):2053–2073, 2022.

Authors:

(1) Zongyang Chen, College of Intelligence Science and Technology, National University of Defense Technology, Changsha 410073, China ([email protected]);

(2) Xianfei Pan, College of Intelligence Science and Technology, National University of Defense Technology, Changsha 410073, China ([email protected]);

(3) Changhao Chen, College of Intelligence Science and Technology, National University of Defense Technology, Changsha 410073, China ([email protected]).


This paper is available on arXiv under the CC BY 4.0 DEED license.

