III. VISUAL RELOCALIZATION ENHANCED PEDESTRIAN DEAD RECKONING
A. System Design
1) System Overview: The framework of our ReLoc-PDR is depicted in Figure 1, comprising three primary modules: inertial sensor-based Pedestrian Dead Reckoning (PDR), visual relocalization, and graph optimization-based pose fusion. In this architecture, the PDR algorithm is employed to compute the per-step pose using the built-in inertial sensors of a commercially available smartphone. The visual relocalization pipeline aims to accurately and robustly estimate the pose of the triggered image captured by the smartphone camera. This estimation is performed in relation to a pre-built 3D feature map, enabling global visual observations that periodically correct the accumulated error in the PDR. Finally, the pose fusion module integrates the pose results from PDR and visual relocalization using graph optimization with a robust Tukey kernel. This integration enables the continuous and smooth estimation of the pedestrian’s position and trajectory during long-term walking.
2) Pedestrian Dead Reckoning: The PDR algorithm utilizes inertial data to estimate the pedestrian’s position based on human motion characteristics. It consists of four main steps: step detection, step length estimation, heading angle estimation, and position update. Step detection relies on the repetitive pattern observed in the accelerometer measurements during human walking. In our work, we employ a multi-threshold peak detection algorithm to identify pedestrian gait:
Here, am represents the magnitude of acceleration, δmin and δmax denote the minimum and maximum acceleration values within one step, g represents the gravity value, ∆T denotes the time interval between adjacent peaks, and δt represents the minimum duration threshold. Considering the complexity of pedestrian movement and the inevitable noise of low-cost MEMS inertial sensors, we apply a fourth-order low-pass filter to preprocess the acceleration signal, which improves the quality of the extracted gait characteristics. Furthermore, to eliminate false peaks caused by external interference, we require each candidate peak to be a local maximum within a sliding window of fixed size. This additional criterion helps ensure the accuracy of the detected gait peaks.
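The multi-threshold peak detection described above can be sketched as follows. The threshold values, the sliding-window width, and the simple moving-average smoothing (standing in for the paper's fourth-order low-pass filter) are illustrative assumptions, not the paper's calibration:

```python
import numpy as np

def detect_steps(a_mag, fs, d_min=1.0, d_max=8.0, g=9.81,
                 dt_min=0.3, win=5):
    """Multi-threshold peak detection on the acceleration magnitude.

    a_mag  : acceleration-magnitude samples (m/s^2)
    fs     : sampling rate (Hz)
    d_min, d_max : allowed peak deviation from gravity (assumed values)
    dt_min : minimum interval between adjacent peaks (s)
    win    : half-width of the local-maximum sliding window (samples)
    """
    # Moving-average smoothing stands in for the paper's fourth-order
    # low-pass filter; it only illustrates the preprocessing step.
    kernel = np.ones(5) / 5.0
    f = np.convolve(a_mag, kernel, mode="same")

    peaks, last_t = [], -np.inf
    for i in range(win, len(f) - win):
        window = f[i - win:i + win + 1]
        if f[i] != window.max():             # must be a local maximum
            continue
        if not (d_min < f[i] - g < d_max):   # amplitude thresholds
            continue
        t = i / fs
        if t - last_t < dt_min:              # minimum peak spacing
            continue
        peaks.append(i)
        last_t = t
    return peaks
```

On a synthetic 2 Hz walking-like signal, the detector reports one peak per gait cycle while rejecting sub-threshold fluctuations.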
The step length estimation component aims to calculate the distance covered by a pedestrian in a single step, which is influenced by the pedestrian’s motion states. In our approach, we employ the Weinberg model [25] to estimate the pedestrian step length via:
Here, K represents the calibrated step-length coefficient, while az,max and az,min denote the maximum and minimum values of vertical acceleration during step k, respectively.
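The Weinberg relation is compact enough to state directly in code; the value of K below is only a placeholder, since it must be calibrated per user:

```python
def weinberg_step_length(az_max, az_min, K=0.5):
    """Weinberg model: step length grows with the fourth root of the
    vertical-acceleration range over the step.  K is a per-user
    calibrated coefficient; 0.5 here is only a placeholder.
    """
    return K * (az_max - az_min) ** 0.25
```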
Heading estimation is utilized to determine the walking direction of the pedestrian. We leverage the gyroscope data from the smartphone’s built-in Inertial Measurement Unit (IMU) to estimate the pedestrian’s heading angle. This is achieved through an attitude update equation based on the median integration method. Finally, the pedestrian’s position is updated based on their previous position, incorporating the estimated step length and heading angle. The updated position is as follows:
Here, ψk represents the heading angle, while xk and yk indicate the pedestrian’s horizontal position at step k.
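A sketch of the position update. The axis convention below (cosine on x, sine on y, with ψ measured from the x-axis) is an assumption; a north-referenced heading would swap the two terms:

```python
import math

def update_position(x_prev, y_prev, step_len, psi):
    """One PDR position update: advance the previous position by the
    estimated step length along the heading angle psi.  The axis
    convention (cos on x, sin on y) is an assumption of this sketch.
    """
    x = x_prev + step_len * math.cos(psi)
    y = y_prev + step_len * math.sin(psi)
    return x, y
```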
B. Visual Relocalization with Learned Feature
Given that pedestrian navigation spans diverse indoor and outdoor environments, the visual relocalization method must remain robust across varying viewpoints and lighting conditions, including changes in illumination, weather, and season, while performing consistently both indoors and outdoors. Traditional retrieval-based [19], [20] and structure-based [6], [7], [10], [13] relocalization methods are limited by the insufficient invariance of handcrafted features [26], [27], which makes them unstable in low-texture or poorly lit conditions. Recent deep neural network-based methods, such as NetVLAD [21] and SuperPoint [14], have demonstrated superior capabilities in image feature extraction, keypoint detection, and matching, surpassing traditional baselines like bag-of-words [19], VLAD [20], and SIFT [26] in robustness. Motivated by these developments, we incorporate learned global descriptors and learned local features into the visual relocalization pipeline, enhancing the robustness of the pedestrian positioning system in visually degraded scenarios.
C. Integrating PDR and Visual Relocalization via Factor Graph Optimization
In visual and inertial pose fusion, ensuring the reliability of observation information is crucial for maintaining the stability of the multi-sensor positioning system. However, we observed that evaluating the quality of visual relocalization solely by the number of inliers [12], [13] is not robust enough in visually degraded scenes, such as texture-less walls, areas with similar structures, and dark roadways. In these scenarios, abnormal visual observations can still enter the positioning system, significantly degrading accuracy.
To address this challenge, we propose a robust pose fusion algorithm that integrates Pedestrian Dead Reckoning (PDR) with visual relocalization using graph optimization [15] and the Tukey robust kernel [31]. By incorporating the Tukey kernel function into the pose graph, we can adaptively assess the impact of current visual relocalization results on the system’s states. This adaptive assessment dynamically determines the weight of the visual observation, effectively mitigating the risk of visual relocalization failures. Furthermore, unlike existing visual-inertial fusion methods [32], we employ an inertial-centric data processing scheme. This scheme enables the dynamic integration of visual relocalization observations into the graph. In the pose graph, as illustrated in Figure 3, each step node serves as a vertex, connecting to other vertices through two types of edges.
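The adaptive down-weighting can be illustrated with the standard Tukey biweight function; the tuning constant c = 4.685 below is the conventional choice for unit-variance residuals, not necessarily the paper's value:

```python
def tukey_weight(r, c=4.685):
    """Tukey biweight: weight applied to a residual r.  Residuals
    beyond c are given zero weight, so a grossly wrong relocalization
    result is effectively removed from the optimization.
    """
    if abs(r) > c:
        return 0.0        # outlier: observation discarded
    u = 1.0 - (r / c) ** 2
    return u * u          # inlier: smoothly down-weighted
```

The weight decays smoothly from 1 at zero residual to exactly 0 beyond the cutoff, which is what lets the fusion tolerate occasional relocalization failures without a hard outlier test.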
1) PDR Factor: As discussed in Section III-A, the Pedestrian Dead Reckoning (PDR) algorithm offers the advantages of autonomy and continuity, enabling high-accuracy positioning within a short period. Upon successful detection of a step during pedestrian walking, a PDR factor is established to connect it with the previous step. This PDR factor represents the relative change in the pedestrian’s position and is obtained directly from the PDR algorithm. Given our inertial-sensor-centric pose graph, it continuously expands as pedestrian steps are taken, rendering it relatively robust to environmental variations. For step k and its previous step k − 1, the residual of the PDR factor is formulated as follows:
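A sketch of this residual, assuming a planar position state and an x-axis-referenced heading (the convention is an assumption of this sketch): the residual compares the relative motion of the two step nodes against the displacement predicted by PDR.

```python
import numpy as np

def pdr_residual(p_k, p_km1, step_len, psi):
    """Residual of the PDR factor between step k-1 and step k:
    difference between the relative position change of the two state
    nodes and the step displacement predicted by the PDR algorithm.
    """
    delta_pred = np.array([step_len * np.cos(psi),
                           step_len * np.sin(psi)])
    return (np.asarray(p_k) - np.asarray(p_km1)) - delta_pred
```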
2) Relocalization Factor: Before incorporating relocalization edges into the pose graph, a reliability assessment is performed to mitigate the impact of visual relocalization failures on positioning accuracy. In our approach, we utilize the number of inliers as the criterion to determine the success of visual relocalization results. If the number of inliers exceeds 25, we consider the relocalization results reliable. In such cases, we add a ReLoc edge to the current state node and subsequently perform incremental smoothing optimization. However, if the number of inliers is below the threshold, we skip the pose graph optimization step and rely solely on the previously optimized state and PDR estimation to determine the current pedestrian position.
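The gating logic above reduces to a simple branch; `graph_update` and `pdr_predict` below are hypothetical callbacks standing in for the incremental graph optimization and the pure-PDR fallback:

```python
MIN_INLIERS = 25  # inlier threshold from the reliability assessment

def fuse_step(inliers, reloc_pose, graph_update, pdr_predict):
    """Gate a relocalization result before it enters the pose graph.

    inliers      : inlier count reported by visual relocalization
    reloc_pose   : the relocalized pose candidate
    graph_update : callback adding a ReLoc edge and running optimization
    pdr_predict  : callback propagating the state with PDR only
    """
    if inliers > MIN_INLIERS:
        return graph_update(reloc_pose)  # reliable: add ReLoc edge
    return pdr_predict()                 # unreliable: skip optimization
```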
Assuming that the visual relocalization result is reliable at step k, the residual of the relocalization factor is calculated using equation (5):
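A minimal linear pose graph makes the interaction of the two factor types concrete: PDR factors chain consecutive step positions, while a single relocalization factor anchors the last step absolutely. All displacements and weights below are illustrative, and the graph is solved in one batch linear least-squares step rather than incrementally:

```python
import numpy as np

n = 4                     # four step nodes p_0 .. p_3 (2-D positions)
A_rows, b_rows = [], []   # stacked weighted Jacobian rows and targets

def add_factor(J, rhs, weight=1.0):
    # Stack weighted rows for the linear least-squares problem.
    for row, v in zip(J, rhs):
        A_rows.append(np.asarray(row, dtype=float) * weight)
        b_rows.append(v * weight)

# Prior factor: pin p_0 near the origin (strong weight).
add_factor(np.eye(2, 2 * n), [0.0, 0.0], weight=10.0)

# PDR factors: each step is predicted to move 0.7 m along +x.
for k in range(1, n):
    J = np.zeros((2, 2 * n))
    J[:, 2 * k:2 * k + 2] = np.eye(2)
    J[:, 2 * (k - 1):2 * k] = -np.eye(2)
    add_factor(J, [0.7, 0.0])

# Relocalization factor: absolute fix on p_3, slightly off the PDR chain.
J = np.zeros((2, 2 * n))
J[:, 6:8] = np.eye(2)
add_factor(J, [2.0, 0.1], weight=5.0)

x, *_ = np.linalg.lstsq(np.vstack(A_rows), np.array(b_rows), rcond=None)
p = x.reshape(n, 2)  # optimized 2-D positions of the four steps
```

The optimized last step lands between the pure-PDR prediction (2.1, 0) and the relocalization fix (2.0, 0.1), with the weights deciding how far it is pulled toward each.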
To achieve real-time pose optimization, we utilize an adaptive-lag smoother, Incremental Smoothing and Mapping (iSAM2) [15]. Unlike batch optimizers that repeatedly recompute and update all historical states, iSAM2 dynamically determines which historical states are affected by the current observations and selectively optimizes and updates only those states. This adaptive approach significantly reduces unnecessary computation, yielding near-optimal results comparable to batch graph optimization at a lower computational cost. For implementation, we employ the open-source GTSAM library [33] to construct the factor graph and perform incremental smoothing optimization, which enables efficient construction and manipulation of the factor graph during real-time pose optimization.
Authors:
(1) Zongyang Chen, College of Intelligence Science and Technology, National University of Defense Technology, Changsha 410073, China ([email protected]);
(2) Xianfei Pan, College of Intelligence Science and Technology, National University of Defense Technology, Changsha 410073, China ([email protected]);
(3) Changhao Chen, College of Intelligence Science and Technology, National University of Defense Technology, Changsha 410073, China ([email protected]).