How 24 Special Queries Optimized a Neural Network’s Recall Rate

Written by instancing | Published 2025/07/16
Tech Story Tags: cross-modal-ai | human-robot-interaction | 3d-point-cloud-navigation | spatial-language-grounding | instance-free-localization | ai-in-robotics | vision-language-models

TL;DR: This article examines query optimization and embedding-space analysis in the IFRP-T2P model on the KITTI360Pose dataset. Testing with 16, 24, and 32 queries shows that 24 yields the best localization recall. IFRP-T2P also outperforms Text2Loc by producing a more discriminative and informative text-cell embedding space, which improves retrieval accuracy.

Table of Links

Abstract and 1. Introduction

2. Related Work

3. Method

  3.1 Overview of Our Method

  3.2 Coarse Text-cell Retrieval

  3.3 Fine Position Estimation

  3.4 Training Objectives

4. Experiments

  4.1 Dataset Description and 4.2 Implementation Details

  4.3 Evaluation Criteria and 4.4 Results

5. Performance Analysis

  5.1 Ablation Study

  5.2 Qualitative Analysis

  5.3 Text Embedding Analysis

6. Conclusion and References

Supplementary Material

  1. Details of KITTI360Pose Dataset
  2. More Experiments on the Instance Query Extractor
  3. Text-Cell Embedding Space Analysis
  4. More Visualization Results
  5. Point Cloud Robustness Analysis


1 DETAILS OF KITTI360POSE DATASET

2 MORE EXPERIMENTS ON THE INSTANCE QUERY EXTRACTOR

We conduct an additional experiment to assess the impact of the number of queries on the performance of our instance query extractor. As detailed in Table 1, we evaluate the localization recall rate using 16, 24, and 32 queries. The results show that 24 queries yield the highest localization recall, i.e., 0.23/0.53/0.64 on the validation set and 0.22/0.47/0.58 on the test set, suggesting that 24 is the optimal number of queries for our model.
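For intuition, the sketch below shows a minimal, self-contained PyTorch module in the DETR style, where an instance query extractor holds a fixed set of learnable query embeddings that cross-attend to a cell's point features; the ablated hyperparameter is the number of queries. The class name, layer sizes, and overall structure are illustrative assumptions, not the authors' implementation.

```python
# A minimal sketch (assumption: not the authors' code) of a DETR-style
# instance query extractor. The Table 1 ablation corresponds to varying
# `num_queries` over 16 / 24 / 32.
import torch
import torch.nn as nn

class InstanceQueryExtractor(nn.Module):
    def __init__(self, num_queries: int = 24, d_model: int = 256, num_layers: int = 2):
        super().__init__()
        # One learnable embedding per instance query; 24 worked best in Table 1.
        self.queries = nn.Embedding(num_queries, d_model)
        layer = nn.TransformerDecoderLayer(d_model, nhead=8, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=num_layers)

    def forward(self, point_features: torch.Tensor) -> torch.Tensor:
        # point_features: (batch, num_points, d_model) features of one cell.
        batch = point_features.size(0)
        tgt = self.queries.weight.unsqueeze(0).expand(batch, -1, -1)
        # Each query cross-attends to the cell's point features and becomes
        # one instance-level descriptor.
        return self.decoder(tgt, point_features)

# Usage: 24 queries over a toy batch of 2 cells with 1024 point features each.
extractor = InstanceQueryExtractor(num_queries=24)
out = extractor(torch.randn(2, 1024, 256))
print(out.shape)  # torch.Size([2, 24, 256])
```

Under this reading, too few queries under-segment a cell (several instances share one descriptor), while too many dilute attention across spurious queries, which is consistent with an interior optimum such as 24.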

3 TEXT-CELL EMBEDDING SPACE ANALYSIS

Fig. 2 shows the aligned text-cell embedding space visualized with t-SNE. Under the instance-free scenario, we compare our model with Text2Loc, which uses a pre-trained instance segmentation model, Mask3D, as a prior step. Text2Loc yields a less discriminative space, in which positive cells lie relatively far from the text query feature. In contrast, our IFRP-T2P pulls positive cell features closer to the text query features, creating a more informative embedding space. This enhancement is critical for improving the accuracy of text-cell retrieval.
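As an illustration of this kind of analysis, the following sketch projects toy text and cell embeddings into 2-D with scikit-learn's t-SNE. The synthetic data merely mimics the qualitative picture described above (positive cells near their text queries, negatives farther away) and stands in for the model's actual features.

```python
# Toy t-SNE visualization of a text-cell embedding space (synthetic data;
# in the paper these would be the model's text-query and cell features).
import numpy as np
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
text_emb = rng.normal(0.0, 1.0, size=(50, 256))               # text-query features
pos_cells = text_emb + rng.normal(0.0, 0.3, size=(50, 256))   # positives near queries
neg_cells = rng.normal(3.0, 1.0, size=(200, 256))             # unrelated negative cells

# Fit one joint t-SNE over all features so inter-group distances stay comparable.
all_emb = np.concatenate([text_emb, pos_cells, neg_cells])
xy = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(all_emb)

plt.scatter(*xy[:50].T, c="tab:blue", label="text queries", s=10)
plt.scatter(*xy[50:100].T, c="tab:green", label="positive cells", s=10)
plt.scatter(*xy[100:].T, c="tab:gray", label="negative cells", s=10)
plt.legend()
plt.title("Text-cell embedding space (t-SNE)")
plt.show()
```

In a well-aligned space, the blue and green clusters overlap while the gray cluster stays separated, which is the pattern the paper reports for IFRP-T2P but not for Text2Loc.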

Authors:

(1) Lichao Wang, FNii, CUHKSZ ([email protected]);

(2) Zhihao Yuan, FNii and SSE, CUHKSZ ([email protected]);

(3) Jinke Ren, FNii and SSE, CUHKSZ ([email protected]);

(4) Shuguang Cui, SSE and FNii, CUHKSZ ([email protected]);

(5) Zhen Li, SSE and FNii, CUHKSZ (corresponding author, [email protected]).


This paper is available on arXiv under the CC BY-NC-ND 4.0 (Attribution-NonCommercial-NoDerivs 4.0 International) license.

