
Qualitative Analysis and Text Embedding Analysis


Table of Links

Abstract and 1. Introduction

  2. Related Work

  3. Method

    3.1 Overview of Our Method

    3.2 Coarse Text-cell Retrieval

    3.3 Fine Position Estimation

    3.4 Training Objectives

  4. Experiments

    4.1 Dataset Description and 4.2 Implementation Details

    4.3 Evaluation Criteria and 4.4 Results

  5. Performance Analysis

    5.1 Ablation Study

    5.2 Qualitative Analysis and 5.3 Text Embedding Analysis

  6. Conclusion and References


Supplementary Material

  1. Details of KITTI360Pose Dataset
  2. More Experiments on the Instance Query Extractor
  3. Text-Cell Embedding Space Analysis
  4. More Visualization Results
  5. Point Cloud Robustness Analysis



5.2 Qualitative Analysis

In addition to the quantitative metrics, we also offer a qualitative analysis comparing the top-1/2/3 cells retrieved by Text2Loc [42] and IFRP-T2P, as depicted in Fig. 6. In the first column, the results indicate that both models can retrieve cells containing the described instances. However, there are notable differences in how accurately they follow the spatial relation descriptions provided. Specifically, for the “beige parking” instance, which is described as being located in the west of the cell, the retrieval result of Text2Loc inaccurately places it to the east of the cell center. Conversely, IFRP-T2P correctly locates this instance to the west of the center, aligning with the given description. In the second column, the text hints describe the pose as being on top of a “dark-green vegetation” and north of a “dark-green parking”. For Text2Loc, the parking is found to the north of the cell center in the top-1/2 retrieved cells, and the vegetation lies at the margin of the top-1/2/3 retrieved cells, both inconsistent with the text description. For IFRP-T2P, in contrast, the parking appears to the south of the cell center in the top-1/2 retrieved cells, and the vegetation appears near the center of the top-1/2/3 retrieved cells, which matches the text description.


Figure 6: Comparison of the top-3 retrieved cells between Text2Loc [42] and IFRP-T2P. The numbers within the top-3 retrieval submaps denote the center distances between the retrieved submaps and the ground-truth, with “n/a” indicating distances exceeding 1000 meters. Green boxes highlight the positive submaps, which contain the target location, whereas red boxes delineate the negative submaps that do not contain the target.


Notably, in both cases, only the third cell retrieved by IFRP-T2P exceeds the error threshold. This evidence underscores the superior capacity of IFRP-T2P to interpret and utilize relative position information in comparison to Text2Loc. More case studies of our IFRP-T2P are provided in the supplementary material.
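For readers who want to see how Fig. 6 labels each retrieved submap, the following is a minimal sketch (Python, with assumed function and variable names; not the paper's code): a submap counts as positive when the ground-truth location falls inside it, and the distance from its center to the ground truth is reported, or shown as "n/a" beyond 1000 meters. The 30 m cell size follows the KITTI360Pose setup.

```python
# Minimal, illustrative sketch of the bookkeeping shown in Fig. 6
# (assumed names, not the authors' code).
import math

def evaluate_retrieved_cell(cell_center, cell_size, gt_location, max_report_dist=1000.0):
    """cell_center, gt_location: (x, y) in meters; cell_size: side length of the square submap."""
    half = cell_size / 2.0
    dx = gt_location[0] - cell_center[0]
    dy = gt_location[1] - cell_center[1]
    is_positive = abs(dx) <= half and abs(dy) <= half   # ground truth lies inside the submap
    center_dist = math.hypot(dx, dy)                    # 2D distance to the submap center
    label = f"{center_dist:.1f} m" if center_dist <= max_report_dist else "n/a"
    return is_positive, label

# Example: with 30 m cells, a ground-truth location 12.2 m from the center
# is still inside the cell, so the retrieval counts as positive.
print(evaluate_retrieved_cell((0.0, 0.0), 30.0, (10.0, 7.0)))   # (True, '12.2 m')
```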

5.3 Text Embedding Analysis

Recent years have seen the emergence of large pre-trained language encoders such as BERT [14], RoBERTa [24], T5 [33], and the CLIP [31] text encoder, each trained on different tasks and datasets. Text2Loc highlights that a pre-trained T5 model significantly enhances the alignment between text and point cloud features. Yet the potential of other models, such as RoBERTa and the CLIP text encoder, which excel in visual grounding tasks, is not explored in their study. We therefore conduct a comparative analysis of T5-small, RoBERTa-base, and the CLIP text encoder within our model framework. The results in Table 6 indicate that T5-small (61M parameters) achieves 0.24/0.46/0.57 top-1/3/5 recall, slightly outperforming RoBERTa-base (125M) and the CLIP text encoder (123M) while using fewer parameters.
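To make the comparison concrete, the sketch below shows how the three pre-trained encoders from Table 6 could be swapped in to embed a localization hint. It assumes the standard Hugging Face checkpoints "t5-small", "roberta-base", and "openai/clip-vit-large-patch14" (the exact CLIP variant used in the paper is not specified), and uses mean pooling only as a stand-in for whatever aggregation the localization network's text branch applies; it is not the authors' pipeline.

```python
# Minimal sketch of swapping text encoders; checkpoint choices are assumptions.
import torch
from transformers import AutoTokenizer, T5EncoderModel, RobertaModel, CLIPTextModel

ENCODERS = {
    "t5-small":     ("t5-small", T5EncoderModel),                     # ~61M total; only the encoder is loaded
    "roberta-base": ("roberta-base", RobertaModel),                   # ~125M parameters
    "clip-text":    ("openai/clip-vit-large-patch14", CLIPTextModel), # CLIP text tower; exact variant assumed
}

def embed_hint(text: str, encoder_name: str) -> torch.Tensor:
    ckpt, model_cls = ENCODERS[encoder_name]
    tokenizer = AutoTokenizer.from_pretrained(ckpt)
    encoder = model_cls.from_pretrained(ckpt).eval()
    with torch.no_grad():
        inputs = tokenizer(text, return_tensors="pt", truncation=True)
        token_features = encoder(**inputs).last_hidden_state          # (1, seq_len, dim)
    # Mean pooling is only a placeholder for the model's own text aggregation.
    return token_features.mean(dim=1)

hint = "The pose is on top of a dark-green vegetation and north of a dark-green parking."
for name in ENCODERS:
    print(name, tuple(embed_hint(hint, name).shape))                  # e.g. t5-small -> (1, 512)
```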


Authors:

(1) Lichao Wang, FNii, CUHKSZ ([email protected]);

(2) Zhihao Yuan, FNii and SSE, CUHKSZ ([email protected]);

(3) Jinke Ren, FNii and SSE, CUHKSZ ([email protected]);

(4) Shuguang Cui, SSE and FNii, CUHKSZ ([email protected]);

(5) Zhen Li, Corresponding Author, SSE and FNii, CUHKSZ ([email protected]).


This paper is available on arxiv under CC BY-NC-ND 4.0 Deed (Attribution-Noncommercial-Noderivs 4.0 International) license.

