Visualizing Promptable and Open-Vocabulary Segmentation Across Multiple Datasets

by Segmentation, November 13th, 2024

Too Long; Didn't Read

This section presents visualizations of promptable and open-vocabulary segmentation results across several datasets, showcasing the framework's performance in segmenting and predicting masks.

Authors:

(1) Zhaoqing Wang, The University of Sydney and AI2Robotics;

(2) Xiaobo Xia, The University of Sydney;

(3) Ziye Chen, The University of Melbourne;

(4) Xiao He, AI2Robotics;

(5) Yandong Guo, AI2Robotics;

(6) Mingming Gong, The University of Melbourne and Mohamed bin Zayed University of Artificial Intelligence;

(7) Tongliang Liu, The University of Sydney.

Abstract and 1. Introduction

2. Related works

3. Method and 3.1. Problem definition

3.2. Baseline and 3.3. Uni-OVSeg framework

4. Experiments

4.1. Implementation details

4.2. Main results

4.3. Ablation study

5. Conclusion

6. Broader impacts and References


A. Framework details

B. Promptable segmentation

C. Visualisation

C. Visualisation

We illustrate a wide range of visualisations of promptable segmentation and open-vocabulary segmentation across multiple datasets.


Figure 7. Box-promptable segmentation performance. We compare our method with SAM-ViT/L [34] on a wide range of datasets. Given a ground-truth box as the visual prompt, we compute the IoU between each output mask and the ground-truth mask and select the output mask with the highest IoU. We report 1-pt IoU for all datasets.
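
The max-IoU selection described in the caption above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' evaluation code: masks are assumed to be boolean NumPy arrays of equal shape, the helper names `mask_iou` and `select_best_mask` are hypothetical, and treating two empty masks as IoU 0 is a convention chosen here.

```python
import numpy as np


def mask_iou(pred: np.ndarray, gt: np.ndarray) -> float:
    """Intersection over union of two binary masks with the same shape."""
    pred = pred.astype(bool)
    gt = gt.astype(bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        # Both masks are empty; returning 0 is an assumption of this sketch.
        return 0.0
    return float(np.logical_and(pred, gt).sum() / union)


def select_best_mask(candidates, gt):
    """Return (index, iou) of the candidate mask with the highest IoU against gt."""
    ious = [mask_iou(mask, gt) for mask in candidates]
    best = int(np.argmax(ious))
    return best, ious[best]


# Toy example: a 2x2 ground-truth square and two candidate masks.
gt = np.zeros((4, 4), dtype=bool)
gt[1:3, 1:3] = True            # 4 foreground pixels

cand_a = np.zeros((4, 4), dtype=bool)
cand_a[0:2, 0:2] = True        # overlaps gt in 1 pixel, union of 7 pixels

cand_b = np.zeros((4, 4), dtype=bool)
cand_b[1:3, 1:4] = True        # covers gt plus 2 extra pixels, union of 6 pixels

best_index, best_iou = select_best_mask([cand_a, cand_b], gt)
print(best_index, round(best_iou, 3))
```

In this toy case the second candidate is selected, since it covers the whole ground-truth square with only two extra pixels.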


Figure 8. Point-promptable segmentation performance. We compare our method with SAM-ViT/L [34] on the SegInW datasets [87]. Given a 20 × 20 point grid as the visual prompt, we compute the IoU between each output mask and the ground-truth mask and select the output mask with the highest IoU. We report 1-pt IoU for all datasets.
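
For illustration, a 20 × 20 point grid like the one mentioned in the caption above could be generated as shown below. The caption does not say how the points are placed, so this sketch assumes a uniform grid with one point at the centre of each cell, expressed in pixel coordinates; the function name `build_point_grid` and the 640 × 480 image size are hypothetical.

```python
import numpy as np


def build_point_grid(n_per_side: int, width: int, height: int) -> np.ndarray:
    """Return an (n_per_side**2, 2) array of (x, y) prompt points in pixels.

    Assumption of this sketch: points sit at the centres of a uniform
    n_per_side x n_per_side partition of the image.
    """
    offset = 1.0 / (2 * n_per_side)                      # half a cell, normalised
    ticks = np.linspace(offset, 1.0 - offset, n_per_side)
    xs, ys = np.meshgrid(ticks, ticks)                   # row-major grid of centres
    points = np.stack([xs.ravel(), ys.ravel()], axis=1)  # normalised (x, y) pairs
    return points * np.array([width, height], dtype=float)


grid = build_point_grid(20, width=640, height=480)
print(grid.shape)   # 400 prompt points, each an (x, y) pair
```

Each of the 400 points would then be passed to the model as a prompt, and the resulting masks compared against the ground truth by IoU.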


Figure 9. Box-promptable segmentation performance. We compare our method with SAM-ViT/L [34] on the SegInW datasets [87]. Given a ground-truth box as the visual prompt, we compute the IoU between each output mask and the ground-truth mask and select the output mask with the highest IoU. We report 1-pt IoU for all datasets.


Table 5. Segmentation datasets used to evaluate promptable segmentation with point and box prompts. The 11 datasets cover a broad range of domains, as indicated under “image type”.


Figure 10. Visualisation of open-vocabulary segmentation, comparing the baseline with our Uni-OVSeg.


Figure 11. Visualisation of open-vocabulary segmentation, comparing the baseline with our Uni-OVSeg.


Figure 12. Visualisation of open-vocabulary segmentation, comparing the baseline with our Uni-OVSeg.


Figure 13. Visualisation of promptable segmentation, comparing SAM-ViT/L with our Uni-OVSeg.


Figure 14. Visualisation of promptable segmentation, comparing SAM-ViT/L with our Uni-OVSeg.


Figure 15. Visualisation of promptable segmentation, comparing SAM-ViT/L with our Uni-OVSeg.


Figure 16. Visualisation of promptable segmentation, comparing SAM-ViT/L with our Uni-OVSeg.


Figure 17. Visualisation of promptable segmentation, comparing SAM-ViT/L with our Uni-OVSeg.


This paper is available on arXiv under the CC BY 4.0 DEED license.