Defining Open-Vocabulary Segmentation: Problem Setup, Baseline, and the Uni-OVSeg Framework

Authors: (1) Zhaoqing Wang, The University of Sydney and AI2Robotics; (2) Xiaobo Xia, The University of Sydney; (3) Ziye Chen, The University of Melbourne; (4) Xiao He, AI2Robotics; (5) Yandong Guo, AI2Robotics; (6) Mingming Gong, The University of Melbourne and Mohamed bin Zayed University of Artificial Intelligence; (7) Tongliang Liu, The University of Sydney. Table of Links Abstract and 1. Introduction 2. Related works 3. Method and 3.1. Problem definition 3.2. Baseline and 3.3. Uni-OVSeg framework 4. Experiments 4.1. Implementation details 4.2. Main results 4.3. Ablation study 5. Conclusion 6. Broader impacts and References A. Framework details B. Promptable segmentation C. Visualisation 3. Method In this section, we first define the problem of openvocabulary segmentation in Sec. 3.1. We then introduce a straightforward baseline in Sec. 3.2. Finally, we present our proposed Uni-OVSeg framework in Sec. 3.3, including an overview, mask generation, mask-text alignment, and open-vocabulary inference. 3.1. Problem definition This paper is available on arxiv under CC BY 4.0 DEED license. Authors: (1) Zhaoqing Wang, The University of Sydney and AI2Robotics; (2) Xiaobo Xia, The University of Sydney; (3) Ziye Chen, The University of Melbourne; (4) Xiao He, AI2Robotics; (5) Yandong Guo, AI2Robotics; (6) Mingming Gong, The University of Melbourne and Mohamed bin Zayed University of Artificial Intelligence; (7) Tongliang Liu, The University of Sydney. Authors: Authors: (1) Zhaoqing Wang, The University of Sydney and AI2Robotics; (2) Xiaobo Xia, The University of Sydney; (3) Ziye Chen, The University of Melbourne; (4) Xiao He, AI2Robotics; (5) Yandong Guo, AI2Robotics; (6) Mingming Gong, The University of Melbourne and Mohamed bin Zayed University of Artificial Intelligence; (7) Tongliang Liu, The University of Sydney. Table of Links Abstract and 1. Introduction Abstract and 1. Introduction 2. Related works 2. Related works 3. Method and 3.1. Problem definition 3. Method and 3.1. Problem definition 3.2. Baseline and 3.3. Uni-OVSeg framework 3.2. Baseline and 3.3. Uni-OVSeg framework 4. Experiments 4.1. Implementation details 4.1. Implementation details 4.2. Main results 4.2. Main results 4.3. Ablation study 4.3. Ablation study 5. Conclusion 5. Conclusion 6. Broader impacts and References 6. Broader impacts and References A. Framework details A. Framework details B. Promptable segmentation B. Promptable segmentation C. Visualisation C. Visualisation 3. Method In this section, we first define the problem of openvocabulary segmentation in Sec. 3.1. We then introduce a straightforward baseline in Sec. 3.2. Finally, we present our proposed Uni-OVSeg framework in Sec. 3.3, including an overview, mask generation, mask-text alignment, and open-vocabulary inference. 3.1. Problem definition This paper is available on arxiv under CC BY 4.0 DEED license. This paper is available on arxiv under CC BY 4.0 DEED license. available on arxiv