Video Instance Matting: V-HIM60 Dataset and Difficulty Levels

Written by instancing | Published 2025/12/19
Tech Story Tags: deep-learning | robustness-testing | binary-mask-guidance | instance-matting-synthesis | ihim50k-dataset | video-instance-matting | ihim50k | v-him2k5-training-set

TL;DR: MaGGIe introduces the IHIM50K image dataset and the V-HIM60 video benchmark. These sets test model robustness against varying mask quality and instance overlap.

Abstract and 1. Introduction

  2. Related Works

  3. MaGGIe

    3.1. Efficient Masked Guided Instance Matting

    3.2. Feature-Matte Temporal Consistency

  4. Instance Matting Datasets

    4.1. Image Instance Matting and 4.2. Video Instance Matting

  5. Experiments

    5.1. Pre-training on image data

    5.2. Training on video data

  6. Discussion and References

Supplementary Material

  7. Architecture details

  8. Image matting

    8.1. Dataset generation and preparation

    8.2. Training details

    8.3. Quantitative details

    8.4. More qualitative results on natural images

  9. Video matting

    9.1. Dataset generation

    9.2. Training details

    9.3. Quantitative details

    9.4. More qualitative results

4.1. Image Instance Matting

We derived the Image Human Instance Matting 50K (IHIM50K) training dataset from HHM50K [50], which features multiple human subjects per image. The dataset contains 49,737 synthesized images with 2-5 instances each, created by compositing human foregrounds onto random backgrounds; the binary guidance masks are obtained by perturbing the ground-truth alpha mattes. For benchmarking, we used HIM2K [49] and additionally created the Mask HIM2K (M-HIM2K) set to test robustness against the varying mask quality produced by off-the-shelf instance segmentation models (as shown in Fig. 3). Details on the generation process are available in the supplementary material.
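
To make the synthesis step concrete, here is a minimal sketch (not the authors' released pipeline): it composites one human foreground onto a random background and derives a coarse binary guidance mask by binarizing the ground-truth alpha matte and randomly dilating or eroding it. Function names, thresholds, and kernel sizes are assumptions.

```python
# Illustrative sketch of IHIM50K-style synthesis; parameters are assumptions.
import cv2
import numpy as np

def composite_instance(fg_rgb, alpha, bg_rgb):
    """Alpha-composite one foreground instance over a background.
    All inputs are uint8 arrays; alpha is a single-channel matte in [0, 255]."""
    a = alpha[..., None].astype(np.float32) / 255.0
    return (a * fg_rgb + (1.0 - a) * bg_rgb).astype(np.uint8)

def make_guidance_mask(alpha, thresh=128, max_kernel=15):
    """Binarize the alpha matte, then randomly dilate or erode it to
    imitate the imperfect masks produced by segmentation models."""
    binary = (alpha >= thresh).astype(np.uint8)
    k = np.random.randint(3, max_kernel) | 1  # random odd kernel size
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (k, k))
    if np.random.rand() < 0.5:
        binary = cv2.dilate(binary, kernel)
    else:
        binary = cv2.erode(binary, kernel)
    return binary

# Usage: given an (fg, alpha) pair and a background of the same size,
# image = composite_instance(fg, alpha, bg)
# mask  = make_guidance_mask(alpha)
```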

4.2. Video Instance Matting

Our video instance matting dataset, synthesized from VM108 [57], VideoMatte240K [33], and CRGNN [52], comprises two subsets: V-HIM2K5 for training and V-HIM60 for testing. We categorized the dataset into three difficulty levels based on the degree of instance overlap; Table 2 summarizes the synthesized datasets. Training masks are generated by applying dilation and erosion to binarized alpha mattes, while testing masks are produced by XMem [8]. Further details on dataset synthesis and the difficulty levels are provided in the supplementary material.
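
As an illustration of how an overlap score could drive the difficulty split, here is a small sketch under our own assumptions (the paper's exact criterion and thresholds are in the supplementary material): it averages pairwise IoU between binarized instance mattes over the frames of a clip and buckets the result into three levels, labeled easy/medium/hard here for illustration.

```python
# Hypothetical overlap-based difficulty scoring; thresholds are assumptions.
from itertools import combinations
import numpy as np

def frame_overlap(alphas, thresh=0.5):
    """Mean pairwise IoU between instance masks in one frame.
    `alphas` has shape (num_instances, H, W) with values in [0, 1]."""
    masks = alphas >= thresh
    ious = []
    for a, b in combinations(range(len(masks)), 2):
        inter = np.logical_and(masks[a], masks[b]).sum()
        union = np.logical_or(masks[a], masks[b]).sum()
        ious.append(inter / union if union > 0 else 0.0)
    return float(np.mean(ious)) if ious else 0.0

def difficulty_level(video_alphas, easy_t=0.05, medium_t=0.2):
    """Average per-frame overlap over a clip and map it to a level.
    `video_alphas` has shape (num_frames, num_instances, H, W)."""
    score = float(np.mean([frame_overlap(f) for f in video_alphas]))
    if score < easy_t:
        return "easy"
    if score < medium_t:
        return "medium"
    return "hard"
```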

Authors:

(1) Chuong Huynh, University of Maryland, College Park ([email protected]);

(2) Seoung Wug Oh, Adobe Research (seoh,[email protected]);

(3) Abhinav Shrivastava, University of Maryland, College Park ([email protected]);

(4) Joon-Young Lee, Adobe Research ([email protected]).


This paper is available on arXiv under the CC BY 4.0 Deed (Attribution 4.0 International) license.

