
10 Biggest Image Datasets for Computer Vision

by Valentine Enedah, January 5th, 2023

Computer vision is a core area of artificial intelligence. It enables computers to approximate the human visual system, using information from images and videos to identify and classify objects.

Many programming languages can be used for computer vision, but C++ and Python are the most widely used.

Data is essential for building computer vision models. In this article, we will look at the 10 biggest datasets for computer vision.

Image Datasets for Computer Vision

  1. CIFAR-10 and CIFAR-100 - CIFAR-10 consists of 60,000 32x32 colour images in 10 classes, with 6,000 images per class. It has 50,000 training images and 10,000 test images, divided into five training batches and one test batch of 10,000 images each. CIFAR-100 has 60,000 32x32 colour images in 100 classes, with 600 images per class. The 100 classes are grouped into 20 superclasses, so each image carries a fine label for its class and a coarse label for the superclass it belongs to.


  2. ImageNet - This is a dataset of images organized according to the WordNet hierarchy. Its most widely used subset, from the ImageNet Large Scale Visual Recognition Challenge (ILSVRC), has 1,000 object classes and contains 1,281,167 training images, 50,000 validation images and 100,000 test images.
    To download this dataset, visit the website and log in to get access.


  3. MS COCO - The Microsoft Common Objects in Context (MS COCO) dataset contains 328,000 high-quality images of everyday objects and people. It is often used as a standard benchmark for comparing the performance of real-time object detection algorithms.


  4. Flickr30k - The Flickr30k dataset consists of 31,000 images collected from Flickr, each with 5 reference sentences provided by human annotators. It has become a standard benchmark for sentence-based image description.


    Figure: Examples from the Flickr30k Entities (first row) and ReferItGame (second row) datasets.


  5. IMDB-WIKI - This is one of the largest publicly available datasets of face images. It contains more than 500,000 images of human faces labelled with gender, age, and name.


  6. Berkeley DeepDrive - BDD100K is one of the largest and most varied driving video collections, with 100,000 videos annotated for ten different autonomous driving perception tasks.
    To download the dataset, visit the website and log in.


  7. LSUN - The LSUN classification dataset has 10 scene categories and 20 object categories. Each category in the training data contains a large number of images, ranging from about 120,000 to 3,000,000.

    Here are some extra details about the LSUN dataset:

    I. Scene categories (bedroom, bridge, classroom, conference_room, living_room, restaurant, tower, dining_room, kitchen and church_outdoor).

    II. Object categories (airplane, bicycle, bird, boat, bottle, bus, car, cat, chair, cow, dining_table, dog, horse, motorbike, person, potted_plant, sheep, sofa, train, tv-monitor).

    To access the dataset, visit GitHub.


  8. Kinetics 700 - Kinetics 700 is a video dataset of 650,000 clips covering 700 human action classes, which is where the name comes from. The actions include human-human interactions such as shaking hands and hugging. Each action class has at least 700 clips.

    Every clip is roughly ten seconds long and has been manually labelled with one action class.

    To download the dataset, click the download dataset option.


  9. MPII Human Pose - This dataset is a state-of-the-art benchmark for evaluating articulated human pose estimation. It consists of around 25K images extracted from YouTube videos, containing over 40K people with annotated body joints. The images cover 410 human activities, and each image is provided with an activity label.


    To download this dataset, click here.


    MPII Human Pose Models


  10. LabelMe-12-50k - This dataset is challenging for object recognition systems because the examples of each object class vary widely in appearance, lighting, and viewing angle.

    The dataset comprises 50,000 JPEG images, each 256x256 pixels in size (40,000 for training and 10,000 for testing).


    To download this dataset, click here.
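The class, split and batch figures quoted for CIFAR-10, CIFAR-100 and ImageNet in the list above can be cross-checked with a little arithmetic. The snippet below is a minimal Python sketch of that check, not an official loader. The dictionary layout and the helper names (`total_from_classes`, `total_from_splits`, `total_from_batches`) are illustrative assumptions, and the numbers are simply the ones given in this article.

```python
# Figures as quoted in the dataset list above.
# The dictionary keys and helper names are hypothetical, chosen for illustration.
CIFAR10 = {
    "classes": 10,
    "per_class": 6_000,
    "train": 50_000,
    "test": 10_000,
    "train_batches": 5,
    "test_batches": 1,
    "batch_size": 10_000,
}
CIFAR100 = {"classes": 100, "per_class": 600, "superclasses": 20}
IMAGENET = {"train": 1_281_167, "validation": 50_000, "test": 100_000}


def total_from_classes(spec):
    """Total images implied by the number of classes and images per class."""
    return spec["classes"] * spec["per_class"]


def total_from_splits(spec):
    """Total images implied by the train/validation/test split sizes."""
    return sum(spec.get(key, 0) for key in ("train", "validation", "test"))


def total_from_batches(spec):
    """Total images implied by the number of batches and the batch size."""
    return (spec["train_batches"] + spec["test_batches"]) * spec["batch_size"]


if __name__ == "__main__":
    # The three ways of counting CIFAR-10 should agree with each other.
    print("CIFAR-10 by classes:", total_from_classes(CIFAR10))
    print("CIFAR-10 by splits: ", total_from_splits(CIFAR10))
    print("CIFAR-10 by batches:", total_from_batches(CIFAR10))
    print("CIFAR-100 by classes:", total_from_classes(CIFAR100))
    print("ImageNet by splits: ", total_from_splits(IMAGENET))
    # Assumes the 20 superclasses are equally sized.
    print("CIFAR-100 classes per superclass:",
          CIFAR100["classes"] // CIFAR100["superclasses"])
```

A quick check like this is useful after downloading a dataset: if the number of files on disk does not match the total implied by the documented splits, the download is probably incomplete.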

Final Thoughts

Computer vision can help with tasks like facial recognition and image analysis. The datasets above are available for anyone to download, although some of them require you to log in first.


The lead image of this article was generated via HackerNoon's AI Stable Diffusion model using the prompt 'thousands of images superimposed'.