Computer vision is an important field of artificial intelligence.
It enables computers to replicate the human visual system, using information from images and videos to identify and classify objects.
Although many programming languages can be used for computer vision, the most widely used are C++ and Python.
Data is central to building computer vision models, and in this article we will look at the 10 Biggest Datasets for Computer Vision.
Image Datasets for Computer Vision
CIFAR-10 and CIFAR-100 - CIFAR-10 consists of 60,000 32x32 colour images in 10 classes, with 6,000 images per class. It has 50,000 training images and 10,000 test images, divided into five training batches and one test batch of 10,000 images each. CIFAR-100 has 60,000 32x32 colour images in 100 classes, with 600 images per class. The 100 classes are grouped into 20 superclasses, and each image has a fine label denoting its class and a coarse label denoting the superclass it belongs to.
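To make the batch and label structure concrete, here is a minimal Python sketch for reading the "Python version" of the CIFAR archives. It assumes the pickle format documented on the CIFAR page: each batch file holds a dict whose b"data" entry stores one 3,072-value row per image (1,024 red values, then 1,024 green, then 1,024 blue, each in row-major order), with labels under b"labels" for CIFAR-10 and under b"fine_labels" and b"coarse_labels" for CIFAR-100. The helper names load_batch, decode_image and superclass_members are hypothetical, chosen only for this example.

```python
import pickle
from collections import defaultdict

IMAGE_SIZE = 32
CHANNEL_LEN = IMAGE_SIZE * IMAGE_SIZE  # 1,024 values per colour channel


def load_batch(path):
    """Unpickle one batch file from the Python version of CIFAR-10/100.

    The files were pickled under Python 2, so loading with encoding="bytes"
    keeps the dictionary keys as bytes (e.g. b"data", b"labels").
    """
    with open(path, "rb") as f:
        return pickle.load(f, encoding="bytes")


def decode_image(row):
    """Turn one 3,072-value row into a 32x32 grid of (r, g, b) tuples.

    The row stores the whole red plane, then green, then blue,
    each plane in row-major order.
    """
    if len(row) != 3 * CHANNEL_LEN:
        raise ValueError("expected 3072 values per image")
    red = row[:CHANNEL_LEN]
    green = row[CHANNEL_LEN:2 * CHANNEL_LEN]
    blue = row[2 * CHANNEL_LEN:]
    return [
        [
            (
                int(red[y * IMAGE_SIZE + x]),
                int(green[y * IMAGE_SIZE + x]),
                int(blue[y * IMAGE_SIZE + x]),
            )
            for x in range(IMAGE_SIZE)
        ]
        for y in range(IMAGE_SIZE)
    ]


def superclass_members(fine_labels, coarse_labels):
    """Map each CIFAR-100 coarse (superclass) label to the set of fine labels under it."""
    members = defaultdict(set)
    for fine, coarse in zip(fine_labels, coarse_labels):
        members[coarse].add(fine)
    return dict(members)
```

For example, assuming the CIFAR-10 archive was extracted into a cifar-10-batches-py directory, `decode_image(load_batch("cifar-10-batches-py/data_batch_1")[b"data"][0])` would decode the first training image. Run on the CIFAR-100 training file, `superclass_members` should map each of the 20 coarse labels to 5 fine labels.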
ImageNet - This is a dataset of images organized according to the WordNet hierarchy. It has 1,000 object classes and contains 1,281,167 training images, 50,000 validation images and 100,000 test images. To download this dataset, you have to visit the ImageNet website and log in to get access.
MS COCO - The Microsoft Common Objects in Context (MS COCO) dataset contains 328,000 high-quality images of everyday objects and humans. It is often used as a benchmark for comparing the performance of real-time object detection algorithms.
Flickr 30k - The Flickr30k dataset consists of about 31,000 images collected from Flickr, each paired with 5 reference sentences written by human annotators. It has become a standard benchmark for sentence-based image description.
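Because each image comes with five reference sentences, caption files are usually grouped by image before use. The Python sketch below assumes a plain-text layout with one caption per line, written as `<image file>#<index>`, a tab, then the sentence. That layout and the helper name `group_captions` are assumptions for illustration, not a specification of every Flickr30k distribution.

```python
from collections import defaultdict


def group_captions(lines):
    """Group caption lines of the (assumed) form "<image>#<k>\t<caption>" by image.

    Returns a dict mapping each image file name to its captions,
    ordered by the caption index k.
    """
    captions = defaultdict(list)
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        key, caption = line.split("\t", 1)
        image, _, index = key.rpartition("#")
        captions[image].append((int(index), caption.strip()))
    # Sort each image's references by index so reference 0 comes first.
    return {image: [text for _, text in sorted(refs)] for image, refs in captions.items()}
```

On a complete caption file, `all(len(refs) == 5 for refs in group_captions(lines).values())` is a cheap check that every image has its five references.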
IMDB-WIKI - This is the largest publicly available dataset of face images with gender and age labels. It contains more than 500,000 images of human faces labelled with gender, age and name.
Berkeley DeepDrive - BDD100K is the largest and most diverse driving video dataset, with 100,000 videos annotated for ten different autonomous driving perception tasks. To download the dataset, visit the BDD100K website and log in.
LSUN - The LSUN classification dataset has 10 scene categories and 20 object categories. Each category in the training data has a large number of images, ranging from about 120,000 to 3,000,000.
Here are some extra details about the LSUN dataset:
I. Scene categories (bedroom, bridge, classroom, conference_room, living_room, restaurant, tower, dining_room, kitchen and church_outdoor).
Kinetics 700 - Kinetics 700 is a video dataset of about 650,000 clips covering 700 human action classes, including human-human interactions such as shaking hands and hugging. Each action class has at least 700 clips, and the "700" in the name refers to the number of action classes.
Every clip is roughly ten seconds long and has been manually labelled with one action class.
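Since every clip carries exactly one action label and a start and end time, a per-class tally is a quick sanity check on a downloaded copy. The Python sketch below assumes the annotations come as a CSV file with a header row that includes `label`, `time_start` and `time_end` columns (times in seconds); treat that column layout as an assumption and adapt it to the files you actually have. `clips_per_class` and `under_represented` are hypothetical helper names.

```python
import csv
from collections import Counter


def clips_per_class(csv_file):
    """Count clips per action class and collect clip durations.

    Assumes (hypothetically) a CSV with a header row containing at least
    "label", "time_start" and "time_end" columns, times in seconds.
    """
    counts = Counter()
    durations = []
    for row in csv.DictReader(csv_file):
        counts[row["label"]] += 1
        durations.append(float(row["time_end"]) - float(row["time_start"]))
    return counts, durations


def under_represented(counts, minimum):
    """Return the classes with fewer than `minimum` clips, sorted by name."""
    return sorted(label for label, n in counts.items() if n < minimum)
```

For instance, `under_represented(counts, 700)` would list any classes in a local copy that fall short of the per-class minimum stated above, for example because some source videos could not be downloaded, and the durations should cluster around ten seconds.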
MPII Human Pose - This dataset is a state-of-the-art benchmark for evaluating articulated human pose estimation. It consists of around 25K images, each extracted from a YouTube video, containing over 40K people with annotated body joints. The images cover 410 human activities, and each image is provided with an activity label.
LabelMe-12-50k - This dataset is challenging for object recognition systems, since the examples of each object class vary widely in appearance, lighting and viewing angle.
The dataset comprises 50,000 JPEG images, each 256x256 pixels in size (40,000 for training and 10,000 for testing).