Have you ever had to guess another person's age? Probably yes. What about games where you have to find things in the shortest possible time, or deciphering what your doctor wrote on a prescription when you were sick? Everyone has faced problems like these in real life. How about asking your favorite computer to do the task for you? Wouldn't that be great? Computers can actually do this using machine learning, but first we need to train the machine on some powerful datasets.

The key to getting better in most fields of life is practice. Practice on a variety of problems, from image processing to speech recognition. Each of these problems has its own unique techniques and approach. Working on these datasets will make you a better data expert, and what you learn will be invaluable in your career. But how do you get this data? We have listed a collection of high-quality datasets that every machine learning enthusiast should work on to apply and improve their skills.

IMAGE DATASETS

Car License Plate Detection
Has around 500 images with car license plates marked as rectangular bounding boxes in images of cars on roads and streets.
Link to dataset.

Celebrity Face Key-Points
A database of around 2500 images of celebrity faces with important key-points such as eyes, nose, etc. marked.
Link to the dataset.

E-commerce Tagging for clothing
Images from e-commerce sites with bounding boxes drawn around shirts, jackets, sunglasses, etc. Has around 500 images manually tagged for item detection.
Link to dataset.

Wound Dataset
Around 300 medical surgery images with bounding boxes drawn around wounds.
Link to dataset.

IMDB-WIKI dataset
IMDb, an abbreviation of Internet Movie Database, is an online database of information related to world films, television programs, home videos and video games, and internet streams, including cast, production crew and personnel biographies, plot summaries, trivia, and fan reviews and ratings.
An additional fan feature, message boards, was abandoned in February 2017. Originally a fan-operated website, the database is owned and operated by IMDb.com, Inc., a subsidiary of Amazon. Not very rare, but the granddaddy of all image datasets.
Description: IMDB and Wikipedia face images with gender and age labels.
Instances: 523,051
Format: images
Default task: Gender classification, face detection, face recognition, age estimation
Created: 2015 by R. Rothe, R. Timofte, L. V. Gool
Download link: https://data.vision.ee.ethz.ch/cvl/rrothe/imdb-wiki/

The ROOMS
This is an image classification dataset for labeling room images as bedroom, kitchen, bathroom, living room, exterior, etc. Images from different houses are collected and kept together as one dataset for training and testing. It helps a model learn which image belongs to which part of a house.
Description: The dataset has 20001 items, of which 4404 items have been manually labeled.
Categories: bedroom, kitchen, bathroom, exterior, living room, other
Default task: image classification, image captioning
Format: images
Created by: DataTurks
Download link: https://dataturks.com/projects/sheerun/rooms

Visual Genome dataset
Visual Genome is a dataset, a knowledge base, and an ongoing effort to connect structured image concepts to language.
Description:
108,077 Images
5.4 Million Region Descriptions
1.7 Million Visual Question Answers
3.8 Million Object Instances
2.8 Million Attributes
2.3 Million Relationships
Everything Mapped to Wordnet Synsets
Format: images, text
Default task: Image captioning
Created: 2016 by R. Krishna et al.
Download link: http://visualgenome.org/api/v0/api_home.html

CRACK Classification dataset
This dataset is for classifying cracks on walls.
The dataset consists of wall images with or without cracks. It also includes images with shadows of wires that look exactly like cracks on the wall, so the system must be trained carefully to differentiate between cracks and shadows. This dataset is very challenging and will sharpen your coding skills.
Description: The dataset has 1428 items, of which 1428 items have been manually labeled.
Categories: crack, no-crack
Format: images
Default task: image classification
Created by: DataTurks
Download link: https://dataturks.com/projects/miaozh17/Crack%20Classification

IIT-5K OCR dataset
Has 5K labeled images of street signs, cropped to contain just the portion that has the text. Quite a difficult dataset, with even the best vision algorithms at around 80% accuracy. (Read: comparison of Google, AWS, Microsoft OCR APIs on this dataset.)
Link to the dataset.

CARS dataset
This dataset is for identifying cars in images. The set contains images that do or do not have cars in them. The main objective is to identify even small parts of a car in the images. This is a human-labeled dataset.
Description: The dataset has 613 items, of which 604 items have been manually labeled.
Categories: cars, no-cars
Format: images
Default task: image classification
Created by: DataTurks
Download link: https://dataturks.com/projects/dominique.paul.info/cars2

The FERET Dataset
The Face Recognition Technology (FERET) program is managed by the Defense Advanced Research Projects Agency (DARPA) and the National Institute of Standards and Technology (NIST). The Department of Defense (DoD) Counterdrug Technology Development Program Office sponsored the program. The goal of the FERET program was to develop automatic face recognition capabilities that could be employed to assist security, intelligence, and law enforcement personnel in the performance of their duties.
The FERET database was collected in 15 sessions between August 1993 and July 1996. The database contains 1564 sets of images for a total of 14,126 images, covering 1199 individuals and 365 duplicate sets of images. A duplicate set is a second set of images of a person already in the database, usually taken on a different day.
Description: 11338 images of 1199 individuals in different positions and at different times.
Format: images
Default task: Classification, face recognition
Created: 2003 by United States Department of Defense
Download link: https://www.nist.gov/.../face-recognition-technology-feret

Face Detection
Has around 1300 faces marked as rectangular bounding boxes in images. Images range from party pics to random people on streets.
Link to dataset.

CALTECH-101 dataset
Caltech 101 is a data set of digital images. It has been used to train and test several machine learning, computer vision recognition, and classification algorithms. A set of annotations is provided for each image. Each set of annotations contains two pieces of information: the general bounding box in which the object is located, and a detailed human-specified outline enclosing the object. A MATLAB script is provided with the annotations. It loads an image and its corresponding annotation file and displays them as a MATLAB figure.
The Caltech 101 data set aims at alleviating many problems common to image data sets. The images are cropped and resized. Many categories are represented, which suits both single-class and multi-class recognition algorithms. Detailed object outlines are marked. Available for general use, Caltech 101 acts as a common standard by which to compare different algorithms without bias due to different data sets.
Description: Pictures of objects, with detailed object outlines marked.
Instances: 9,146 images, split between 101 different object categories, as well as an additional background/clutter category.
Format: Images
Default task: Classification, object recognition
Created: September 2003, compiled by Fei-Fei Li
Download link: http://www.vision.caltech.edu/Image_Datasets/Caltech101/

UXBOT Dataset
This dataset is for classifying UXBot pictures into styles such as dark, professional, minimalist, glamorous, etc. UXBot is a chat platform in use nowadays. The dataset can be used to teach a computer to tell these styles apart. It is a human-labeled dataset.
Description: The dataset has 129 items, of which 129 items have been manually labeled.
Format: images
Categories: Elegant, clean, fresh, light, Airy, cooperate, funky, Retro, Eddy, fun, etc.
Default task: image classification
Created by: DataTurks
Download link: https://dataturks.com/projects/briannaorg/UXBot

LABELME Dataset
LabelMe is a project created by the MIT Computer Science and Artificial Intelligence Laboratory (CSAIL) which provides a dataset of digital images with annotations. The dataset is dynamic, free to use, and open to public contribution. The motivation behind creating LabelMe comes from the history of publicly available data for computer vision researchers. Most available data was tailored to a specific research group's problems, which forced new researchers to collect additional data to solve their own problems. LabelMe was created to solve several common shortcomings of available data.
Description: Large dataset of images for object classification.
Format: Images, Text
Default task: Image classification, object detection
Created: 2005 by MIT Computer Science and Artificial Intelligence Laboratory
Download link: http://labelme.csail.mit.edu/Release3.0/browserTools/php/dataset.php

You can find thousands of such open datasets here.
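
For the image classification datasets listed above (ROOMS, CRACK, CARS, UXBOT), a common first step is to index the labeled images and hold out a test split before training anything. The following is only a minimal sketch using the Python standard library. It assumes the images have already been downloaded and sorted into one folder per category (for example `crack/` and `no-crack/`); that layout, along with the function names `index_images` and `train_test_split`, is a hypothetical choice made for illustration and is not something the dataset providers prescribe.

```python
import random
from pathlib import Path

# File extensions treated as images (an assumption; adjust to the actual download).
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def index_images(root):
    """Return sorted (path, label) pairs.

    Assumes a hypothetical layout root/<category>/<image file>, so the
    label is simply the name of the folder that contains the image.
    """
    pairs = []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES:
            pairs.append((path, path.parent.name))
    return pairs


def train_test_split(pairs, test_fraction=0.2, seed=0):
    """Shuffle with a fixed seed and hold out test_fraction of the items."""
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    # The first n_test shuffled items become the test set, the rest the training set.
    return shuffled[n_test:], shuffled[:n_test]
```

Calling `train_test_split(index_images("path/to/dataset"))` would then give two lists of `(path, label)` pairs. Fixing the seed keeps the split reproducible, which matters when comparing models on small sets such as the 1428-item crack dataset.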