Working with different food datasets is an essential part of what my company's Machine Learning team does, and we spend a lot of time searching for, combining, or intersecting datasets to get the data we need for our work. In case it helps someone else, I decided to list all the helpful datasets in one place.
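To give a rough idea of what intersecting two datasets can look like, here is a minimal sketch using only the Python standard library. The sample data, the column names (`product`, `price_usd`, `food_name`, `kcal_per_100g`), and the helper functions are all made up for illustration; real datasets from the lists below will have different schemas and usually need more careful name matching.

```python
import csv
import io

# Hypothetical sample data; real datasets will have different columns.
PRICES_CSV = """product,price_usd
Rice,1.20
wheat flour,0.80
Olive Oil,6.50
"""

NUTRIENTS_CSV = """food_name,kcal_per_100g
rice,130
Wheat Flour,364
butter,717
"""


def normalize(name):
    """Lowercase and collapse whitespace so names from different sources match."""
    return " ".join(name.lower().split())


def load(text, key_column):
    """Read CSV text into a dict keyed by the normalized name column."""
    reader = csv.DictReader(io.StringIO(text))
    return {normalize(row[key_column]): row for row in reader}


def intersect(left, right):
    """Keep only the keys present in both datasets and merge their columns."""
    shared = sorted(left.keys() & right.keys())
    return {key: {**left[key], **right[key]} for key in shared}


prices = load(PRICES_CSV, "product")
nutrients = load(NUTRIENTS_CSV, "food_name")
combined = intersect(prices, nutrients)

for name, row in combined.items():
    print(name, row["price_usd"], row["kcal_per_100g"])
```

Only the items that appear in both sources survive the intersection, so "Olive Oil" and "butter" are dropped here. For larger files, a library such as pandas would likely be a more practical choice than plain dictionaries.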
Datasets
Kaggle:
- Burritos in San Diego
- CHI Restaurant Inspections
- EPIRecipes
- Food and Drink archive
- Food choices
- Food Ingredient List
- Global Food Prices
- Health Nutrition and Population Statistics
- Instacart Market Basket Analysis
- Pizza restaurants and the pizza they sell
- Restaurant data with consumer ratings
- US Healthcare Data
- Vegetarian Vegan Restaurants
- What’s on the menu
- World Food Facts
- YouTube videos (can be filtered by keyword)
Other sources:
- AUSNUT 2011–13 food nutrient database
- Cuisine Classifying
- FAOSTAT Database (Food and Agriculture Organization Statistics)
- Farm-Oriented Open Data
- Food and Agriculture Organization of the United Nations
- Food Composition
- Food composition database for nutrient intake
- Food Repo & Food Opendata
- FOODSECURE — Food and nutrition security in long term perspective
- HumData Food Prices
- Open Grocery Database Project
- New York Times Food data from their cooking website
- Open Recipes Database Dump
Need different datasets?
Google recently published a separate project that helps you search for many different types of datasets. You can find it at toolbox.google.com
Want to learn more?
If you want to learn more about data, machine learning, and deep learning, check out this repository: https://github.com/ageron/handson-ml
If you believe we forgot a helpful dataset, please add a comment below with a link to the dataset.
The same collection is also available in this repository, so feel free to take a look or contribute: https://github.com/ChickenKyiv/awesome-food-collection-machine-learning