Many use cases require detailed population data. For example, a detailed breakdown of the demographic structure is a significant factor in predicting real estate prices. Humanitarian projects such as vaccination campaigns or rural electrification plans also depend heavily on good population data.

Finding high-quality, up-to-date data on a global scale for these use cases is very challenging. Census data is usually published only every few years, so those datasets become outdated quickly. Arguably the best datasets for population density and demographics are published by Facebook under their Data for Good initiative. They combine official census data with their internal data and use machine learning algorithms for image recognition to determine the location and type of buildings.

Combining those sources gives a detailed statistical breakdown of demographic groups in 1-arcsecond blocks, a resolution of approximately 30 meters. Each square contains statistical values for the following demographic groups: total, female, male, children under 5, youth (15 to 24), elderly (60 plus), and women of reproductive age (15 to 49).

For each country, Facebook delivers one file per demographic group, either as a GeoTIFF or CSV. The CSV contains the latitude and longitude of the cell and the respective population value.

Working with just a static CSV file can be cumbersome. That is why we created an open-source wrapper that exposes the data over an API. You can download the data for entire countries directly over a CLI. We preprocess the data to make it easily queryable, leveraging Uber's H3 spatial indexing. Thanks to the H3 indexing, it is easy to build queries on top of the database. Using either H3 cells or coordinate pairs, you can retrieve the population based on a point, a given radius, or a polygon. That way, it is straightforward to aggregate the population on a zip code level, for example.

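To make the radius query concrete, here is a minimal sketch in plain Python that works directly on CSV rows. It is not the wrapper's actual API: the column names (`latitude`, `longitude`, `population`), the helper names, and the sample values are all assumptions made for illustration, and it uses a brute-force haversine distance check instead of H3 cells.

```python
import csv
import io
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters


def haversine_m(lat1, lng1, lat2, lng2):
    """Great-circle distance between two coordinates, in meters."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def population_within_radius(rows, lat, lng, radius_m):
    """Sum the population of every cell whose center lies within radius_m of (lat, lng)."""
    total = 0.0
    for row in rows:
        cell_lat = float(row["latitude"])
        cell_lng = float(row["longitude"])
        if haversine_m(lat, lng, cell_lat, cell_lng) <= radius_m:
            total += float(row["population"])
    return total


# Made-up sample rows; the header names are an assumption, not the real file layout.
SAMPLE_CSV = """latitude,longitude,population
35.8990,14.5140,12.5
35.8992,14.5143,7.5
35.9500,14.4000,30.0
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
nearby = population_within_radius(rows, 35.8990, 14.5140, 100)
print(nearby)
```

Scanning every row is fine for a small file but slow for a whole country. A spatial index such as H3 avoids this, because only the cells that cover the search area need to be looked up.
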
We aggregate the squares into H3 cells at resolution 11 and store them in MongoDB with the aggregated values for each demographic group. Using JS streams and MongoDB's aggregation pipelines, memory usage stays low, and you can process millions of rows on your local machine.

For quick data exploration and visualization, you can directly create datasets compatible with Kepler.gl or Unfolded.ai to make beautiful maps. We published an example map for Malta. It shows at a glance where the highly populated regions are and where the heart of the city is.

With Facebook's population data now directly queryable, it is much faster to create predictive models or visualizations, so data teams can spend their time on value-adding tasks. That is also the main reason why we are building an open-source community for third-party data integration with Kuwala. So if you want to get your hands on more connectors like these, star us on GitHub and join our Slack community.

Previously published at https://medium.com/kuwala-io/querying-the-most-granular-demographics-dataset-62da16b441a8
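
The aggregation step described above, where squares are folded into larger cells while memory usage stays low, can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's implementation: it uses a Python generator in place of JS streams, an in-memory dictionary in place of MongoDB, and a hypothetical `cell_key` function that snaps coordinates to a square grid as a stand-in for a real H3 index. The column names and sample values are also assumed.

```python
import csv
import io
import math
from collections import defaultdict

CELL_SIZE_DEG = 0.001  # hypothetical bucket size in degrees; NOT an H3 resolution


def cell_key(lat, lng):
    """Stand-in for an H3 index: snap a coordinate to a square grid bucket.

    With the h3 Python package (v4) installed, h3.latlng_to_cell(lat, lng, 11)
    could be used here instead to get a real resolution-11 cell.
    """
    return (math.floor(lat / CELL_SIZE_DEG), math.floor(lng / CELL_SIZE_DEG))


def read_rows(lines):
    """Yield (lat, lng, population) one row at a time instead of loading the whole file."""
    for row in csv.DictReader(lines):
        yield float(row["latitude"]), float(row["longitude"]), float(row["population"])


def aggregate(rows):
    """Sum population per cell; only the per-cell totals are kept in memory."""
    totals = defaultdict(float)
    for lat, lng, population in rows:
        totals[cell_key(lat, lng)] += population
    return dict(totals)


# Made-up sample rows; the header names are an assumption.
SAMPLE_CSV = """latitude,longitude,population
35.89912,14.51421,1.5
35.89938,14.51467,2.5
35.90055,14.51520,4.0
"""

totals = aggregate(read_rows(io.StringIO(SAMPLE_CSV)))
for key, population in sorted(totals.items()):
    print(key, population)
```

Because `read_rows` is a generator, a real file handle can be passed in place of the `StringIO` object, and rows are consumed one by one.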

Getting Information From The Most Granular Demographics Dataset
