Mapbox and Tippecanoe for big census data\n-----------------------------------------\n\n!(https://hackernoon.com/hn-images/1*Af7dqRBswAKRF68uaNM6Ow.png)\n\n> **Check out the finished** [**map here**](http://ryantm.io/population/)**!**\n\n[Housing](https://hackernoon.com/tagged/housing) policy is something I deal with a lot, and so I spend a lot of time trying to make sense of housing data. While thinking about the relationship between the rental housing [market](https://hackernoon.com/tagged/market) and home ownership (typically represented across time), I started to wonder what that relationship looks like geographically.\n\nCertainly there are parts of cities known for having lots of condos, or apartments, or single-family homes, but I was curious what this looks like on the whole, and whether larger structures could be discerned.\n\nThis inquiry turned into its own formidable technical challenge, and resulted in a pretty interesting data-set; read on to find out more, including how I built it!\n\n!(https://hackernoon.com/hn-images/1*LNL3Ot9PJyTdTQJZNv3UKw.png)\n\nPortland, Oregon renters/owners viewed as a choropleth.\n\n!(https://hackernoon.com/hn-images/1*NRrUP-YrJDoEFHzxFZqcJg.png)\n\nPortland, Oregon renters/owners viewed per-person.\n\nThe US Census makes owner/renter information readily available for census block geometries, but viewing it as a simple choropleth of colored polygons leaves something to be desired. One of the prime spatial consequences of rentership is often increased density, since rented homes tend to be smaller or located inside multi-family buildings. But a choropleth map showing the relative incidence of owners to renters loses this important information. 
If, instead, we view the data at a per-person level, we can capture both dimensions of information at once.\n\nThis lets us compare structures of both density and ownership in very dense locations like Manhattan:\n\n!(https://hackernoon.com/hn-images/1*qxijyKrS4aT7xZHdUT79Cw.png)\n\nAnd locations that have a visible spread from urban to suburban, like Washington DC:\n\n!(https://hackernoon.com/hn-images/1*zzxk_4NilJrrxJvwK_E5ew.png)\n\nThis makes it easy to start forming questions about why development took the shape (literally) that it did: what historical forces, policies, and timelines created particular shapes and conglomerations of one type or the other? Or, by the same token, what gave rise to those areas with no discernible structure at all?\n\nI will admit I have no answers yet, but it’s an interesting jumping-off point.\n\n### How I built it\n\nThe data comes from the US Census Bureau's 2010 Census Summary File 1 (SF1). This is the most recent census with data available at the block level, and that fine grain of detail is necessary to produce more interesting visualizations. Using the [US Census API](https://www.census.gov/developers/) and [TIGER/Line](https://www.census.gov/geo/maps-data/data/tiger-line.html) geometry database, you can grab both the census variables of interest and the geometries they are associated with. In our case, we took total population, population who rented their home, and population who owned their home.\n\nEach population number has to be converted into an equal number of point geometries that fall within the block geometry. There are several ways of doing this, and in my current role I happen to have access to far more compute time than development time (and it's far cheaper). This was also a low-priority project, so I coded it up the cheap way (brute force), set it running on a Friday evening, and enjoyed the weekend in wine country. When we came back on Monday there was a nice data-set waiting. 
:)\n\n!(https://hackernoon.com/hn-images/1*LbOGYOCAO0taCiLZLGP7Jw.jpeg)\n\nOregon wine country. What better way to spend a weekend!\n\nFor every block feature, we fed the geometry into a function that first calculates the bounding box, then generates a random point within that bounding box. We then use ray-casting to test whether the point is inside the actual polygon or merely inside its bounding box.\n\nIf the point falls inside the polygon we save it; otherwise it gets discarded. We keep doing this until we have the same number of points inside the polygon as the number of people recorded for that block in the SF1 census, for each type we’re interested in (renters/owners), then move on to the next block.\n\n> For what it’s worth, if anyone is interested in doing this in a more reasonable amount of time, the following should be a faster solution: Calculate a [constrained Delaunay triangulation](https://en.wikipedia.org/wiki/Delaunay_triangulation) of the polygon, pick a triangle at random (weighted by area, so the points stay uniformly distributed over the polygon), and generate a random point using [barycentric coordinates](https://en.wikipedia.org/wiki/Barycentric_coordinate_system) inside the triangle. This eliminates the rejected points from the brute force method and should provide much faster point generation, especially for heavily skewed geometries, at the expense of more complex code.\n\n#### Display\n\nThe output of the above script ended up being nearly 40GB of GeoJSON files containing point features. The next question naturally was, “How on earth are we going to load this into anything?”\n\nFortunately, vector map tiling is a magical thing! [Mapbox](https://medium.com/@Mapbox) maintains a program for just such situations, called [Tippecanoe](https://github.com/mapbox/tippecanoe). Tippecanoe takes in huge quantities of GeoJSON geometries and converts them to Mapbox Vector Tiles, a highly efficient protobuf-encoded format, stored in a SQLite database. 
This lets you serve your data as small digestible vector tiles, and will help to ensure the texture and density of the data is preserved across all zoom levels.\n\nThe resulting vector tile database, called a .mbtiles file, was around 2GB in size and uploaded to Mapbox’s tile servers easily.\n\nAll that was left was to apply some styling in Mapbox Studio. To help the points read well across zoom levels, both their diameter and their opacity are functions of zoom. That way, at low zooms, overlapping points create brighter regions.\n\nYou can check out the finished product [here](http://ryantm.io/population)! Showing demographic information at a person-by-person level can really change your perception of the data, and forces you to remember just why it’s important: we’re talking about real people here!\n\nIf you create visualizations like this that you find interesting, I’d love to see them; [shoot me a tweet](http://twitter.com/mcculloughrt)!
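
#### Appendix: a sketch of the brute-force point generation

For anyone who wants to experiment with the point-generation step described under "How I built it", a minimal Python sketch might look like the following. It is an illustration, not the original script: the function names `point_in_polygon` and `random_points_in_polygon` are hypothetical, and it assumes each block is a single outer ring of `(x, y)` pairs with no holes and a non-zero area.

```python
import random


def point_in_polygon(x, y, ring):
    """Ray-casting test: cast a horizontal ray from (x, y) toward +x and
    count edge crossings; an odd number of crossings means 'inside'."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the ray's y value can be crossed.
        if (y1 > y) != (y2 > y):
            # x coordinate where this edge meets the horizontal line at y.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def random_points_in_polygon(ring, count, rng=random):
    """Rejection sampling: draw points in the bounding box and keep only
    those that fall inside the polygon, until `count` points are kept.
    Assumes the polygon has non-zero area (otherwise this never ends)."""
    xs = [p[0] for p in ring]
    ys = [p[1] for p in ring]
    min_x, max_x = min(xs), max(xs)
    min_y, max_y = min(ys), max(ys)
    points = []
    while len(points) < count:
        x = rng.uniform(min_x, max_x)
        y = rng.uniform(min_y, max_y)
        if point_in_polygon(x, y, ring):
            points.append((x, y))
    return points


# Hypothetical L-shaped "block" with 25 renters to place.
block = [(0, 0), (4, 0), (4, 1), (1, 1), (1, 4), (0, 4)]
renters = random_points_in_polygon(block, 25)
```

The share of rejected points grows as the polygon fills less of its bounding box, which is why long, thin, or diagonal blocks are the slow cases for this approach.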