Newly available data, visualized!
The San Francisco County Transportation Authority just released some unprecedented data summaries of Uber and Lyft usage in the City by the Bay. Combined, the two rideshare services logged more than 200,000 daily rides (!!) on typical Fridays in the fall of 2016 — and that’s just counting trips entirely within the city limits.
I was working with the Transportation Authority on some other data visualization tasks when they asked if the platform we were building could also be used to help explore this new dataset. This is what we came up with.
The “TNCs Today” data explorer
In the public sector, rideshare services are called Transportation Network Companies, or “TNCs”. The official report and the data itself are on the SFCTA website…
There is a lot to see:
- You can select views by day of week, and you can explore either all-day totals or focus on trips during a specific hour.
- Clicking on any block on the map will pop up a daily graph of pickups and dropoffs for that area. You can then switch to different days of the week to see how Mondays differ from Fridays, for example.
- Try out the 2D and 3D views: 3D really shows the striking patterns of TNC activity in the city across different days of the week, while the 2D view makes it a bit easier to click and explore individual locations.
The data is an estimate of pickups and dropoffs by TNC drivers. The researchers analyzed the lat/long coordinates and timestamp recorded each time a driver announced, "I've accepted a trip" or "I'm now available." This isn't exactly the same as passenger origins and destinations, but for understanding TNC impacts on roadway traffic and congestion, it's probably sufficient to know when and where each driver's vehicle trip started and ended. And as a cyclist, I don't much care whether the car blocking the Valencia Street bike lane is picking up or dropping off passengers: same difference.
Caveats: the dataset represents the average of several weeks of data collection during fall 2016, summarized into one-hour buckets by day of week. Only trips with both ends inside the San Francisco city limits are captured; thus this is likely a low estimate of the total vehicle trips by Uber and Lyft in the city.
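To make the "average of several weeks, summarized into one-hour buckets by day of week" idea concrete, here is a minimal Python sketch. It is not SFCTA's actual processing code: it assumes each event is a (timestamp, kind) pair in local time, already labeled as a pickup or dropoff, and that the collection period covers a whole number of weeks (`num_weeks` is a hypothetical parameter).

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, Iterable, Tuple

# Key: (weekday, hour, kind); weekday follows datetime.weekday(), Monday == 0.
BucketKey = Tuple[int, int, str]


def hourly_profile(events: Iterable[Tuple[datetime, str]],
                   num_weeks: int) -> Dict[BucketKey, float]:
    """Average event counts per (weekday, hour, kind) over the collection period.

    events: (timestamp, kind) pairs; kind is a label such as "pickup" or
    "dropoff" (hypothetical: assumes events were already classified).
    num_weeks: how many full weeks the data covers (hypothetical parameter).
    """
    totals: Dict[BucketKey, int] = defaultdict(int)
    for ts, kind in events:
        # Drop minutes and seconds: each event lands in a one-hour bucket.
        totals[(ts.weekday(), ts.hour, kind)] += 1
    # Each weekday occurs once per week, so dividing by the week count
    # gives the count for an "average" Friday, Saturday, and so on.
    return {key: count / num_weeks for key, count in totals.items()}
```

Averaging like this is also why the numbers represent a "typical" Friday rather than any particular one: a single unusual Friday gets diluted across the whole collection period.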
Notable nuggets in the data
- Sundays and Mondays have the lowest number of TNC trips, with progressively more trips on Tuesdays, Wednesdays and Thursdays, and the most trips on Fridays and Saturdays
- Weekdays have a predictable commute pattern, with two peaks during the AM and PM rush hours. Fridays and Saturdays have much more evening travel than other days, extending late into the night
- Uber and Lyft trips are far more frequent in the northeast quadrant of the city, basically north of Cesar Chavez and east of Divisadero, on all days and at all times of day
- Notable tourist attractions such as Fisherman’s Wharf, the Golden Gate Bridge, and the Golden Gate Park museums are easily visible, and have very different time-of-day distributions than downtown
- Weekend hotspots show up on Friday and Saturday nights: the Castro, Mission/Valencia, North Beach, the ballpark, and many others
There are some weird things, too. Some areas have crazy spikes at just one particular time of day, while others seem to have a lot of TNC trips even though there is nothing particularly unique about those locations.
Maybe one of you can come up with some explanations for these hotspots, or perhaps they are just noise in the dataset.
What does it all mean?
Well, the SF Transportation Authority is quick to point out: no policy judgments about this data are being made at this time. The data is now simply “out there”.
Leveraging open source in the public sector
We had already settled on a fully open-source technology stack as the base for the agency’s upcoming data visualization efforts. These were more than just “free tools”: combined, these components produced something far more flexible than, and just as powerful as, any off-the-shelf product I could have envisioned.
Back end. The database is PostgreSQL with PostGIS spatial extensions. For this project, only block-level summaries (called “traffic analysis zones”) were provided, so we weren’t dealing with any sort of “Big Data” here. Any database would have sufficed for storage, but the PostGIS extension allows us to do cool stuff like geocoding, spatial buffers, paths, and offsets. PostGIS is awesome.
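Turning raw lat/long points into per-zone counts is the kind of job PostGIS handles with a spatial join, for example using `ST_Contains(zone.geom, point.geom)`. As a rough illustration of what that test computes, here is a pure-Python ray-casting sketch. The zone IDs and square "zones" are made up, it treats lon/lat as flat x/y coordinates (reasonable for small city zones, not for anything continent-sized), and nothing here implies this is how the SFCTA summaries were actually built.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # (lon, lat), treated as planar (x, y)


def point_in_polygon(x: float, y: float, polygon: List[Point]) -> bool:
    """Ray-casting test: True if (x, y) lies inside the polygon.

    polygon is a list of vertices; the ring is closed implicitly.
    Points exactly on an edge may be classified either way.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be
        # crossed; this also guarantees y1 != y2 below.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            # Count crossings of a ray pointing in the +x direction.
            if x < x_cross:
                inside = not inside
    return inside


def count_by_zone(points: List[Point],
                  zones: Dict[str, List[Point]]) -> Dict[str, int]:
    """Count how many points fall in each zone (hypothetical zone IDs)."""
    counts = {zone_id: 0 for zone_id in zones}
    for lon, lat in points:
        for zone_id, polygon in zones.items():
            if point_in_polygon(lon, lat, polygon):
                counts[zone_id] += 1
                break  # assumes zones don't overlap
    return counts
```

This brute-force loop checks every zone for every point; a database like PostGIS would normally use a spatial index to skip zones that can't possibly contain a given point.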
The front end needed some way to talk to the database; for this we chose PostgREST for its dead-simple RESTful API and easy configuration. PostgREST sits behind the agency’s NGINX reverse proxy, which provides security and flexibility in assigning URL endpoints.
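PostgREST exposes each table or view at its own URL, with column selection via `select=` and filters written as query parameters like `column=eq.value`. The sketch below builds such a URL in Python. The host, the table name `tnc_zone_stats`, and its columns are hypothetical, and the real front end would make the equivalent request from JavaScript.

```python
from typing import Any, Dict, List, Optional, Tuple
from urllib.parse import quote, urlencode


def postgrest_url(base: str, table: str,
                  select: Optional[List[str]] = None,
                  filters: Optional[Dict[str, Tuple[str, Any]]] = None,
                  order: Optional[str] = None) -> str:
    """Build a PostgREST-style GET URL.

    filters maps column -> (operator, value), e.g. {"hour": ("eq", 17)},
    which becomes hour=eq.17 in the query string.
    """
    params = []
    if select:
        params.append(("select", ",".join(select)))
    for column, (op, value) in (filters or {}).items():
        params.append((column, f"{op}.{value}"))
    if order:
        params.append(("order", order))
    # Leave commas and dots unescaped so the URL stays readable;
    # quote (not quote_plus) encodes spaces as %20 rather than "+".
    query = urlencode(params, safe=",.", quote_via=quote)
    url = f"{base.rstrip('/')}/{table}"
    return f"{url}?{query}" if query else url
```

For example, asking for hypothetical Friday 5 PM zone totals, sorted by pickups, would look like `postgrest_url("https://api.example.org/", "tnc_zone_stats", select=["taz", "pickups", "dropoffs"], filters={"day_of_week": ("eq", 5), "hour": ("eq", 17)}, order="pickups.desc")`. Because the whole query lives in the URL, NGINX can route, cache, or restrict it like any other path.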
Front end. We leveraged as many pre-built libraries as possible to get this thing up and running quickly. Furthermore, San Francisco wanted a modern web tool that would be easy for staff to maintain, so I couldn’t pick any esoteric niche products. After experimenting with some alternatives we settled on:
- GitHub Pages for serving the static site. All the code was on GitHub already, so it just made sense to keep using them for hosting. A static site means no servers to maintain or get hacked. If I could just build static sites for everything from now on, I would. Bummer it’s so hard to get SSL support on custom domains, though.
- Vue.js for templating and reactive elements; what a pleasure learning and using this framework.
- Mapbox GL JS for the interactive 2D/3D map. I originally wanted to stick to fully open-source Leaflet, but we really liked the 3D capabilities in Mapbox. If our data hadn’t been so pretty in 3D, we would have used Leaflet.
- Morris.js for the interactive charts
- Semantic UI for the shiny pretty buttons
After testing the first few iterations, I started worrying that this internal tool would get more attention from the interwebs than we had originally envisioned. To keep Hacker News from overloading our database server, we refactored the site to load a static zipfile containing the main dataset, instead of running a giant GeoJSON query against the database on every page hit. We’ll see how this approach holds up over the coming days.
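The static-file approach boils down to "export once, read many times": the dataset is written to a single compressed file that GitHub Pages serves like any other asset, so page hits never reach the database. The actual site does the reading in the browser with JavaScript; this Python sketch only illustrates the round trip, and the member name `tnc_data.geojson` is a hypothetical placeholder.

```python
import io
import json
import zipfile
from typing import Any, Dict

MEMBER = "tnc_data.geojson"  # hypothetical file name inside the archive


def build_zip(feature_collection: Dict[str, Any], member: str = MEMBER) -> bytes:
    """Export step (run once): serialize GeoJSON into a deflate-compressed zip."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        archive.writestr(member, json.dumps(feature_collection))
    return buffer.getvalue()


def load_zipped_geojson(zip_bytes: bytes, member: str = MEMBER) -> Dict[str, Any]:
    """Read step (every page load): unpack the archive and parse the GeoJSON."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        with archive.open(member) as f:
            # json.load accepts a binary file object (UTF-8 bytes).
            return json.load(f)
```

GeoJSON for many zones is highly repetitive (the same property names over and over), so it usually compresses well, which is a nice bonus on top of taking the database out of the request path.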
San Francisco expects to continue research on the effects TNCs are having in the city, and has plenty of other transportation data as well. We’ll be building a full data portal that explores several different datasets in the months ahead.
In the meantime, have fun playing around with the tool!
None of this would have been possible without the vision and financial support of staff and leadership at the San Francisco County Transportation Authority — which means you, if you pay sales taxes in San Francisco! Thank you for your generous support.
I stand on the shoulders of the giants who’ve produced this enormous ecosystem of fantastic open-source tools, free for the taking. All of the code I’ve written is available on GitHub — I hope you find ways to remix it and use it in your city, too.
Hi! I’m Billy Charlton, founder of Because LLC and former Director of Data at the Puget Sound Regional Council in Seattle. I mostly hang around in the transportation planning field, since it has such a direct impact on our cities and on our daily lives. I’m currently living in San Francisco. I can do data visualizations for your city, too!