Newly available data, visualized!
The San Francisco County Transportation Authority just released some unprecedented data summaries of Uber and Lyft usage in the City by the Bay. Combined, the two rideshare services logged more than 200,000 daily rides (!!) on typical Fridays in the fall of 2016 — and that’s just counting trips entirely within the city limits.
I was working with the Transportation Authority on some other data visualization tasks when they asked if the platform we were building could also be used to help explore this new dataset. This is what we came up with.
In the public sector, rideshare services are called Transportation Network Companies, or “TNCs”. The official report and the data itself are on the SFCTA website…
Or click here to go directly to the SFCTA “TNCs Today” Data Explorer.
There is a lot to see:
Fridays have the most daily trips on average. You can easily see the commute “humps” during the AM and PM rush hours — when traffic is already at its worst. You can also see a lot of evening and late-night trips, which aren’t as prevalent mid-week.
The data is an estimate of pickups and dropoffs by TNC drivers. The researchers analyzed the lat/long coordinates and timestamp reported by each driver’s app at the moments it announced, “I’ve accepted a trip” or “I’m now available.” This isn’t exactly the same as passenger origins and destinations, but in terms of TNC impacts on roadway traffic and congestion, it’s probably sufficient to know when and where the driver’s vehicle started and stopped moving. And as a cyclist, I don’t much care whether the car blocking the Valencia Street bike lane is picking up or dropping off passengers: same difference.
Caveats: the dataset represents the average of several weeks of data collection during fall 2016, summarized into one-hour buckets by day of week. Only trips with both ends inside the San Francisco city limits are captured; thus this is likely a low estimate of the total vehicle trips by Uber and Lyft in the city.
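To make that aggregation concrete, here’s a rough sketch of what one summarized record could look like. The field names are mine, for illustration only, not the published schema:

```javascript
// Illustrative only: a hypothetical summarized record, one per zone,
// per day of week, per one-hour bucket (field names are not the real schema).
const exampleRecord = {
  zone: 123,        // geographic zone the trip ends were aggregated to
  dayOfWeek: 'Fri', // day-of-week bucket
  hour: 18,         // start of the one-hour bucket, local time
  pickups: 42.5,    // average pickups across the fall 2016 sample weeks
  dropoffs: 39.1    // average dropoffs across the same weeks
};
```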
Trips in Ubers and Lyfts go up and up as the week progresses.
Lots of late-night trips to and from the Castro on Friday nights. 🍸🍸
There are some weird things, too. Some areas have crazy spikes at just one particular time of day, while others seem to have a lot of TNC trips even though there is nothing particularly unique about those locations.
Maybe one of you can come up with some explanations for these hotspots; or perhaps they are just noise in the dataset.
Well, the SF Transportation Authority is quick to point out: no policy judgments about this data are being made at this time. The data is now simply “out there”.
We had already settled on a fully open-source stack of technologies as a base for the agency’s upcoming data visualization efforts. These were more than just “free tools”: the combination of these components resulted in something far more flexible than, and just as powerful as, any off-the-shelf product I could have envisioned.
Back end. The database is PostgreSQL with PostGIS spatial extensions. For this project we only received block-level summaries (aggregated by “traffic analysis zone”), so we weren’t dealing with any sort of “Big Data” here. Any database would have sufficed for storage, but the PostGIS extension allows us to do cool stuff like geocoding, spatial buffers, paths, and offsets. PostGIS is awesome.
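Just to show the flavor (this is a hedged sketch, not the project’s actual code), here’s the kind of spatial query PostGIS makes easy, run from Node with the pg client. The table and column names are hypothetical:

```javascript
// Sketch only: find every zone within a given distance of a point.
// Assumes a hypothetical table "taz_boundaries" with a 4326 geometry column.
const { Pool } = require('pg');
const pool = new Pool({ connectionString: process.env.DATABASE_URL });

async function zonesNearPoint(lon, lat, meters) {
  const sql = `
    SELECT taz_id, ST_AsGeoJSON(geom) AS geojson
    FROM taz_boundaries
    WHERE ST_DWithin(
      geom::geography,
      ST_SetSRID(ST_MakePoint($1, $2), 4326)::geography,
      $3
    )`;
  // Geography casts make ST_DWithin measure in meters instead of degrees.
  const { rows } = await pool.query(sql, [lon, lat, meters]);
  return rows;
}
```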
The front end needed to talk to the database somehow; for this we chose PostgREST, for its dead-simple RESTful API and easy configuration. PostgREST sits behind the agency’s NGINX reverse proxy, which provides security and flexibility for assigning URL endpoints.
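From the browser’s point of view, that just means plain HTTP calls. Here’s a minimal sketch of what a request through the proxy might look like; the /api prefix and the trip_totals view are placeholders, though the ?column=eq.value filter syntax is standard PostgREST:

```javascript
// Sketch: fetch hourly totals for one day of the week from a PostgREST
// endpoint proxied by NGINX. Endpoint and column names are placeholders.
async function fetchHourlyTotals(dayOfWeek, hour) {
  const url = `/api/trip_totals?day_of_week=eq.${dayOfWeek}&hour=eq.${hour}`;
  const resp = await fetch(url, { headers: { Accept: 'application/json' } });
  if (!resp.ok) throw new Error(`PostgREST request failed: ${resp.status}`);
  return resp.json(); // array of row objects, one per zone
}
```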
Front end. We leveraged as many pre-built libraries as possible to get this thing up and running quickly. Furthermore, San Francisco wanted a modern web tool that would be easy for staff to maintain, so I couldn’t pick any esoteric niche products. After experimenting with some alternatives, we settled on:
After testing the first few iterations, I started worrying that we’d be getting more attention from the interwebs than we had originally envisioned for this internal tool. To avoid Hacker News overloading our database server, we refactored things to load a static zipfile containing the main dataset, instead of fetching a giant GeoJSON response from the database on every page hit. We’ll see how this approach holds up over the coming days.
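For the curious, here’s a simplified sketch of that static-file approach, assuming the summaries are zipped up ahead of time and served like any other asset. JSZip is one way to unpack the archive in the browser; the file names below are placeholders:

```javascript
// Sketch: grab the pre-built zip once, unzip it client-side, and parse the
// JSON inside, so the database never sees the traffic. File names are made up.
import JSZip from 'jszip';

async function loadTripData() {
  const resp = await fetch('/data/tnc_trip_summaries.zip');
  const zip = await JSZip.loadAsync(await resp.arrayBuffer());
  const text = await zip.file('trip_summaries.json').async('string');
  return JSON.parse(text); // same shape an API query would have returned
}
```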
Ultimately, this whole thing ended up being less than 1,000 lines of JavaScript code. Woohoo!
San Francisco expects to continue research on the effects TNCs are having in the city, and has plenty of other transportation data as well. We’ll be building a full data portal which explores several different datasets in the months ahead.
In the meantime, have fun playing around with the tool!
None of this would have been possible without the vision and financial support of staff and leadership at the San Francisco County Transportation Authority — which means you, if you pay sales taxes in San Francisco! Thank you for your generous support.
I stand on the shoulders of the giants who’ve produced this enormous ecosystem of fantastic open-source tools, free for the taking. All of the code I’ve written is available on GitHub — I hope you find ways to remix it and use it in your city, too.
Hi! I’m Billy Charlton, founder of Because LLC and former Director of Data at the Puget Sound Regional Council in Seattle. I mostly hang around in the transportation planning field, since it has such a direct impact on our cities and on our daily lives. I’m currently living in San Francisco. I can do data visualizations for your city, too!
My contact info is on my site and I’m on GitHub and Twitter too.
Cheers!