When Tom and I left Asana to build Pod, one of our biggest decisions was which cloud we wanted to build on. It’s a decision with huge path-dependency: changing infrastructure is tremendously costly, and each stack has quirks and limits that you’ll only discover after you’re locked in. In many ways, it’s a marriage. We chose to marry Google App Engine (GAE) and put our entire system on Google Cloud.
We had several reasons for doing this:
- We aren’t infrastructure engineers (and don’t want to be). We wanted PaaS (Platform as a Service) more than IaaS (Infrastructure as a Service); essentially, we wanted to interact with a simple API without needing to think about load balancers or instances.
- The GAE Python docs are incredibly sexy. They make you want to build something!
- Google gave us a $300 credit.
In many ways, we felt like we were choosing the underdog. Sure, Snapchat uses Google, but we didn’t know many others. At Asana, we were on a mostly-AWS stack. AWS is powerful, but it can also be hands-on: frequently updating EC2 instances with security patches, seeing remote jobs back up, and witnessing the sharding of the Asana database left deep impressions on me. Even Elastic Beanstalk can be manual. We needed something simpler for Pod but didn’t want to sacrifice power and scalability.
Months later, with the beta version of Pod released, we are happy with our GAE decision. There are a few challenges, but most of the “quirks” we have experienced have been positive.
Here’s what we like about GAE:
- Our web server configuration fits in fewer than 100 lines of YAML. Instance classes, app versions, routing, and even environment variables are managed in this one simple file. It’s transparent and checked into Git.
- The datastore had a learning curve, but now we love it. Coming from a relational database background, we found Cloud Datastore hard to grok at first. It comes with limits, most notably the inability to do JOINs and the lack of strong consistency for many reads. You accept this straitjacket as a trade-off in return for scalability and speed.
- Error reporting is seamless. It draws on consolidated application logs from all your instances, so you can immediately see the full trace for any exception, along with where and how often the issue occurs. The tool is optimized for reporting errors rather than triaging them, but you can connect issues to GitHub or other outside services.
- Remote jobs (called Tasks) are equally painless. Throughput, retries, and routing for your tasks are managed in another simple YAML file, and queues are easy to inspect in the web console.
- GAE simplifies security. We don’t even know what OS our web servers are running; all we know is that Google has it covered, with hundreds of security engineers watching for malware. This narrows the set of security risks we need to worry about. GAE even provides a security scanner that probes our web properties for XSS and outdated packages.
- Access to all Google APIs is unified. Since Pod is a calendar that connects to several Google services, we make heavy use of Google APIs. Managing quotas and access all in one place is convenient.
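Without JOINs, related entities are usually combined in application code instead. The sketch below is a hypothetical illustration, not Pod’s code. Plain Python dataclasses and a dict stand in for Datastore entities. `batch_get` is a made-up helper playing the role of a batched key lookup; in NDB, `ndb.get_multi` serves that purpose. The idea is to collect the referenced keys first and fetch them in one batch, rather than issuing one lookup per row.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class User:
    key: str
    name: str


@dataclass
class Event:
    key: str
    title: str
    owner_key: str  # reference to a User, stored as a key instead of being joined


# Hypothetical in-memory "datastore": key -> entity.
USERS: Dict[str, User] = {
    "u1": User("u1", "Justin"),
    "u2": User("u2", "Tom"),
}


def batch_get(keys: List[str]) -> List[Optional[User]]:
    """Hypothetical stand-in for a batched key lookup (one round trip for many keys)."""
    return [USERS.get(k) for k in keys]


def events_with_owners(events: List[Event]) -> List[Tuple[Event, Optional[User]]]:
    """Emulate a JOIN by fetching every referenced owner in a single batch."""
    # Distinct owner keys, in first-seen order, so each user is fetched only once.
    owner_keys = list(dict.fromkeys(e.owner_key for e in events))
    owners = dict(zip(owner_keys, batch_get(owner_keys)))
    # Pair each event with its owner; a dangling reference yields None.
    return [(e, owners[e.owner_key]) for e in events]


events = [Event("e1", "Standup", "u1"), Event("e2", "1:1", "u2"), Event("e3", "Retro", "u1")]
for event, owner in events_with_owners(events):
    print(event.title, "->", owner.name if owner else None)
```

The usual alternative is denormalization. For example, you can copy the owner’s name onto each event at write time. That trades extra writes, and the risk of stale copies, for cheaper reads.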
The cost of GAE scales with the amount of computing your app is doing, and instances shut down completely when you aren’t using them. So far, the web servers have been the largest component of our cost, even though we do a lot of writing to Datastore, BigQuery, and Cloud Storage.
It wouldn’t be a marriage without a few problems, and we’ve definitely had moments of angst with Google. Google uses its own issue tracker for managing bugs and feature requests. Some teams do a very good job of triaging, while others seem to pay little attention.
Another challenging spot has been BigQuery. As with Redshift, it’s difficult to avoid duplicate data, because BigQuery is append-only and doesn’t support primary keys. This can be addressed through app design, however. Another issue with BigQuery has been reliability and a lack of communication from Google. We’ve found that the status page can be completely wrong, sometimes for hours. When BigQuery goes down, we usually end up contacting friends at Google to see whether they know about it.
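Rows can’t be updated in place under a primary key. One common app-level pattern is to append every version of a record with an id and a timestamp, then deduplicate at read time by keeping only the newest row per id. This is a hypothetical sketch, not necessarily what Pod does. In BigQuery SQL, this is typically done with `ROW_NUMBER() OVER (PARTITION BY id ORDER BY updated_at DESC)` plus a filter on row number 1. The Python below mirrors that logic so it runs standalone. The field names `id` and `updated_at` are assumptions.

```python
from typing import Dict, Iterable, List


def latest_per_key(rows: Iterable[dict], key: str = "id", version: str = "updated_at") -> List[dict]:
    """Keep only the newest row for each key, mirroring
    ROW_NUMBER() OVER (PARTITION BY key ORDER BY version DESC) = 1."""
    newest: Dict[object, dict] = {}
    for row in rows:
        k = row[key]
        # Replace the stored row only if this one is strictly newer;
        # exact duplicates (same version) are simply dropped.
        if k not in newest or row[version] > newest[k][version]:
            newest[k] = row
    return list(newest.values())


# Append-only log: one event was written twice by a retry, then updated.
rows = [
    {"id": "evt-1", "updated_at": 1, "title": "Lunch"},
    {"id": "evt-1", "updated_at": 1, "title": "Lunch"},         # duplicate write
    {"id": "evt-2", "updated_at": 1, "title": "Standup"},
    {"id": "evt-1", "updated_at": 2, "title": "Lunch w/ Tom"},  # later version
]
deduped = latest_per_key(rows)
```

Streaming inserts also accept a per-row `insertId` that BigQuery uses for best-effort deduplication. Because that deduplication is only best-effort, a read-time pass like this still serves as a useful backstop.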
Finally, we can’t figure out any way for Google’s stack to support substring search without a hack on our part. If you want to search for partial strings (e.g., “Justi” to find “Justin” in a user typeahead), you’re simply out of luck: the Search API doesn’t do that (nor does NDB).
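One common workaround is to precompute every prefix of each searchable word at write time and store the prefixes alongside the record, for example in a list property. A typeahead then becomes an exact-match lookup on those tokens, which an equality filter can serve. This is exactly the kind of hack described above, and not necessarily what Pod does. The sketch below is hypothetical: `prefix_tokens` and `PrefixIndex` are made-up names, and an in-memory inverted index stands in for the datastore.

```python
from collections import defaultdict
from typing import Dict, List, Set


def prefix_tokens(text: str, min_len: int = 1) -> Set[str]:
    """Every lowercase prefix of every word, e.g. 'Justin' -> {'j', 'ju', ..., 'justin'}."""
    tokens: Set[str] = set()
    for word in text.lower().split():
        for end in range(min_len, len(word) + 1):
            tokens.add(word[:end])
    return tokens


class PrefixIndex:
    """Hypothetical in-memory inverted index: prefix token -> set of record ids."""

    def __init__(self) -> None:
        self._index: Dict[str, Set[str]] = defaultdict(set)

    def add(self, record_id: str, text: str) -> None:
        # Done once at write time; a datastore would keep these tokens in a list property.
        for token in prefix_tokens(text):
            self._index[token].add(record_id)

    def search(self, query: str) -> List[str]:
        # Every query word must be a prefix of some word in the record.
        words = query.lower().split()
        if not words:
            return []
        matches = set.intersection(*(self._index.get(w, set()) for w in words))
        return sorted(matches)


index = PrefixIndex()
index.add("u1", "Justin Park")
index.add("u2", "Justine Ruiz")
index.add("u3", "Tom Jones")
print(index.search("Justi"))  # ['u1', 'u2']
```

The cost is index size, because a word of length n produces n tokens. Raising `min_len` to 2 or 3 is one way to bound that. True infix matches (such as “usti”) would need n-grams rather than prefixes.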
Overall, however, we are extremely happy we got into bed with Google. Compared with AWS, Google feels simpler, value-added, and, more than anything, accessible. It’s almost as if Google has experience building cloud applications! All the hard technical issues around storing and moving data have been solved; by sharing these solutions, they enable developers like us to stop thinking about infrastructure and instead focus on building great products.
And that is a lot more fun.