How we spent 30k USD in Firebase in less than 72 hours

#UnaVacaPorDeLaCalle became the largest crowdfunding campaign in Colombia, collecting 3 times more than the previous record in only two days! It also became one of the biggest political crowdfunding campaigns in history.

A BIG SUCCESS FOR VAKI

Just 48 hours after the campaign was released, we had broken many records. The campaign had collected 3 times more than the previous record in Colombia at that time. We had reached more than 2 million sessions and more than 20 million page views, and had received more than 15 thousand supports. That works out to about a thousand active users on the site on average, with more than 20 supports collected per minute.

It was a huge success for us. Our engineering team was very proud and happy: we were viral and our site was up and running every single second of it. At that moment we were celebrating while watching the analytics closely.

The app was running, all the supporters were able to contribute, and the comments on social networks said that the app made it really simple to support the campaign. We were very proud :)

We didn’t want to release any new feature with that many users on the site, so we decided to merge a version with Angular v6 and lazy loading that we had been working on. It was a good time to work on that, we said. The site started to load slower; for some users it took more than 30 seconds to load the page. That was weird. Our team was not comfortable with that, and we couldn’t understand what was causing it. Now we also had our code on a completely new version of Angular, probably with many other bugs, in production.

Our team started to rush this new release and refactor almost everything so that everything we could lazy load was lazy loaded. We did the refactor live. It was a huge risk, but we wanted this campaign to be perfect, so we did it! In just a day and a half, our team had the first release ready on the new version. After some tests, it looked like the refactor had helped the app’s speed, but it was not as fast as we wanted. Our goal was to load in 3 seconds, and we were not getting there. This was our first clue that something within Firebase could be improved.

When we opened the Firebase Dashboard, we realized that with that many visitors and supports our Firestore service was not just overloaded: we also had a huge debt with Google. We had spent $30,356.56 USD in just 72 hours!

Our billing dashboard afterwards

A VERY EXPENSIVE CODE MISTAKE

From the moment the campaign was released and for the next 48 hours, we used a lot of Firestore resources, and our bill came up to $35,000 USD! We made more than 46 BILLION requests to Firestore. Yes, billion with a B.

We had a $25,000 USD Google Cloud grant on our account thanks to the NXTP Labs acceleration program we took part in during 2017, so our debt was “only” $10,000 USD at that moment. The real problem was that every hour cost us $600 USD more on Google Cloud Services. At that moment we had two options: take down the site to stop the billing, or start debugging every single line of code with a money clock on the table. We chose the second one.

We didn’t know where we could optimize the requests. We thought our company could not be profitable if we kept using Firebase as our hosting provider, and we were already getting stressed about the need to take our data somewhere else. With just a few more hours of life left in us, we found the line! It was just one bad line of code that was causing that amount of requests (and costs): this.loadPayments()

To understand this, let me explain how our architecture works. We have two main collections on Firestore: Vakis and Payments. Vakis has the documents with the data for each Vaki and Payments the documents with the data of each user payment.

One of our features is to show the user, in real time, the total amount of money and the number of supporters that a Vaki has received. So we have two services that load this information on the client side, vakis.ts and payments.ts.

Each time a payment is approved, we update a value on the Vaki document with the new totals. So we just need to read the Vakis collection to display that information. Our huge mistake was to ignore that and calculate the totals by reading from the Payments collection.
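The write-time aggregation described above can be sketched roughly like this. It is only an illustration, not the production code: the collections are plain in-memory stand-ins rather than the Firestore SDK, and every name (VakiDoc, PaymentDoc, approvePayment, readTotals, the field names) is hypothetical.

```typescript
// Hypothetical in-memory stand-ins for the Vakis and Payments collections.
interface VakiDoc { totalCollected: number; supporters: number; }
interface PaymentDoc { vakiId: string; amount: number; approved: boolean; }

const vakis: Record<string, VakiDoc> = {};
const payments: PaymentDoc[] = [];

// Called once per approved payment: store the payment and bump the totals
// kept on the Vaki document, so readers never need to scan Payments.
function approvePayment(vakiId: string, amount: number): void {
  payments.push({ vakiId, amount, approved: true });
  const current = vakis[vakiId] || { totalCollected: 0, supporters: 0 };
  vakis[vakiId] = {
    totalCollected: current.totalCollected + amount,
    supporters: current.supporters + 1,
  };
}

// Reading the totals touches one document, however many payments exist.
function readTotals(vakiId: string): VakiDoc | undefined {
  return vakis[vakiId];
}

approvePayment("una-vaca", 50);
approvePayment("una-vaca", 20);
console.log(readTotals("una-vaca")); // totalCollected: 70, supporters: 2
```

The point of the design is that the cost of keeping the totals is paid once per payment, at write time, instead of once per visitor at read time.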

The constructor of the vakis.ts service contained the line this.loadPayments(), which called the payments.ts service, and we used that service to display the information of a Vaki. This means that for every visitor to our site, we read every payment document just to show the number of supports of a Vaki or the total collected. On every page of our app!
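To make the difference concrete, here is a minimal simulation of the two patterns. It assumes a simple counter in place of Firestore's billed reads and an in-memory array in place of the Payments collection; the class names and fields are hypothetical, and only the loadPayments call in the constructor comes from the story above.

```typescript
// Hypothetical counter standing in for Firestore's billed document reads.
let billedReads = 0;

interface PaymentRecord { vakiId: string; amount: number; }

// Simulated Payments collection: 16,000 payments for a single Vaki.
const paymentsCollection: PaymentRecord[] = [];
for (let i = 0; i < 16000; i++) {
  paymentsCollection.push({ vakiId: "una-vaca", amount: 10 });
}
// Simulated Vaki document whose totals are maintained on each approved payment.
const vakiDocument = { totalCollected: 160000, supporters: 16000 };

// Costly pattern: the constructor reads every payment to rebuild the totals.
class VakiServiceScanning {
  totalCollected = 0;
  supporters = 0;
  constructor() {
    this.loadPayments();
  }
  private loadPayments(): void {
    for (const p of paymentsCollection) {
      billedReads += 1; // one billed read per payment document
      this.totalCollected += p.amount;
      this.supporters += 1;
    }
  }
}

// Cheaper pattern: read the totals from the single Vaki document.
class VakiServiceAggregated {
  totalCollected: number;
  supporters: number;
  constructor() {
    billedReads += 1; // one billed read in total
    this.totalCollected = vakiDocument.totalCollected;
    this.supporters = vakiDocument.supporters;
  }
}

// Resets the counter, builds a service, and reports how many reads it caused.
function readsFor(create: () => unknown): number {
  billedReads = 0;
  create();
  return billedReads;
}

const scanningReads = readsFor(() => new VakiServiceScanning());
const aggregatedReads = readsFor(() => new VakiServiceAggregated());
console.log(scanningReads, aggregatedReads); // 16000 1
```

Both services end up with the same totals; the only difference is how many documents each visitor pays for.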

This means that every session on our site read as many documents as there were payments. #UnaVacaPorDeLaCalle received more than 16,000 supporters, so with the round figures: 2 million sessions x 16,000 documents = 32 billion document reads. Both numbers are lower bounds, which fits the more than 40 billion requests we made to Firestore in less than 48 hours.
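For anyone who wants to redo that multiplication and turn it into a rough money figure, here is a small sketch. The price per 100,000 reads is an assumption used only for illustration, not a number taken from our bill, and the session and document counts are the round lower bounds quoted above, so real totals would be higher.

```typescript
// Round figures from the article; both are lower bounds ("more than").
const sessions = 2000000;
const paymentDocs = 16000;

// ASSUMPTION: price in USD per 100,000 document reads. Check the current
// Firestore pricing for your region before relying on this number.
const assumedUsdPer100kReads = 0.06;

// Every session reads every payment document once.
function estimateReads(sessionCount: number, docsPerSession: number): number {
  return sessionCount * docsPerSession;
}

// Converts a number of document reads into an estimated cost in USD.
function estimateCostUsd(reads: number, usdPer100k: number): number {
  return (reads / 100000) * usdPer100k;
}

const totalReads = estimateReads(sessions, paymentDocs);
console.log(totalReads); // 32000000000
console.log(estimateCostUsd(totalReads, assumedUsdPer100kReads).toFixed(2));
```

Running the same function with one read per session instead of 16,000 shows how small the bill would have been if the totals had come from the Vaki document.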

GOOGLE UNDERSTOOD AND POWERED US UP!

After we fixed this code mistake and stopped the billing, we reached out to Google to explain the case and to see if we could apply for the next grant they have for startups. We contacted the Google Developers Latam team, told them what had just happened and that we had spent the full 25k grant in just a few days, and asked about the chance to apply for the 100k grant on Google Cloud Services. They allowed us to apply for the next grant, which Google approved, and after some meetings with them, they let us pay our bill with the grant.

We could not be more grateful to Google, not only for having an awesome “Backend As A Service” like Firebase, but also for letting us handle 2 million sessions, 60 supports per minute and billions of requests without our site going down. Besides, they understood that errors like ours can happen when a startup is growing, and that some expensive mistakes can jeopardize the future of great companies.

CONCLUSION

It is very important that tech teams debug every request to their servers before a release. Analyze whether the number of requests and the data transfer make sense, and whether your company can afford the hosting costs under a heavy load of traffic. Otherwise, you will only catch loops or suboptimal requests when you get a huge bill or when your site goes down.
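One lightweight way to do that kind of check is to count document reads per page load during testing and flag anything above a budget. The sketch below is only one possible approach under stated assumptions: ReadBudget is a hypothetical helper, not part of Firebase, and the limit of 50 reads is an arbitrary example value.

```typescript
// Hypothetical helper that counts reads for one page load and flags overruns.
class ReadBudget {
  private used = 0;
  constructor(private readonly limit: number) {}

  // Call once for every document (or batch of documents) the page reads.
  record(count: number = 1): void {
    this.used += count;
  }

  total(): number {
    return this.used;
  }

  exceeded(): boolean {
    return this.used > this.limit;
  }
}

// Simulate one page load that reads a whole collection of 16,000 documents.
const pageBudget = new ReadBudget(50);
for (let i = 0; i < 16000; i++) {
  pageBudget.record();
}
if (pageBudget.exceeded()) {
  console.warn(`Page load used ${pageBudget.total()} reads, budget is 50`);
}
```

A check like this in a test suite or a staging build would have flagged a page that reads thousands of documents long before real traffic multiplied it.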

Also, if you are a startup that uses a Lean methodology to launch fast, fail fast and learn fast, be careful: tech agility and tech perfection are not good friends. You need to find a good balance between them to prevent expensive mistakes while still moving your business as fast as you can.

Thank you Google, thank you Karen, Paco and Martín (Team Google Developers Latam), thank you NXTP Labs. Without you guys, we wouldn’t have been able to endure our first viral campaign 👊

Thanks to Juan Pablo Muriel, Katharine Vander Laan, Laura Cardona, Santiago Jaramillo and the Vaki Team for reading and commenting on drafts of this article.
