There are quite a few decks already available online, and we found them all useful, each in different ways. That’s why we thought the deck we used for our seed round could be helpful to some companies, especially those that are open source or developer tools.
For context: on that seed deck, we started working on our open-source data integration platform, airbyte.io, in the end of July, and raised our seed round with Accel in December, so only 5 months in. We’re assuming the deck was acceptable, as we raised the seed after only 13 days in the fundraising process.
We will write a dedicated article about the process within a few days, so this article is really about the deck itself.
Here’s the Slideshare link.
Our deck was pretty standard, though some might consider it a bit short. That’s why we always sent the appendixes along with the deck after our meetings.
Here are more details on our slides – why we included them and what we were saying during our pitches.
Although you might only spend 10 seconds discussing the cover slide, it is important. It’s about showing that you have identified your positioning.
Thanks to Snowflake’s IPO and Segment’s acquisition, we knew that data infrastructure was hot from an investor point of view. We decided to start with some context on the industry and how we fit into this ecosystem.
What we were saying:
When you look at the data infrastructure industry, there is often a new category that emerges through a commercial product. Once the market matures, an open-source alternative gets created and ends up taking over the category. This behavior is often seen because data infrastructure requires privacy, security and scale, which cloud-based solutions can’t offer as well as open-sourced ones. There are many examples, such as Kafka, Spark, and now DBT. We want to be the open-source solution for data integration.
You might wonder why an open-source approach would also win the format for data integration; sometimes a closed-source cloud-based approach works. This last sentence is a transition to the next slide.
What we were saying:
In June and July, we started reaching out to 250 of Fivetran’s, StitchData’s and Matillion’s customers. We ultimately managed to talk to 45 of them. We wanted to know whether an open-source approach would make sense to address data integration. What we learned is that a cloud-based closed-source solution will never be able to fully address the data integration problem. It has several inherent issues.
100% of the companies we talked to were using Fivetran, StitchData or other solutions, while also building and maintaining their own connectors. They did so because either (a) the ETL solution didn’t support the connector they wanted, or (b) the solution supported it, but not in the way they needed.
When you look at Fivetran, for instance, you’ll see that after 8 years, they only support 150 connectors. The hard part about ETL/ELT is not about building the connectors, but maintaining them. It is costly, and any cloud-based closed-source solution will be restricted by a ROI (return on investment) consideration. It isn’t profitable for them to support the long tail of connectors, so they only focus on the most popular integrations.
During those 45 interactions, we also identified a third issue. Some of the companies were about to stop using Fivetran for some connectors because it started to become too pricey. The value of an ELT solution is about replacing a paid data engineer that builds and maintains a connector in-house. The amount of work required from an engineer is almost the same whether a low volume or a high volume of data is being moved. So with volume-based pricing, at some point it just stops making sense to use an external solution.
And the last inherent issue with a cloud-based approach: although cloud data warehouses are winning the enterprise market, it is because they are considered part of the data infrastructure. All other solutions must go through a rigorous privacy compliance process that will take several months.
What we were saying:
We’re building an open-source ELT platform that syncs data from SaaS apps, APIs and databases to data warehouses, data lakes and other databases. Our solution can fully integrate with your data infrastructure and stack, if you are using Kubernetes or Airflow for orchestration, or DBT for transformation. Our goal is to become the open-source standard for anything ELT by the end of 2021.
What we were saying:
By making it trivial to build and maintain connectors using Airbyte rather than doing it in-house, we will become the new standard for building connectors. This will help us support the long tail of connectors. And since connectors run as Docker containers, you can build them in the language of your choice.
As connectors are open-sourced, any team can edit a pre-built connector and tune it to their needs. If a connector breaks, anyone can jump on the code, submit a PR and, once approved, the change can be propagated across all the existing users of that connector.
Airbyte focuses first on a self-hosted offer; pricing will be based on the feature and number of connectors used. The pricing will not be indexed on the volume of data. Being open sourced enables us to have a bottom-up approach with a frictionless adoption from data teams, without going through privacy compliance, as we handle data security as a first-class citizen.
What we were saying:
With Airbyte, we wanted to address 2 different audiences:
Data consumers, including data analysts and scientists.Data engineers who were building and maintaining the connectors themselves, or managing the data infrastructure.
What made Fivetran successful is that they enabled data consumers to leverage the data without the help of data engineers – they made them autonomous (as much as Segment made product teams autonomous). Airbyte provides a UI that makes it very easy to start replicating data for non-technical users. It takes literally 2 minutes for a data analyst to replicate data from Salesforce to Snowflake, including the deployment through our Docker Compose.
In addition, Airbyte will offer data integration connectors through an API that data engineers can leverage to build their own workflow and applications. This is also a way for Airbyte to address SaaS businesses that want to offer their own integrations to their customers through Airbyte.
In this slide, we review the expertise of the team. In our case, we have 4 people coming from Liveramp, where they built and maintained more than 1,000 connectors and were moving more than 100TB of data every day. Our goal is to make investors understand that we’ve already done it in the past, but now the goal is to enable every company to do it.
This is possibly the most important slide, as it validates everything you’ve said before with facts. The number of companies using you is most likely the most important data point for investors.
What we were saying:
We actually started working on Airbyte in the end of July. We soft launched a MVP 2 months later (at the end of September), with only 6 connectors. We wanted to have feedback as early as possible. It’s been 6-7 weeks since we soft launched, and we are now used by X companies, we have Y contributors, and we’ve ramped up the number of connectors to 43.
Now that we’ve shown our velocity up to this point, it’s time to show what we can accomplish in the next few quarters. Investors want to see where you will be when it is time to raise the next round. This slide’s goal is to accomplish this.
This is the last slide of the pitch. The goal here is that you shouldn’t need to say much! It shouldn’t come as a surprise, and is a direct consequence of the roadmap slide.
You need such a slide to separate the main deck from the appendix. No surprise here! We only showed the appendix slides if asked relevant questions.
The point for us here was to insist that we hadn’t done a hard launch yet, only a soft launch on our social networks on 09/24. The rest of the growth is organic.
Investors will always ask for the names of companies that are using you. So you should always have such a slide ready. The more recognizable the names of the companies are, the better.
What we were saying:
Here are a few choices we made that distinguish Airbyte from other open-source solutions:
Airbyte’s connectors are usable out of the box through a UI and an API, with monitoring, scheduling and orchestration. Airbyte runs connectors as Docker containers, so they can be built in the language of your choice.Airbyte’s components are modular, and you can decide to use subsets of the features to better fit in with your data infrastructure (e.g., orchestration with Airflow or K8s or Airbyte’s…)We intend to integrate with DBT for the transformation piece soon and let the community contribute normalization schemas for all connectors. Unlike Singer, Airbyte uses one single open-source repo to standardize and consolidate all developments from the community, leading to higher quality connectors. We built a compatibility layer with Singer so that Singer taps can run within Airbyte.
This is probably the slide we showed the least often, as serious VCs will already know who your main investors are. But you should still have it ready just in case.
That’s it, folks!
Also published here.