
Open-source Effect On Build Vs. Buy

by John Lafleur, June 13th, 2021

Too Long; Didn't Read

Airbyte.io is the new standard for open-source data integration (ETL/ELT). The average company uses more than 100 software apps and needs a solution to integrate all of the data those apps produce. Building your own pipeline is a significant time commitment, and an off-the-shelf solution such as Fivetran is costly and still leaves connector gaps. With Airbyte, you can either use the open-source connectors and start replicating data in minutes for free, or build new connectors (if Airbyte doesn’t support them yet) in a matter of days instead of months.


When you’re selling or considering purchasing a B2B tool, you need to understand the build vs. buy argument: what are the pros and cons of building the tool internally versus buying it from a third-party vendor? The question matters most in big companies, which have the resources to build such tools. Early-stage startups will generally opt for the faster route and go with self-serve B2B tools, unless the pricing is prohibitive.

But something we don’t often think about is how open source just messes the whole thing up. The “build” option is completely redefined: you now need to compare the B2B tool not only with building from scratch, but also with building on top of the open-source tool, which most often lowers the barrier significantly.

In this article, we’ll take the example of the ETL/ELT industry, which we know best, as we’re building Airbyte, the open-source ELT alternative. Let’s see how open-source ETL/ELT with Airbyte is flipping the previous build vs. buy balance on its head.

We’ve produced an infographic to illustrate that point. Without Airbyte in the picture, the build vs. buy decision clearly favored buying Fivetran over building connectors yourself. But now, with Airbyte, you can either use the open-source connectors and start replicating data in minutes for free, or build new connectors (if Airbyte doesn’t support them yet) in a matter of days instead of months, with maintenance crowdsourced across the Airbyte community.

The Infographic

The infographic shows:

  • in white, the original “build” scenario; 
  • in blue, the original "buy" scenario with cloud-based Fivetran; 
  • in purple, the new "build" scenario with two options: “build non-supported connector with Airbyte” in light purple, and “use prebuilt connectors from Airbyte” in dark purple.

Let’s just say it: the playing field has changed!

The Explanation

Some context: the average business today uses well over 100 software apps, many of which hold valuable insights about the organization’s operations. Your company is likely on its way to using just as many apps, if not more, and you’ll need a solution to integrate all of the data they produce.

Time & Effort

Building your own pipeline is a significant time commitment: it can take three to six months to set up a basic pipeline. Beyond the time, there is inherent complexity in building a reliable, high-performance ELT pipeline. You need to:

  1. Obtain developer access to the data source
  2. Explore the data
  3. Design the schema/data models
  4. Set up a connector framework
  5. Test the connector and validate the data
  6. Set up orchestration, configuration validation, state management, normalization, schema migration, monitoring, etc. 
  7. Maintain the connector through every schema or API change, which can happen every few weeks. This part is very cumbersome, as it takes more and more data engineers to manage your connectors as their number grows.
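To make steps 3 to 6 more concrete, here is a minimal Python sketch of a hand-rolled incremental sync for a single source. It is an illustration under stated assumptions, not how any particular team builds pipelines: the source is a hypothetical in-memory list standing in for a SaaS API (`FAKE_SOURCE`, `fetch_users`), the destination is an in-memory SQLite database, and all function and table names are made up for the example.

```python
import sqlite3
from typing import Iterator, Optional

# Hypothetical stand-in for a SaaS API. A real connector would make paginated,
# authenticated HTTP calls here and handle rate limits and retries.
FAKE_SOURCE = [
    {"id": 1, "email": "a@example.com", "updated_at": "2021-06-01T00:00:00Z"},
    {"id": 2, "email": "b@example.com", "updated_at": "2021-06-02T00:00:00Z"},
]


def fetch_users(since: Optional[str]) -> Iterator[dict]:
    """Yield source rows updated strictly after the `since` cursor."""
    for row in FAKE_SOURCE:
        # ISO-8601 strings in the same format compare correctly as text.
        if since is None or row["updated_at"] > since:
            yield row


def setup_destination(conn: sqlite3.Connection) -> None:
    # Step 3: a schema designed by hand for this one source.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS users "
        "(id INTEGER PRIMARY KEY, email TEXT, updated_at TEXT)"
    )
    # Step 6: state management, i.e. how far each stream has been synced.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sync_state "
        "(stream TEXT PRIMARY KEY, cursor_value TEXT)"
    )


def sync_users(conn: sqlite3.Connection) -> int:
    """Run one incremental sync and return the number of rows loaded."""
    saved = conn.execute(
        "SELECT cursor_value FROM sync_state WHERE stream = 'users'"
    ).fetchone()
    last_seen = saved[0] if saved else None
    loaded = 0
    for row in fetch_users(last_seen):
        # Upsert keyed on the primary key, so re-running after a failure is safe.
        conn.execute(
            "INSERT OR REPLACE INTO users (id, email, updated_at) VALUES (?, ?, ?)",
            (row["id"], row["email"], row["updated_at"]),
        )
        last_seen = max(last_seen or "", row["updated_at"])
        loaded += 1
    if last_seen is not None:
        conn.execute(
            "INSERT OR REPLACE INTO sync_state (stream, cursor_value) VALUES ('users', ?)",
            (last_seen,),
        )
    # Data and state are committed together so they cannot drift apart.
    conn.commit()
    return loaded


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    setup_destination(conn)
    print("first run:", sync_users(conn))   # loads both rows
    print("second run:", sync_users(conn))  # nothing newer than the saved cursor
```

Even this toy version needs its own schema, a state table, and idempotent writes. If the API adds or renames a field, the `CREATE TABLE` and `INSERT` statements both have to be updated by hand, which is the step 7 maintenance burden, repeated for every source.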

In contrast, an off-the-shelf solution such as Fivetran can be set up in a matter of minutes with prebuilt connectors. Airbyte also takes literally 30 seconds to deploy, and you can start replicating data within two minutes.

The big difference between the two options in terms of time and effort is that all the Fivetran customers we talked to also had to build and maintain connectors on the side, because the connectors they needed were either not supported the way they needed or not supported at all by Fivetran.

That’s where the option to build with Airbyte comes in. For connectors Airbyte doesn’t support yet, building one is a matter of hours. Airbyte already takes care of the UI, monitoring, scheduling, orchestration, integration with your data stack, automatic handling of schema changes, and so on, and it very likely already supports your destination. So in the end, you only have to build the EL part of the source connector, and Airbyte provides some abstractions to make that easier.
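To illustrate what “only the EL part of the source connector” means, here is a minimal Python sketch of a source that prints newline-delimited JSON messages: one `RECORD` message per row, then a `STATE` message carrying the sync cursor. The message shapes follow our simplified reading of the Airbyte protocol and should be checked against the protocol documentation; a real connector would typically be built with Airbyte’s connector development kit rather than by hand, and the data, stream name, and `read` function here are hypothetical.

```python
import json
import sys
import time
from typing import Iterator, Optional

# Hypothetical rows; a real connector would call the SaaS API here.
FAKE_API_PAGE = [
    {"id": 1, "name": "Acme", "updated_at": "2021-06-01T00:00:00Z"},
    {"id": 2, "name": "Globex", "updated_at": "2021-06-03T00:00:00Z"},
]


def read(stream: str, state: Optional[dict]) -> Iterator[dict]:
    """Yield protocol-style messages: RECORDs, then a STATE checkpoint.

    `state` is the `data` blob from the previous run's STATE message,
    shaped as {stream_name: {"updated_at": cursor}} in this sketch.
    """
    cursor = (state or {}).get(stream, {}).get("updated_at")
    latest = cursor
    for row in FAKE_API_PAGE:
        if cursor is None or row["updated_at"] > cursor:
            latest = max(latest or "", row["updated_at"])
            yield {
                "type": "RECORD",
                "record": {
                    "stream": stream,
                    "data": row,
                    "emitted_at": int(time.time() * 1000),  # epoch milliseconds
                },
            }
    # The platform, not the connector, persists this and passes it back next run.
    yield {"type": "STATE", "state": {"data": {stream: {"updated_at": latest}}}}


if __name__ == "__main__":
    # The source only writes messages to stdout; loading them into the
    # destination, scheduling, and monitoring are the platform's job.
    for message in read("companies", state=None):
        sys.stdout.write(json.dumps(message) + "\n")
```

In this model, the connector author owns only the extraction logic and the cursor; writing to the destination, persisting state between runs, and scheduling are handled outside the connector.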

Regarding maintenance, Airbyte’s goal is to crowdsource it across the community. When a connector fails because of significant API changes, Airbyte will notify the connector’s users. As soon as a fix is made available by the Airbyte team or a community member, Airbyte will propagate it to all users. The hope is that this approach will provide a better SLA than closed-source solutions such as Fivetran, not to mention that you won’t have to maintain the connector yourself.

People & Money

From what we’ve seen, a typical company needs the equivalent of at least two or three full-time data engineers to build and maintain a data pipeline. The total cost of three full-time engineers can reach the high six figures (including benefits). That’s a lot!

Fivetran’s fees for a typical mid-sized company with five connectors are about $50,000. On top of that, you have to add the cost of every connector you need to build and maintain yourself.

In contrast, Airbyte’s connectors are open source, so you can use them for free. You also don’t need to pay for egress to Fivetran’s infrastructure. You might need a little engineering time to operate Airbyte, and if you need to build some connectors yourself, you will pay for the time your data engineering team spends building and maintaining them, but that would still be far less than doing everything yourself.
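A back-of-the-envelope calculation can make the three scenarios concrete. The Python sketch below adds yearly vendor fees to the cost of engineering time. Only the $50,000 Fivetran figure comes from this article, and we assume it is a yearly fee; the cost per engineer and the number of full-time equivalents (FTE) in each scenario are hypothetical placeholders to replace with your own numbers, and hosting costs are not modeled.

```python
def annual_cost(license_fees: float, engineer_fte: float, cost_per_fte: float) -> float:
    """Yearly vendor fees plus the cost of engineering time spent on pipelines."""
    return license_fees + engineer_fte * cost_per_fte


# Hypothetical fully loaded yearly cost of one data engineer (placeholder).
COST_PER_FTE = 200_000

# FTE counts are placeholders: three engineers for a full in-house build (top of
# the article's "two or three"), some time for Fivetran side connectors, and a
# little time to operate Airbyte and build a few connectors.
SCENARIOS = {
    "build everything in-house": annual_cost(0, 3.0, COST_PER_FTE),
    "Fivetran + side connectors": annual_cost(50_000, 1.0, COST_PER_FTE),
    "Airbyte, operate + a few builds": annual_cost(0, 0.5, COST_PER_FTE),
}

if __name__ == "__main__":
    # Print the scenarios from cheapest to most expensive.
    for name, cost in sorted(SCENARIOS.items(), key=lambda item: item[1]):
        print(f"{name:<34} ${cost:,.0f}")
```

The point of the exercise is to plug in your own salaries and headcount estimates; the ranking can shift if, for example, you need many custom connectors.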

Opportunity Costs

The real value your data team brings comes from analysis and modeling. Data integration, cleaning, and transformation are important because they enable that analysis and modeling. So the more time your team can spend on value-producing tasks, the better for the business.

So the opportunity costs depicted in the infographic are very important to consider. Plus, ask any data team: they much prefer doing analysis or modeling to building pipelines! So you will also retain talent better this way.

Now you can see how open source can flip the previous build vs. buy balance on its head. Before Airbyte, Fivetran was an easy sell. Now, the opposite seems true: leveraging Airbyte’s open-source technology to build your own data infrastructure looks like the obvious choice.

There is one last thing to consider when choosing which direction to take: the future. 

Future Growth of Your Company

As your company grows, you will add data sources to the pool. The complexity and effort of building and maintaining pipelines for a large number of data sources can quickly escalate beyond what your data engineering team can handle.

You might consider betting on Fivetran’s ability to cover all or most of your connector needs, so that your team doesn’t need to build and maintain an ever-growing number of connectors (which would defeat the purpose). But be mindful that Fivetran will always weigh ROI when maintaining long-tail connectors: they won’t maintain connectors that don’t bring in enough revenue to offset the maintenance costs.

On the other hand, Airbyte will continue to grow the number of prebuilt community-maintained connectors, and can even take a large portion of the maintenance costs off your hands. 
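One way to see why maintenance escalates is to count the fixes. If each source ships a breaking schema or API change every few weeks, the yearly number of fixes grows linearly with the number of sources, and the main lever you control is how many of those fixes your own team has to make. The Python sketch below encodes that rough model; the four-week interval and the in-house share values are hypothetical inputs, not measurements.

```python
def maintenance_fixes_per_year(
    connectors: int, weeks_between_changes: float, in_house_share: float = 1.0
) -> float:
    """Rough count of connector fixes your own team must ship per year.

    connectors: number of data sources you sync.
    weeks_between_changes: average gap between breaking schema/API changes per
        source (hypothetical; the article only says "every few weeks").
    in_house_share: fraction of fixes your team makes itself (1.0 if you build
        everything; lower if a vendor or community maintains some connectors).
    """
    if weeks_between_changes <= 0:
        raise ValueError("weeks_between_changes must be positive")
    if not 0.0 <= in_house_share <= 1.0:
        raise ValueError("in_house_share must be between 0 and 1")
    return connectors * (52 / weeks_between_changes) * in_house_share


if __name__ == "__main__":
    # Hypothetical: one breaking change every 4 weeks per source.
    for n in (5, 20, 50):
        all_in_house = maintenance_fixes_per_year(n, 4)
        shared = maintenance_fixes_per_year(n, 4, in_house_share=0.25)
        print(f"{n:>3} sources: {all_in_house:.0f} fixes/yr in-house, "
              f"{shared:.0f} if others handle 75%")
```

The linear growth is what overwhelms a fixed-size team; shifting maintenance to a vendor or a community lowers the slope rather than removing it.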

When making a decision, consider how your company will evolve. You can be sure that a great data infrastructure that grows with you will be a competitive advantage, and this is the way of thinking you should apply to any tool.

Previously published at https://airbyte.io/articles/data-engineering-thoughts/how-open-source-can-disrupt-build-vs-buy-considerations/