We’ve been asked whether Airbyte is built on top of Singer. Even though we loved Singer’s initial mission, it is not. Airbyte’s data protocol will be compatible with Singer’s, so that you can easily integrate and use Singer’s taps, but our protocol will differ from theirs in many ways.
Let’s first go over the reasons why we don’t build on top of Singer, in contrast with other open-source projects (such as Meltano), and then let’s see how different Airbyte is.
A little history on Singer.io. It was the first open-source project with the mission to address the data integration problem. It was introduced by the company StitchData (which was acquired by Talend in 2018) as a way to offer extensibility beyond the connectors they had pre-built. Your company could build its own taps (source connectors). Singer now counts about 150-200 connectors, on par with the closed-source Fivetran.
So what is the issue with Singer? Several things:
1. Absence of standardization
There is an absence of standardization and enforcement of protocol. Developers just add whatever they want in their implementation and messages. Contributors only address their own use cases and needs, and don’t build the connector with the mindset to address most use cases that the community might need. So, you never know the quality of a tap or target until you have actually used it. There is no guarantee whatsoever about what you’ll get.
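To make "enforcement of protocol" concrete, here is a minimal Python sketch of the kind of check a stricter platform could run on every message a connector emits. It assumes the three message types described in the Singer specification (SCHEMA, RECORD, STATE) and the required keys as we understand them from that spec. The names `validate_message` and `REQUIRED_KEYS` are hypothetical and are not part of Singer or Airbyte.

```python
import json

# Required keys per message type. This mirrors our reading of the Singer
# spec; the strictness of the check is an assumption of this sketch.
REQUIRED_KEYS = {
    "SCHEMA": {"stream", "schema", "key_properties"},
    "RECORD": {"stream", "record"},
    "STATE": {"value"},
}


def validate_message(line: str) -> dict:
    """Parse one JSON line and reject anything outside the expected shape."""
    message = json.loads(line)
    if not isinstance(message, dict):
        raise ValueError("message must be a JSON object")
    msg_type = message.get("type")
    if msg_type not in REQUIRED_KEYS:
        raise ValueError(f"unknown message type: {msg_type!r}")
    missing = REQUIRED_KEYS[msg_type].difference(message)
    if missing:
        raise ValueError(f"{msg_type} message is missing keys: {sorted(missing)}")
    return message
```

Without a gate like this in the toolchain, each tap is free to drift, which is the situation described above.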
2. No real ownership
There is no real ownership or direction for the project anymore. Over the years, StitchData became less and less involved in maintaining the open-source project. And the difficulty with data integration is that applications and APIs change schemas every few months. So a lot of the connectors became outdated, as nobody was maintaining them anymore. In fact, it is not unusual to see connectors with years-old PRs that have never been merged.
In the end, you have a set of connectors of varying quality. In general, the more a connector is used, the better it is maintained. So there is still some value in being compatible with Singer, but building on top of it and being limited by it would not be smart.
1. Airbyte’s connectors are not standalone binaries
Singer’s connectors are standalone binaries: you still need to build everything around them to use them.
With Airbyte, we want to help take them to the next level with a platform that can orchestrate and make them usable out of the box (through our UI or API).
2. One platform, one project with standards
In contrast to Singer, Airbyte has one single repo for the whole project and all the connectors. This will help consolidate development, and it will unify the community behind one single project and one vision. This means that the project and community can be opinionated about what a connector should be, and how it should be built.
3. Connectors can be built in the language of your choice
Airbyte runs connectors as Docker containers, so connectors can be built in whichever language you want. The overall platform is built using Java, but any team can build their own connector in Python, Go, JavaScript, etc. The easier we make it for potential contributors to contribute, the more active the community will be in building and maintaining connectors.
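As a rough illustration of why containers make the connector's language irrelevant, here is a minimal Python sketch of how a platform could drive a containerized connector. It assumes the connector writes one JSON message per line to stdout. The image name, the connector arguments, and the helper names (`docker_command`, `parse_messages`, `run_connector`) are hypothetical and do not describe Airbyte's actual implementation.

```python
import json
import subprocess
from typing import Iterable, Iterator, List


def docker_command(image: str, connector_args: List[str]) -> List[str]:
    """Build a `docker run` invocation; --rm removes the container on exit."""
    return ["docker", "run", "--rm", image, *connector_args]


def parse_messages(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per non-empty JSON line written by the connector."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def run_connector(image: str, connector_args: List[str]) -> Iterator[dict]:
    """Run the container and stream its stdout as parsed messages."""
    process = subprocess.Popen(
        docker_command(image, connector_args),
        stdout=subprocess.PIPE,
        text=True,
    )
    try:
        yield from parse_messages(process.stdout)
    finally:
        process.stdout.close()
        process.wait()
```

The platform only sees an image and a stream of lines, so `run_connector("example/source-demo:dev", ["read"])` would behave the same whether the image wraps Python, Go, or JavaScript code.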
4. Decoupling of Extract-Load from Transformation
A normalization is an opinionated view of how one should use the data. By separating extract-load and transformation, Airbyte enables:
Engineers, who want to transform the data themselves with their processes, to do that.
Engineers / Analysts / Data scientists / Teams, who want to use the normalized data right out of the box, to do so if the normalization is in line with how they want to use the data.
It also enables Airbyte to more easily cover the long tail of integrations. Some connectors might not have a normalization yet, and that can come separately. The community will also be able to contribute their own normalizations, so data users can choose which one suits them best.
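The separation can be pictured as two independent steps. The Python sketch below is only an illustration under our own assumptions: raw records are stored untouched as JSON strings, and a normalization (here, flattening nested objects into columns) is an optional function applied afterwards. The function names and the flattening rule are hypothetical, not Airbyte's actual normalization.

```python
import json
from typing import Dict, List


def extract_load(records: List[dict], raw_table: List[str]) -> None:
    """EL step: store each record untouched, as a JSON string."""
    for record in records:
        raw_table.append(json.dumps(record))


def flatten(record: dict, prefix: str = "") -> Dict[str, object]:
    """One possible normalization: flatten nested objects into columns."""
    row: Dict[str, object] = {}
    for key, value in record.items():
        column = f"{prefix}{key}"
        if isinstance(value, dict):
            row.update(flatten(value, prefix=f"{column}_"))
        else:
            row[column] = value
    return row


def normalize(raw_table: List[str]) -> List[Dict[str, object]]:
    """T step: optional, runs after loading, and can be swapped out."""
    return [flatten(json.loads(blob)) for blob in raw_table]
```

Because `extract_load` never calls `normalize`, a team can ignore the second step entirely, or replace `flatten` with its own view of the data.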
5. A UI and API to address every team’s needs
Airbyte was built on the premise that a user, whatever their background, should be able to move data in 2 minutes. To do that, we needed to build a UI and also an API. StitchData does offer a UI, but it doesn’t come with Singer. Here, Airbyte’s UI and API are both open source.
Check out our tutorial on how to move data between Postgres databases in just a few minutes.
6. A full commitment to the open-source MIT project with no premium connectors
Singer was created by StitchData, after the company itself, as a way to expand the number of connectors with the community’s help. StitchData kept selling its own product alongside it, and kept some connectors away from the community as an upgrade lever.
Airbyte’s core product is the open-source connectors, and the team is fully committed to expanding and maintaining the connectors. Our business model doesn’t depend on any premium connectors. Our vision is to become the open-source standard for data integrations, and then to build Enterprise-targeted features (security and privacy compliance, user and role management, support, SLAs, etc.).
If you’re using some Singer taps, we’ve got you covered. Our data protocol is not built upon Singer’s, but it is compatible with it.
This means we will be able to support Singer’s taps (we will be very selective and focus only on the highest-quality, well-maintained ones), and that you will be able to add your own to Airbyte.
Note that we will keep very high standards of quality for our connectors, though. So we don’t guarantee that we will promote or support Singer’s low-quality connectors.
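To give a sense of what "compatible but not built upon" can mean in practice, here is a minimal Python sketch of an adapter that wraps a Singer RECORD line in a different envelope. The input shape follows our reading of the Singer spec. The output envelope, its field names (`data`, `emitted_at`), and the function `adapt_singer_line` are hypothetical and should not be read as Airbyte's actual protocol.

```python
import json
import time
from typing import Optional


def adapt_singer_line(line: str, now_ms: Optional[int] = None) -> Optional[dict]:
    """Wrap a Singer RECORD line in a hypothetical envelope; skip other types."""
    message = json.loads(line)
    if message.get("type") != "RECORD":
        return None
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    return {
        "type": "RECORD",
        "record": {
            "stream": message["stream"],
            "data": message["record"],
            "emitted_at": now_ms,
        },
    }
```

An adapter of this kind lets an existing tap keep emitting what it already emits, while the platform's own protocol stays free to evolve.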
We hope this article clarifies how Airbyte is different from Singer (or Meltano, which is built on top of Singer). Our ambitions go beyond what Singer’s protocol can offer.
Previously published at https://airbyte.io/articles/data-engineering-thoughts/airbyte-vs-singer-why-airbyte-is-not-built-on-top-of-singer/