The art of building a large catalog of connectors is thinking in onion layers.

We’re building an open-source data integration platform at Airbyte. We launched our MVP about a month ago, and we were thrilled by the amount of feedback and support we got from the community. We even got our first big pull request from a contributor this week (2,000+ lines of code).

But during this full month, we didn’t release any new connectors. You might wonder why we didn’t build on that momentum. If people were excited about our MVP even though it had only 6 connectors, you might think we should have ramped up the number of connectors as fast as possible. We didn’t do that, for two very important and differentiating reasons.

First, we were defining exactly what the best data protocol would be if we wanted to solve data integration once and for all, for all companies. You can learn more in our specification. Even though it’s not final yet, it will give you a glimpse of our vision for the future.

Second, and just as important, we were building a real manufacturing plant for data integration connectors. Our team led data integration at LiveRamp, which has more than 1,000 data ingestion connectors and 1,000+ distribution connectors. So we have experience abstracting what can be abstracted and simplifying the manufacturing of new integrations (very often without code).

We haven’t fully built our manufacturing plant, but engineers can already add one new connector every day. This article describes how we built this connector manufacturing plant.

What you need to think about when building a large number of connectors

When building a large catalog of connectors, there are several things that you need to think through.

Initial build

This is when you start from a blank page. This step usually requires a little bit of planning since it involves communication with external teams/companies.
The initial build step involves:

- Access to the source/destination documentation
- Access to test accounts, test infrastructure, etc.
- Using a golden path that encodes good practices
- Using the best language for the task: today, we support both Java and Python, but anyone can add their own language
- Creating documentation
- Defining the necessary inputs

Tests

Tests are essential to make sure that any code or protocol change won’t break the connectors. They need to run before every merge.

They also ensure that the connector behaves as you expect. For that, you need to run your connector against the actual production service. For example, if you’re working on the Salesforce connector, you must make sure that Salesforce actually behaves the way you expect. It is not unusual for the documentation of an API or service to not fully reflect reality.

We currently have the foundation of our test framework; it allows developers to focus solely on providing inputs and outputs, and the rest is taken care of by the framework. These tests give us 90% certainty that the connector is fully functional. If there are edge cases, it is always possible to add more custom tests.

Liveliness & Change detection

It is essential to ensure that the source or destination continues to behave as it was encoded during the initial build phase and, for monitoring purposes, that it is still alive. These verifications must be run at a regular cadence, and any failure needs to be investigated and fixed, leading to the maintenance phase.

Maintenance

We need to define how we are going to update the connector, push changes and propagate the changes to all the running instances of Airbyte.

The art of building connectors is thinking in onion layers

Segmenting cattle code

To draw a parallel with the pet/cattle concept that is well known in DevOps/Infrastructure, a connector is cattle code, and you want to spend as little time on it as possible.
Anything you can do to save yourself work in the future, you need to do. This will accelerate your production tremendously.

Abstractions as onion layers

Maximizing high-leverage work leads you to build your architecture with an onion-esque structure:

The center defines the lowest level of the API. Implementing a connector at that level requires a lot of engineering time, but it is your escape hatch for very complex connectors where you need a lot of control.

Then, you build new layers of abstraction that help tackle families of connectors very quickly.

Today, we’ve built one of these abstractions to support existing Singer integrations. Building an integration leveraging Singer takes us less than 3 hours, and our goal is to bring it down to less than 10 minutes. We have the same ambition for every other family of sources and destinations.

As we continue to improve our manufacturing plant for connectors, we will build tools that will allow us to handle 95% of integrations with no or very little code. This is how we are going to address the long tail of integrations and how we’re going to make integrations a commodity.

What Airbyte has built up to now

We’ve built the following:

- The center of the onion
- The golden path in Java & Python to build new connectors
- The first version of the integration test framework
- Connectors: 10 sources with a rate of 1 new source per day, and 4 destinations
- A layer to quickly support Singer integrations

What our ambitions are with this connector manufacturing plant

We want to reach a rate of 5 connectors per day and accelerate even beyond that. We also want to provide the community with more tools to build and contribute their own connectors. Ideally, 95% of connectors can be added to Airbyte with no code.

We hope this gives you a better understanding of what we’ve been up to and what our real ambitions are. If you see any ways to improve this architecture, we’re all ears.
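To make the discussion easier, here is a minimal Python sketch of the onion-layer idea described earlier. It is an illustration only, not Airbyte's actual interfaces: the names Source, PagedSource, passes_case, fetch_fake_page and CountdownSource are all hypothetical, and the "service" is fake in-memory data. It shows a low-level center, one outer layer that handles a whole family of connectors, and an input/output style check.

```python
from abc import ABC, abstractmethod
from typing import Any, Callable, Dict, Iterator, List

Record = Dict[str, Any]
Config = Dict[str, Any]


class Source(ABC):
    """Center of the onion (hypothetical): the lowest-level contract."""

    @abstractmethod
    def check(self, config: Config) -> bool:
        """Return True if the configuration looks usable."""

    @abstractmethod
    def read(self, config: Config) -> Iterator[Record]:
        """Yield records one at a time."""


class PagedSource(Source):
    """Outer layer (hypothetical): covers the family of page-by-page sources.

    A connector in this family only supplies its required config keys and a
    function that fetches one page; the paging loop is written once, here.
    """

    def __init__(self, required_keys: List[str],
                 fetch_page: Callable[[Config, int], List[Record]]) -> None:
        self.required_keys = required_keys
        self.fetch_page = fetch_page

    def check(self, config: Config) -> bool:
        return all(key in config for key in self.required_keys)

    def read(self, config: Config) -> Iterator[Record]:
        page = 0
        while True:
            rows = self.fetch_page(config, page)
            if not rows:  # an empty page marks the end of the data
                return
            yield from rows
            page += 1


def passes_case(source: Source, config: Config, expected: List[Record]) -> bool:
    """Input/output style test: a config goes in, expected records come out."""
    return source.check(config) and list(source.read(config)) == expected


# A connector built on the outer layer. The pages are fake in-memory data
# standing in for a real service.
FAKE_PAGES: List[List[Record]] = [[{"id": 1}, {"id": 2}], [{"id": 3}]]


def fetch_fake_page(config: Config, page: int) -> List[Record]:
    return FAKE_PAGES[page] if page < len(FAKE_PAGES) else []


users_source = PagedSource(required_keys=["api_key"], fetch_page=fetch_fake_page)


class CountdownSource(Source):
    """Escape hatch: a connector written directly against the center."""

    def check(self, config: Config) -> bool:
        return isinstance(config.get("start"), int) and config["start"] >= 0

    def read(self, config: Config) -> Iterator[Record]:
        for n in range(config["start"], 0, -1):
            yield {"n": n}
```

In this sketch, a connector from the paged family takes a few lines (a list of required keys and one function), while CountdownSource shows the more laborious path of implementing the center directly. A real layer would also need to handle authentication, errors and schemas, which are left out here.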
Don’t hesitate to join our Slack to discuss any questions or suggestions with the team.

Previously published at https://airbyte.io/articles/data-engineering-thoughts/how-to-build-thousands-of-connectors/

