It’s 3 AM. My alarm goes off and I groggily climb out of bed and crack open my laptop. One of our biggest customers needs their data delivered by 9 AM, and I’m getting up before sunrise to triple-check every data point before their delivery. Our data platform was built with hundreds of data audits, but this customer’s delivery was just too complex to feel 100% confident that we’ve captured all potential issues. This scenario would soon become a typical morning for me. Wake up. Coffee. Pray to the data gods for an inbox without 500 Zendesk ticket escalations.
My name is Tim Liu. I recently joined Hull as head of data integration, but I’ve been working in the data management space my whole career. The story above happened years ago in a different industry, with a different team, and at a time when I knew very little about the hidden complexities of data integration. Since then, I’ve been beaten up a lot, but not without learning a lot of lessons about the nature of data integration.
We all subscribe to the mantras “Data is a company’s most important asset” and of course, “You can’t manage what you can’t measure”, but beyond the ideal state for many companies lies a vast wasteland of expensive, failed data projects. That’s because data integration is the foundation of most data analytics and ops projects, but it’s also undoubtedly the trickiest part to get right.
Data integration issues will kill your project before you start to see any value, but the good news is that it’s not all doom and gloom. There is a path to success, but it’s a path less traveled, and even less talked about.
I’m here to tell you about what happened when the data projects I worked on failed, why they failed, and what I learned so that you don’t run into the same landmines as I did.
I’ve managed enough data integration projects to realize that identity management is at the center of many common data problems. At its core, it deals with the major entities in your system that you’re trying to analyze and bend to your will. Let’s take the customer data space for example. What defines a Person in your system? What defines a Company? Poor identity management is the cause of expensive deduplication cleanups and manual intervention. Knowing what identifies the entities in your system is of paramount importance, and I’ve come to the realization that this is one of the first things your data team will need to decide on upfront. Changing your identity strategy in the middle will inevitably lead to an explosion of duplicates, bad relationships, and a manual cleanup effort.
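To make "deciding what identifies an entity" concrete, here is a minimal sketch in Python. It assumes, purely for illustration, that a Person is identified by a normalized email address; the `PersonStore` class and `normalize_email` helper are hypothetical names, not part of any particular product. The point is that every incoming record goes through the same identity rule, so two spellings of the same email update one Person instead of creating a duplicate.

```python
from dataclasses import dataclass, field


def normalize_email(email: str) -> str:
    """Assumed identity rule: trim whitespace and lowercase the email."""
    return email.strip().lower()


@dataclass
class Person:
    email: str  # the identity key, already normalized
    attributes: dict = field(default_factory=dict)


class PersonStore:
    """Hypothetical in-memory store keyed on the identity rule above."""

    def __init__(self) -> None:
        self._by_key: dict[str, Person] = {}

    def upsert(self, email: str, **attributes) -> Person:
        # Look up by the normalized key so variants of one email resolve
        # to the same Person rather than creating a new record.
        key = normalize_email(email)
        person = self._by_key.get(key)
        if person is None:
            person = Person(email=key)
            self._by_key[key] = person
        person.attributes.update(attributes)
        return person

    def __len__(self) -> int:
        return len(self._by_key)


store = PersonStore()
store.upsert("Ann@Example.com ", name="Ann")
ann = store.upsert("ann@example.com", plan="pro")
```

If the identity rule changed halfway through a project (say, from email to an external ID), records already stored under the old key would no longer match incoming ones, which is the duplicate explosion described above.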
If identity management is the first spinning plate, then the relationships between those identified entities are the second. For example, you’ve got the People in your marketing system, but you’ve also got the Companies to which they’re related. Relationships between entities are sometimes even harder to maintain because they rely on a robust identity management strategy. This is why it’s important, especially for maintaining the correct relationships, that you have a leading system.
What is a leading system? It’s a single system that’s the arbiter for a particular attribute or relationship. Especially when it comes to a Person-to-Company relationship, you want to make sure that you’re creating that relationship in one place. Otherwise, you’re in for a world of data loops where individuals are hopping between companies that look similar: AmazonInc.fr vs Amazon.us. Ideally, a leading system should be easily accessible by your data admin, in case there’s a scenario where you have to manually intervene to make the correct association.
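One way to picture a leading system is as a lookup table that names the single source allowed to write each attribute. The sketch below is an assumption-laden illustration, not a description of how any specific platform works: the source names (`"crm"`, `"marketing"`, `"enrichment"`) and the `apply_update` function are hypothetical.

```python
# Hypothetical configuration: which system is the arbiter for each attribute.
LEADING_SYSTEM = {
    "company_domain": "crm",   # Person-to-Company link is set in one place
    "email": "marketing",
}


def apply_update(record: dict, source: str, changes: dict) -> dict:
    """Apply changes from `source`, skipping attributes led by another system.

    Returns the rejected changes so a data admin can review them manually.
    Attributes with no configured leader are accepted from any source.
    """
    rejected = {}
    for attribute, value in changes.items():
        leader = LEADING_SYSTEM.get(attribute)
        if leader is None or leader == source:
            record[attribute] = value
        else:
            rejected[attribute] = value
    return rejected


person = {"email": "ann@example.com", "company_domain": "amazon.us"}

# A non-leading source tries to move Ann to a look-alike company.
rejected = apply_update(
    person, "enrichment", {"company_domain": "amazoninc.fr", "title": "VP"}
)
```

Here the look-alike domain from the non-leading source is set aside for review rather than written, so the person does not hop between companies, while the unrestricted `title` attribute is still updated.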
Okay, so Reason #3 isn’t something I personally did, but something I heard enough from customers and prospects that I thought it worthwhile to mention.
In the pre-sales process, I had many conversations with prospects who ended up talking themselves into building the integrations themselves. I always had the same response: Godspeed and good luck! The number of services and the nuances in each application make this problem ridiculously hard to solve even for the experts. Even if you’re able to secure engineering time to build the integrations for the handful of applications that you have, you can’t forget about the time it takes to tweak and maintain the solution. Oh, and did I mention bugs? Yeah, it’s not like those are going to happen, right?
The truth is: Yes, there are simple scenarios where it may make sense to complete the project internally. But usually anything that’s more complex takes a lot more work.
In some data integration projects, this may not be a problem at all, yet in others, it may be the only problem. A LOT of customers have this fear of losing their legacy data. “But the insights!” they’ll say. First, you should check yourself for a minute and determine whether or not you’re a data hoarder. Many times, the juice just isn’t worth the squeeze. The likelihood of legacy data telling you something useful in the future may be slim. Now consider the time and cost of integrating the legacy data pipeline with the new one.
Many times, especially in projects where new data accumulates fairly quickly, it’s easier just to develop a strategy going forward. With customers who insisted on integrating legacy data sets so they could have several months of history, I would usually tell them that the project to clean and integrate their data would be hard, but we could certainly do it in a few months’ time. If you need 6 months of clean, pristine history, my general advice would be to make sure your existing data strategy is solid, and then collect 6 months of data from there instead of embarking on a costly data cleanup project.
But in the end, it all depends. At Hull, we have had customers who wanted to bring over legacy cookie data. We ended up keeping that intact for them so that they could differentiate new web visitors from returning visitors. My advice would be to look hard at your legacy data set, save what you absolutely need to, and then Marie Kondo the rest of it. If you must, you can always save a backup of your data somewhere inexpensive to satisfy the hoarder inside yourself!
Me on day 1 of my first data integration project: Let’s do this thing! Alright, we’re going to pull data from Intercom, then we’ll cross reference it with the product data in our database, then marry it with our marketing campaigns, and maybe personalize the landing pages based on how far along the prospect is in their customer journey…
Me on day 37: Soooo…that was a little ambitious.
Since then, I’ve learned to start with some smaller, easy wins. At Hull, my recommendation for companies integrating customer data into a customer data platform would be to start by identifying a clear use case that will bring your team value once implemented. For your first use case, keep things simple. The fewer the implementation points, the better.
If you don’t know what that initial use case is, that’s okay. It’ll take some time to figure out what makes the most sense for your business.
Disclaimer: Your new database probably won’t end up as your production system because it’s now probably a big pile of disorganized data sets. But it’s your starting sandbox for exploration and discovery.
I could elaborate on each reason above in its own small novel. I probably haven’t seen it all...but I’ve seen a lot. I’ve sweated and bled for projects that were doomed from the start, but I’ve also been surprised at the projects that overcame tremendous odds to bring real value to our customers. Beyond any particular list of potential issues, as long as you understand that data integration is a hard problem and have a data partner you can trust, you should be able to find the balance between the hype and the reality on the ground.
About the Author
Tim Liu is the Director of Integration at Hull. When he's not living and breathing data integration, he loves spending time with his wife and three kids, trying new restaurants, and getting the best deal on live lobsters.
About Hull
It’s never been easier to use data and technology to find, acquire and care for customers. But that data so often sits siloed within tools and teams. Hull solves this problem by collecting, enriching and synchronizing data without any code so that you can orchestrate personal, relevant experiences at every touch using your existing tools. The result is a seamless experience for your customers as well as your sales, marketing, and support teams. For more information, visit www.hull.io.