Data Engineering Hack: Using CDPs for Simplified Data Collection by @mparticle


Customer Data Platforms (CDPs) have traditionally been considered tools that benefit marketers and product managers. But from simplifying data collection to enabling data-driven feature development, CDPs have far-reaching value for engineers as well. Learn more about the benefits of CDPs for technical teams.

Here’s something everyone knows: data is important. Here’s something not everyone knows: handling a company’s customer data does not have to be a major drain on engineering time and resources.

Over the last decade, personalizing customer experiences has become table stakes for brands, which means that the use cases for data have expanded dramatically. This has added an ever-growing list of data-related tasks to the mountain of work for which engineers are already responsible.

Does marketing want to onboard a notification platform? That’s an engineering ticket. That same platform changes its API specs, and the integration needs to be reconfigured? Call on the developers. Does the product team want audiences for an A/B test based on data from multiple inputs regularly? Better get HR to put out a posting for some more data engineers.

Each of a company’s data requirements, from integrating new tools across every app and website to transforming data into structures that serve specific use cases, demands a precious resource: engineering time. And while the result of this data collection, transformation, and delivery is indeed important, servicing these demands is not the avenue through which engineers can deliver the most value to an organization. Developers drive a company’s goals forward by making the product, the very thing around which all other functions in the organization revolve. Their time and expertise are best spent implementing new features and improving existing ones. If collecting and shipping data ends up draining an inordinate amount of developer calories, something needs to be done.

Customer Data Infrastructure is the Path Out of Data Inefficiency

One of the biggest advances in the ability to move people and materials around the United States was the development of the Interstate Highway System in the second half of the 20th century. Before creating this vast network of interconnected and standardized superroads, traveling around the country could prove very difficult. Longer trips often involved driving on a hodgepodge of local and state routes with varying quality and reliability. Personal travel required more planning and effort. Goods took longer to transport and often couldn’t reach more remote areas.

This all changed after the country invested the time and resources to build a system of highways. Though it took decades to develop, the Interstate Highway System eventually provided a way to travel most of the distance between two points on a direct and efficient path and at a greater velocity than would be possible on smaller, less robust roads. It drastically reduced the effort required to go from point A to point B, and the resulting economic and social benefits cannot be overstated.

Adopting a Customer Data Platform (CDP) can be like immediately having a highway system for your customer data without needing to spend countless hours building it by yourself. Best-in-class CDPs enable companies to develop “customer data infrastructure,” or a foundation upon which data can easily be collected from different sources in real-time and delivered to the systems where it can drive value, all while ensuring quality and consistency in your data and protecting customer privacy.

Do All CDPs Provide Infrastructure Solutions?

We should clarify, however, that not all Customer Data Platform providers offer the same capabilities. The CDP market is a crowded space, with over 160 organizations self-identifying as CDPs. Many of the benefits discussed in this article pertain only to providers within the infrastructure CDP category. Infrastructure CDPs establish a foundational data layer upon which teams can freely move data between systems and applications in real-time while managing data quality and protecting consumer privacy. Through embeddable SDKs and APIs, infrastructure CDPs ingest first-party data across multiple touchpoints and unify it into persistent customer profiles. For these reasons, infrastructure CDPs are a powerful asset for engineering teams looking to solve data engineering and data integration challenges.
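
To make "unify it into persistent customer profiles" more concrete, here is a minimal sketch. It is not mParticle's identity-resolution logic: it assumes a hypothetical event shape of `{ customerId, source, attributes }` with a single known identifier, and the function name `unifyProfiles` is made up for illustration.

```javascript
// Hypothetical event shape: { customerId, source, attributes }.
// Real CDPs resolve identities across many identifiers; this sketch
// assumes one known customerId per event to keep the idea visible.
function unifyProfiles(events) {
  const profiles = new Map();
  for (const event of events) {
    if (!profiles.has(event.customerId)) {
      profiles.set(event.customerId, {
        customerId: event.customerId,
        sources: [],
        attributes: {},
      });
    }
    const profile = profiles.get(event.customerId);
    // Remember every touchpoint that has contributed to this profile.
    if (!profile.sources.includes(event.source)) {
      profile.sources.push(event.source);
    }
    // Later events overwrite earlier values for the same attribute.
    Object.assign(profile.attributes, event.attributes);
  }
  return profiles;
}

const unified = unifyProfiles([
  { customerId: "u1", source: "web", attributes: { plan: "free" } },
  { customerId: "u1", source: "ios", attributes: { plan: "premium", genre: "Classic Rock" } },
  { customerId: "u2", source: "web", attributes: { plan: "free" } },
]);
```

Here the two events for `u1`, one from the web and one from iOS, collapse into a single profile that carries the most recent value of each attribute.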

In addition to infrastructure CDPs, here are some other categories within the CDP market:

  • Multi-channel marketing hubs: While these tools provide data orchestration capabilities that facilitate marketing initiatives like offer management and triggered messages, these CDPs provide limited data quality and governance features such as privacy, filters, and forwarding rules. This leaves engineering teams responsible for monitoring data accuracy and writing one-off filtering rules.
  • Marketing clouds: Offered by large, multi-suite martech cloud companies like Adobe, Salesforce, and Microsoft, Marketing Cloud CDPs provide value to teams already deeply invested in a cloud suite. For organizations looking to build a vendor-agnostic tech stack from best-in-breed solutions, however, these tools may lack flexibility.
  • CDP toolkits and reverse ETL: These providers are built for teams that want to feed data from a data warehouse to cloud tools and utilize basic features such as discovering segments and performing analytics on top of their first-party data. While these tools provide an inexpensive integration solution, they require significant additional engineering work to support core CDP functionalities such as identity resolution, data replay, and profile lookups, as these functions are not offered out of the box.

While the initial adoption of an infrastructure CDP requires engineering support, non-technical teams are empowered with real-time access and control over an organization’s customer data through an interface once this foundational data layer is in place. With the segmentation and data forwarding capabilities that an infrastructure CDP offers, growth teams can create custom audiences and control how data flows to marketing, analytics, advertising, and operational systems without enlisting the support of developers. This frees technical teams from supporting their colleagues in the marketing and product organizations with vendor SDK management, data transfers, and more.
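
As a rough illustration of what segmentation and data forwarding boil down to, the sketch below treats an audience as a predicate over profiles and forwarding as a lookup table. Everything here is an assumption for illustration: `buildAudience`, `destinationsFor`, the profile shape, and the rule format are hypothetical, and in a CDP these would be configured in the interface rather than written in code.

```javascript
// Hypothetical profile shape: { customerId, attributes: { ... } }.
// An audience is the set of customers whose profile matches a predicate.
function buildAudience(profiles, predicate) {
  return profiles.filter(predicate).map((profile) => profile.customerId);
}

// Hypothetical forwarding rules: which event names each destination receives.
const forwardingRules = {
  analytics: ["Select Track", "Search"],
  advertising: ["Select Track"],
};

// Look up every destination that should receive a given event.
function destinationsFor(eventName, rules) {
  return Object.keys(rules).filter((destination) =>
    rules[destination].includes(eventName)
  );
}

const sampleProfiles = [
  { customerId: "u1", attributes: { genre: "Classic Rock" } },
  { customerId: "u2", attributes: { genre: "Jazz" } },
  { customerId: "u3", attributes: { genre: "Classic Rock" } },
];

const rockFans = buildAudience(
  sampleProfiles,
  (profile) => profile.attributes.genre === "Classic Rock"
);
const searchDestinations = destinationsFor("Search", forwardingRules);
```

Changing who is in an audience, or which tools receive an event, then becomes a change to data (the predicate or the rule table) rather than a change to application code.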

Simplified Data Collection

With an infrastructure CDP at the heart of your data ecosystem, you have a single access point for customer data and direct integrations with the most popular tools for engagement, analytics, advertising, and other functions. This means that instead of needing to maintain individual client-side implementations for each third-party system your company uses, the CDP largely takes over the responsibility of taking in all of the data you collect and directing it to individual vendors once it has been ingested.

Here’s an example of how that drastically simplifies things from an engineering standpoint. Say you’re a music streaming service, and every time a user selects a song to listen to, you want to forward this as an event to Braze, Amplitude, Facebook, and AWS. Without a CDP, you first need to integrate each vendor’s SDK into your app, introducing additional dependencies that may increase your app size and diminish performance. Then you need to add API calls to each system, triggered by user input, taking care to call the proper functions and maintain the correct event and attribute names in each separate call:

// Log a custom event to Braze
appboy.logCustomEvent("Select Track", {
    "track_id": "2rqhFgyKwdb9MWmUPDhN6",
    "title": "Come Together",
    "artist": "The Beatles",
    "genre": "Classic Rock"
});

// Log event to Amplitude
Amplitude.getInstance().logEvent("Select Track", {
    "track_id": "2rqhFgyKwdb9MWmUPDhN6",
    "title": "Come Together",
    "artist": "The Beatles",
    "genre": "Classic Rock"
});

// Log custom conversion to Facebook
fbq('trackCustom', "Select Track", {
    "track_id": "2rqhFgyKwdb9MWmUPDhN6",
    "title": "Come Together",
    "artist": "The Beatles",
    "genre": "Classic Rock"
});

// Record with AWS for Machine Learning
// NOTE: the enclosing call is missing from the source; `recordAwsEvent`
// is a placeholder name, not a real AWS SDK function.
recordAwsEvent({
    eventType: "Select Track",
    userId: email,
    properties: {
        "track_id": "2rqhFgyKwdb9MWmUPDhN6",
        "title": "Come Together",
        "artist": "The Beatles",
        "genre": "Classic Rock"
    }
});

Now imagine that the same streaming service is using an infrastructure CDP. Here’s what sending that same event to all four separate services would look like using mParticle:

mParticle.logEvent("Select Track", {
    "track_id": "2rqhFgyKwdb9MWmUPDhN6",
    "title": "Come Together",
    "artist": "The Beatles",
    "genre": "Classic Rock"
});

A CDP allows you to implement events like this once. Growth teams have the ability to send them directly to leading marketing, analytics, and data warehousing solutions via direct integrations using a simple user interface:

A demonstration of the mParticle integrations interface.

Already, we can see how this degree of simplicity in data collection and delivery would represent significant time savings for engineering teams. In addition to reclaiming time, having a single access point for customer data also drastically reduces the potential for bad data to enter the system. Fewer implementations mean lower risk of errors in the collection code.
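
Conceptually, a single collection point behaves like a fan-out: one call in, many deliveries out. The sketch below is an illustration only and says nothing about how mParticle is implemented; `createCollector` and the destination handlers are hypothetical names. It also shows why a single entry point helps data quality, since validation happens once rather than once per vendor.

```javascript
// Build a collector that validates an event once and then delivers it
// to every configured destination handler.
function createCollector(destinations) {
  return {
    logEvent(name, attributes = {}) {
      // Validate in one place, before anything is forwarded.
      if (typeof name !== "string" || name.trim() === "") {
        throw new Error("Event name must be a non-empty string");
      }
      const event = { name, attributes, timestamp: Date.now() };
      // Deliver the same event object to each destination.
      for (const send of Object.values(destinations)) {
        send(event);
      }
      return event;
    },
  };
}

// Stand-in destinations that simply record what they receive.
const received = { engagement: [], analytics: [] };
const collector = createCollector({
  engagement: (event) => received.engagement.push(event),
  analytics: (event) => received.analytics.push(event),
});

collector.logEvent("Select Track", {
  title: "Come Together",
  artist: "The Beatles",
});
```

A malformed event is rejected before it reaches any destination, so a mistake in the collection code cannot leave different tools with inconsistent data.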

Spend Less Time Shipping Data

If simple and accurate data collection were the only benefit CDPs delivered, they would still be enormously valuable to developers. But this is just the tip of the iceberg in terms of the value these multifaceted tools represent to engineers. In fact, the biggest benefits that CDPs deliver to technical teams come farther along in the lifecycle of an organization’s data.

As an organization’s data use cases grow, most of an engineering team’s data-related workload shifts to tasks related to the ETL (Extract, Transform, and Load) lifecycle. In other words, the data has to be combined, formatted, and finally moved into systems or delivered to internal teams so it can ultimately have meaning and drive value. This can be a repetitive and time-consuming process for data engineers, as each one-off request for data from product or marketing teams essentially requires handling most, if not all, of these steps:

  1. Deciding exactly which data points need to be included in the final deliverable
  2. Determining the system(s) in which the required data lives
  3. Extracting the required data from these systems (often involving manual querying)
  4. Cleansing the retrieved data of any unnecessary or confounding data points
  5. Combining data from multiple sources into a format that will be comprehensible to that data’s destination environment
  6. Considering any privacy implications that transforming/combining the data may have

Without an infrastructure CDP, these steps may need to be repeated in full each time engineers receive a new request for data from growth teams. Sure, it’s possible to build in-house automation to handle some of this extraction and transformation work, so not everything needs to be repeated manually. But internal ETL solutions come with their own technical debt and will inevitably need to be maintained over time as the organization’s data needs evolve. Furthermore, it’s unlikely that these internally-built tools will be able to handle the full range of collection and delivery tasks that business teams need over time.

With a CDP at the heart of your data infrastructure, engineering teams no longer have to spend countless hours manually handling data shipping requests or maintaining the internal systems that automate them. In fact, much of the work involved in creating data segments and directing the flow of data to third-party tools will not even fall on engineers since non-technical teams can handle these workflows directly within a CDP’s user interface. Freed from the day-to-day burden of data transformation and delivery work, data engineers can focus their time and energy on more strategic initiatives.

Build Data-driven Products and Features

The technical benefits we focused on above largely have to do with saving engineering time, which, although valuable, is somewhat of an indirect benefit of adopting a CDP into your data stack. In addition to saving time, infrastructure CDPs can also be leveraged to add value to features and even build new products centered around real-time data.

Infrastructure CDPs like mParticle provide real-time access to customer data not only within engagement and analytics tools but directly within your products and interfaces. mParticle’s Profile API, for instance, provides a way to query customer profiles, audiences, user attributes, and other data available within your mParticle instance on the fly. The possibilities here are bountiful for any engineering team tasked with providing personalization and context-based experiences in their products. For instance, you could leverage this interface to:

  • Display products and services to customers based on their in-session behavior
  • Build machine learning-powered recommendations into your interfaces
  • Send location-based push notifications to customers when they enter your store
  • Deliver personalized features and user journeys based on a customer’s previous behavior and product interactions
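
As one small example of the last point, the sketch below personalizes a track list from a profile. It does not call the Profile API: it assumes a profile object has already been retrieved, and the field names (`attributes.favoriteGenre`, `audiences`), the catalog, and the `recommend` function are hypothetical rather than the API's actual response format.

```javascript
// Hypothetical profile, assumed to have been fetched already.
// Field names are illustrative, not the Profile API's response format.
const profile = {
  customerId: "u1",
  attributes: { favoriteGenre: "Classic Rock" },
  audiences: ["premium-subscribers"],
};

// Hypothetical catalog of tracks the product could surface.
const catalog = [
  { title: "Come Together", genre: "Classic Rock", premiumOnly: false },
  { title: "Hey Jude", genre: "Classic Rock", premiumOnly: true },
  { title: "So What", genre: "Jazz", premiumOnly: false },
];

// Pick tracks matching the customer's favorite genre, hiding
// premium-only tracks from customers outside the premium audience.
function recommend(customerProfile, tracks) {
  const isPremium = customerProfile.audiences.includes("premium-subscribers");
  return tracks
    .filter((track) => track.genre === customerProfile.attributes.favoriteGenre)
    .filter((track) => isPremium || !track.premiumOnly)
    .map((track) => track.title);
}

const recommendations = recommend(profile, catalog);
```

The same function returns a shorter list for a customer outside the premium audience, so the interface adapts to whatever the profile says at request time.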

An organization’s data shouldn’t only benefit growth teams while presenting engineers with a hurdle to jump over. This no longer has to be the case with the robust data infrastructure that an infrastructure CDP delivers. An infrastructure CDP can turn your data into a valuable addition to the engineering team’s toolkit and unlock powerful opportunities to leverage user insights directly in your websites and applications.

Hopefully, it is now clear that infrastructure CDPs are not simply a “martech” tool whose sole utility falls to marketers and product managers. These powerful systems transform a company’s data infrastructure in a way that frees developers from overhead and empowers them to build data-driven features.

Learn more about some of these specific use cases and how Customer Data Platforms (CDPs) can transform the relationship between developers and data.
