Navigating the world of privacy and compliance is no trivial task. On one side, we see a world that is becoming more and more privacy-conscious.
On the other side, we see a proliferation of applications that compromise user privacy. In this tug-of-war, the landscape of what you can do with people’s sensitive personal data is continually changing.
And just like a tug-of-war, lines are continually being drawn. For example, the General Data Protection Regulation (GDPR) applies across the European Union (EU), whose member countries can each have their own data privacy laws as well.
In the United States, individual states like California have been enacting legislation like the California Consumer Privacy Act (CCPA), which directly affects data privacy and compliance requirements in those states.
So, how do you, as a software developer, succeed in this constantly changing and inherently difficult domain? In short, you need a data privacy API that can keep up with the near-constant changes, so you don’t need to write all of the logic from scratch.
In this post, we will look at the data privacy API—specifically, the “what”, the “why”, the “how”, and the “who”.
When talking about data in general, some companies use the terms privacy and security synonymously. In our context, as we discuss a data privacy API, privacy and security are two entirely different concepts, and we’re focusing on privacy.
Data privacy is about more than encryption. It’s also about ensuring that data has governance and access controls that make sense for the type of data that’s being accessed.
Traditional models like role-based access control (RBAC) are fine for granting access to an endpoint or API. In most cases, though, RBAC alone is insufficient for data governance and is often far too coarse-grained to be compliant with data privacy laws.
Let’s start with the obvious: privacy and compliance are hard. It is no trivial task to build an application that protects data privacy effectively and fulfills compliance requirements entirely. This type of endeavor requires a lot of overhead, and doing it yourself incurs many risks.
Wouldn’t it be great if you could avoid these risks by isolating sensitive customer data outside of your applications, then using a data privacy API to utilize this data?
A data privacy API does just that: It abstracts and simplifies the core functions of privacy and compliance so you don’t need to do it yourself.
With data isolated outside of your core infrastructure, you have less sensitive data in your application. This reduces the risks of application development and gives you more freedom to focus on your core capabilities.
And because you can worry less about the compliance risk and technical aspects of holding sensitive data, you can innovate more rapidly and accelerate your time to market.
All of this is possible because a data privacy API uses the principle of data minimization: you only retrieve the data you need—and none of the data you don’t.
Simplifying data governance with a data privacy API opens up opportunities in new markets such as the EU, where data privacy regulations are stricter than in markets like the US, where you might already conduct business.
A data privacy API eases and expands your ability to meet data privacy requirements, providing the flexibility to address the privacy and compliance concerns of any new markets your business might enter.
Like taxes, regulations only seem to expand. Repeated high-profile data breaches are evidence of the importance of data privacy and compliance.
A breach of some kind is a matter of when, not if, so minimizing the amount of sensitive data your application holds—with techniques like tokenization (discussed below)—is key to reducing your future liability.
After all, if a data breach doesn’t compromise sensitive data like names and birthdays, but only non-exploitable tokens, it becomes much less of a concern for you and regulators.
What a data privacy API actually does varies by vendor. However, at a high level, an effective data privacy API should at least cover core capabilities like data governance, integration with trusted third parties, and tokenization.
Data governance is the ability to apply fine-grained control to data access by ensuring that only the necessary data is used for a specific function and that no user has overly broad access to sensitive data.
This approach minimizes risk, and it is usually achieved through a mix of RBAC and attribute-based access control (ABAC). A good data privacy API facilitates data governance with both well-designed default configurations and extensive customizability.
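To make the mix of role checks and attribute checks concrete, here is a minimal sketch of how the two layers might combine. The role names, permissions, regions, and purposes are all hypothetical and chosen only for illustration. They are not taken from any particular vendor's API.

```python
from dataclasses import dataclass

# Hypothetical role-to-permission map (the role-based layer).
ROLE_PERMISSIONS = {
    "support_agent": {"customer:read"},
    "billing_admin": {"customer:read", "payment:read"},
}

# Hypothetical purposes each permission may be used for (an attribute rule).
ALLOWED_PURPOSES = {
    "customer:read": {"support_ticket", "billing_dispute"},
    "payment:read": {"billing_dispute"},
}


@dataclass(frozen=True)
class Request:
    role: str
    permission: str      # e.g. "customer:read"
    user_region: str     # attribute of the caller
    record_region: str   # attribute of the data being accessed
    purpose: str         # declared purpose of the access


def is_allowed(req: Request) -> bool:
    # Role check: the role must hold the permission at all.
    if req.permission not in ROLE_PERMISSIONS.get(req.role, set()):
        return False
    # Attribute check: caller, record, and purpose must line up.
    same_region = req.user_region == req.record_region
    valid_purpose = req.purpose in ALLOWED_PURPOSES.get(req.permission, set())
    return same_region and valid_purpose
```

In this sketch, a support agent in the EU reading an EU customer record for a support ticket is allowed, while the same agent asking for payment data, or anyone reading across regions, is denied. The role check alone could not express the last two conditions, which is why the attribute layer is there.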
A good data privacy API provides the ability to control and secure the flow of data to and from third-party services that require access to sensitive data. How that data is transmitted to outside parties has a significant bearing on your ability to remain compliant with data privacy regulations.
For example, let’s assume you’re building an online service that helps users obtain certified copies of criminal background check attestation letters. On the one hand, you need to store sensitive information about each user, such as their name, address, and driver’s license number.
When it comes time for them to submit an application and payment to the background check agency, you need to combine the personally identifiable information (PII) that you have with the user’s submitted credit card information to make a payment.
Stitching together these different sources and pieces of sensitive information in a regulation-compliant manner can be incredibly complex and fraught with missteps. What’s needed is a data privacy API that manages the relaying of sensitive data.
The result is an assurance that any transmission of sensitive data to and from your application is done securely, reducing your risk.
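As a rough illustration of relaying, the sketch below assumes the sensitive values live in a store outside your application and that your application only holds opaque references to them. The store contents, the `relay` function, and the `fake_agency` stand-in are all hypothetical. A real relay would run as a separate service and call the third party over HTTPS.

```python
from typing import Any, Callable, Dict

# Hypothetical store of sensitive values, held outside the application.
# Keys are opaque references; the application only ever sees the keys.
SENSITIVE_STORE: Dict[str, str] = {
    "ref_name_01": "Jane Doe",
    "ref_dl_01": "D1234567",
    "ref_card_01": "4111111111111111",
}


def relay(payload: Dict[str, Any], send: Callable[[Dict[str, Any]], Any]) -> Any:
    """Swap references for real values, forward the request, return the reply.

    The resolved payload exists only inside this function, so the calling
    application never handles the plaintext values.
    """
    resolved = {
        key: SENSITIVE_STORE.get(value, value) if isinstance(value, str) else value
        for key, value in payload.items()
    }
    return send(resolved)


def fake_agency(request: Dict[str, Any]) -> Dict[str, Any]:
    """Stand-in for the third-party service (assumption for this sketch)."""
    complete = all(request.get(k) for k in ("name", "license", "card"))
    return {"status": "accepted" if complete else "rejected"}


# The application builds the request from references only.
app_payload = {"name": "ref_name_01", "license": "ref_dl_01",
               "card": "ref_card_01", "amount_cents": 2500}
result = relay(app_payload, fake_agency)
```

The design point is that the combination of PII and card data happens inside the relay, not inside your application, so the application's own code, logs, and database only ever contain references.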
Of course, the ability of a data privacy API to achieve such integrations depends on the presence of an essential feature in data privacy: tokenization.
Tokenization is a non-algorithmic approach to data obfuscation that swaps plaintext sensitive data for tokens that have no exploitable value. It provides a way to add stand-ins (references) for sensitive data values to your applications and infrastructure.
The actual sensitive data values are secured outside of your applications and infrastructure and accessed by a detokenization process (subject to authorization).
By replacing the many places where sensitive data can linger in your systems with tokens, tokenization improves the data privacy of any application, website, or backend infrastructure deployment.
If tokenized values are exposed—whether by unauthorized server access or inadvertent logging—they’re of no tangible use to those lacking authorized access to the data privacy API.
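The idea can be sketched in a few lines. The `TokenVault` class below is a hypothetical, in-memory stand-in written only for illustration. A real vault would be a separate, hardened service with persistent storage and a proper authorization model.

```python
import secrets
from typing import Dict, Set


class TokenVault:
    """In-memory stand-in for a vault (assumption for this sketch)."""

    def __init__(self, authorized_callers: Set[str]) -> None:
        self._values: Dict[str, str] = {}   # token -> sensitive value
        self._authorized = authorized_callers

    def tokenize(self, value: str) -> str:
        # The token is random, not derived from the value, so it cannot be
        # reversed without access to the vault's lookup table.
        token = "tok_" + secrets.token_hex(16)
        self._values[token] = value
        return token

    def detokenize(self, token: str, caller: str) -> str:
        # Detokenization is subject to authorization.
        if caller not in self._authorized:
            raise PermissionError(f"{caller} may not detokenize")
        return self._values[token]


vault = TokenVault(authorized_callers={"payments-service"})
token = vault.tokenize("1985-04-12")   # the application stores only `token`
```

Here, `vault.detokenize(token, "payments-service")` returns the original birthday, while the same call from an unlisted caller raises `PermissionError`. If `token` leaks into a log file, it reveals nothing on its own.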
Presently, there are very few notable vendors in the data privacy API space. One in particular that stands out is Skyflow. The Skyflow Data Privacy Vault offers more than the three capabilities we covered above, and it covers those three non-negotiables well.
Let’s say, for example, that you have a customer support team that must comply with GDPR, so they require proper data governance. To do its job, the team needs access to some data in the “customer” table—but certainly not all of it.
With Skyflow, you can limit the data that comes back from a request, using policy-based access control. Your customer support team can only see what they need.
To see why this is so important, consider this requirement from GDPR Article 25: “...only personal data which are necessary for each specific purpose of the processing are processed.” We can see in the example above that Skyflow makes it much easier to fully or partially redact data based on the role that is accessing the data, helping you stay compliant with regulations. You can see more examples of this in Skyflow’s data governance demo app.
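To illustrate the general idea of role-based redaction, here is a small sketch. It is not Skyflow's API. The policy format, role names, and column names are hypothetical and only show how full and partial redaction could be driven by a per-role policy.

```python
from typing import Dict

# Hypothetical policy: for each role, how each column is returned.
# "plain" returns the value, "masked" keeps the last four characters,
# and any column not listed is withheld entirely.
POLICIES: Dict[str, Dict[str, str]] = {
    "customer_support": {"name": "plain", "email": "masked", "phone": "masked"},
    "compliance_officer": {"name": "plain", "email": "plain",
                           "phone": "plain", "ssn": "masked"},
}


def mask(value: str, visible: int = 4) -> str:
    """Replace all but the last `visible` characters with asterisks."""
    if len(value) <= visible:
        return "*" * len(value)
    return "*" * (len(value) - visible) + value[-visible:]


def read_customer(record: Dict[str, str], role: str) -> Dict[str, str]:
    """Return only the columns the role's policy allows, redacted as specified."""
    policy = POLICIES.get(role, {})   # unknown roles get nothing
    view = {}
    for column, mode in policy.items():
        if column not in record:
            continue
        view[column] = record[column] if mode == "plain" else mask(record[column])
    return view


customer = {"name": "Jane Doe", "email": "jane@example.com",
            "phone": "5551234567", "ssn": "123-45-6789"}
support_view = read_customer(customer, "customer_support")
```

In this sketch the support view contains the customer's name in full, a masked email and phone number, and no `ssn` key at all. A role with no policy gets an empty result, so access is denied by default.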
Skyflow follows the standard REST API approach of using GET and POST requests to store and retrieve tokenized data. In your application, you store the intermediate data representations—the tokens—using the API to retrieve sensitive data when it’s really needed.
This approach offloads much of the compliance burden that comes with storing sensitive data. If a breach occurs, only tokens are exposed, because tokens are all that your app stores.
It’s worth noting that Skyflow uses its own tokenization functionality to isolate your application from third-party connection credentials.
You can configure a connection with a service like Plaid, for example, and Skyflow will tokenize sensitive data on the fly, giving you tokens that you can store or detokenize for use in your application. This streamlines connection management, vastly simplifying the task for developers.
We said it earlier, and it’s worth saying again: Data privacy and compliance are hard! You’re in a world with many moving parts, regulations, and malleable boundaries that will shift with time.
A data privacy API abstracts and simplifies the complexity of trying to compete and comply in that world.
If your applications are to stay relevant and serve as large a market as they possibly can, you’ll need a data privacy API that makes it easy and secure to meet (and maybe even exceed) your compliance and privacy requirements.