Software Architects' headache - the Integration Point

Introduction

In your web software engineering career, once you get to deal with architectural topics, you may learn of many patterns that are usually "problematic" or otherwise uncomfortable to deal with, yet you might have to work with them regardless. It may be a requirement from a customer or management, organizational practices, or just a part of a system or a product that isn't properly thought out or otherwise mismanaged. Today, I would like to talk about a pattern (or an "anti-pattern"?) that I don't hear people talk much about - an integration point.

What is an integration point?

Let's start with some simple thought experiments and imagine the following situations:

You are tasked with developing an authentication framework for the customer's business. To pass the authentication process, the user's information has to be retrieved and validated. The information is stored on a separate customer server and is reachable only through their corporate API. The information is served in some kind of XML format.
You are tasked with building a pipeline that processes users' activity data. You might have to build a chart on top of it or extract some sort of metric from it. The data is owned by a separate department and is provided via an event queue.

You might be starting to notice a common pattern here. There is an upstream service that either pushes data to you or that you have to pull data from. Either way, a service you don't own is giving you data you don't control, and it is run by a team you're not a part of.

What can go wrong?

Well, first of all, since we're not in control of our data format anymore, we can assume that it may change in ways we do not expect. Adding or removing some fields might be handled with some data parsers, while switching data types or response codes completely might not be. Second, consider that the lifecycle of the app in question might not be in your control. Downtime, throttling, and errors of all kinds might happen when you least expect them.

Now, an important underlying issue here is how your organization relates to the owners of the service in question. Are their KPIs and goals aligned with yours, and is it in their interest to keep your work stable? Are they open to hearing your requirements, and if so, when and in which form? Consider that they might not be available at critical moments.

What can be done about it?

Document the data format.

To be able to work with data properly, the receiving team must know what to expect, and the data-owning team should stick to those expectations. To achieve that, all response formats, status codes, endpoints, and so on should be fixed and written down. More importantly, changes to those data formats should be communicated in advance as well as thoroughly documented. The perfect scenario is when data format changes are made in backward-compatible ways. Once a mutual agreement like that is established, it becomes easier to find the point of failure when a format change does happen and to communicate it to your partners.
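One way to make a documented format enforceable on the receiving side is to validate every payload against it before anything else touches the data. The sketch below is only an illustration: the UserRecord fields, the USER_CONTRACT mapping, and the ContractViolation error are hypothetical names, not part of any real customer API. It ignores unknown extra fields, so additive upstream changes do not break it, and fails loudly on missing fields or changed types.

```python
from dataclasses import dataclass
from typing import Any


class ContractViolation(ValueError):
    """Raised when a payload does not match the documented format."""


@dataclass(frozen=True)
class UserRecord:
    user_id: str
    email: str
    active: bool


# Hypothetical documented contract: field name -> expected type.
USER_CONTRACT = {"user_id": str, "email": str, "active": bool}


def parse_user(payload: dict[str, Any]) -> UserRecord:
    """Validate a decoded payload against the contract and build a UserRecord.

    Extra fields are ignored; missing fields and wrong types are reported
    together in a single error so the other team gets the full picture.
    """
    problems = []
    for field, expected in USER_CONTRACT.items():
        if field not in payload:
            problems.append(f"missing field {field!r}")
        elif type(payload[field]) is not expected:
            problems.append(
                f"field {field!r}: expected {expected.__name__}, "
                f"got {type(payload[field]).__name__}"
            )
    if problems:
        raise ContractViolation("; ".join(problems))
    return UserRecord(**{name: payload[name] for name in USER_CONTRACT})
```

The error message names every field that broke the agreement, which is usually the first thing the data-owning team will ask for.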

Prepare for network errors.

The connection between the sending and the receiving services should be implemented in a way that makes it resilient to possible downtime. Longer timeouts or delays between retries can be put in place to account for throttling, although increasing delays is usually not a long-term solution. In case the counterpart service becomes unavailable, there are a few other things to try. Depending on the domain, you can choose between serving cached responses, serving fallback data, or even showing clients a notice that the underlying service is temporarily unavailable. If data from the underlying service is not critical to serving a response, it can sometimes be left out altogether.
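The retry-then-fallback idea can be sketched in a few lines. This is a minimal illustration under assumptions: UpstreamUnavailable stands in for whatever error your client raises, and fetch and fallback are placeholders for the real upstream call and for the cached or default response. The delay doubles after every failed attempt, and the sleep function is injectable so the behavior can be tested without waiting.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class UpstreamUnavailable(Exception):
    """Hypothetical error raised by the client when the upstream call fails."""


def call_with_fallback(
    fetch: Callable[[], T],
    fallback: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Try `fetch` up to `attempts` times, then return `fallback()`.

    The delay doubles after each failed attempt (0.5s, 1s, ...), which gives
    a throttled upstream some room to recover. No delay follows the last try.
    """
    for attempt in range(attempts):
        try:
            return fetch()
        except UpstreamUnavailable:
            if attempt < attempts - 1:
                sleep(base_delay * 2 ** attempt)
    return fallback()
```

Whether the fallback returns cached data, a default value, or raises a "temporarily unavailable" error for the client is a domain decision; the function only fixes the order in which the options are tried.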

Be careful around caching.

With all of the above in mind, you cannot always make a request to the upstream service at the moment you need the data. You might be tempted to save the data locally or cache it, which is the right thing to do most of the time. However, you should always think about how to react to changes in the underlying data. Can the other service send you events once data expires, or do you need to handle that yourself? When should you expire your cache in such a case?
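Both answers to those questions can live in the same small structure: a time-to-live for the case where you handle expiry yourself, and an explicit invalidate call for the case where upstream notifies you. The class below is a hedged sketch, not a production cache; TTLCache and its method names are made up for this example, and it keeps everything in process memory.

```python
import time
from typing import Callable, Dict, Generic, Hashable, Tuple, TypeVar

V = TypeVar("V")


class TTLCache(Generic[V]):
    """Tiny in-memory cache whose entries expire after `ttl_seconds`."""

    def __init__(
        self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic
    ) -> None:
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so tests can control time
        self._entries: Dict[Hashable, Tuple[float, V]] = {}

    def get_or_load(self, key: Hashable, load: Callable[[], V]) -> V:
        """Return the cached value if still fresh, otherwise call `load`."""
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]
        value = load()
        self._entries[key] = (now, value)
        return value

    def invalidate(self, key: Hashable) -> None:
        """Drop an entry, e.g. when upstream sends a 'data changed' event."""
        self._entries.pop(key, None)
```

The TTL value itself is the hard part and depends on how stale the data is allowed to be in your domain; the code only makes that decision explicit.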

The data model can mismatch

Now, imagine the following. Our service has received the data from upstream and saved it successfully. Then, once we want our app to do something with it, the application fails. Or we serve it to the client, and the client is unhappy. As an example, the data points that we've received are each valid on their own, yet together they simply do not form a valid chart.

Web applications are sometimes more than just data. Sometimes, they are also a data model built on top of that data, meaning we take the data and do something with it. The data may never have been intended to satisfy the constraints you expected it to abide by. The solution to such a problem is not always clear, but you can ask yourself the following questions:

Should our expectations of the data be changed?
Can the data be corrected or fixed?
Can we switch to a different data source altogether?
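Before answering the questions above, it helps to know exactly which expectation the data violates. A minimal sketch for the chart example follows. The two rules it checks (at least two points, strictly increasing timestamps) are assumptions made for illustration, as are the Point and chart_problems names; your own model will have its own rules.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Point:
    timestamp: int  # e.g. Unix seconds
    value: float


def chart_problems(points: Sequence[Point]) -> List[str]:
    """Return human-readable reasons why `points` cannot form a line chart.

    An empty list means the series satisfies our (assumed) model rules.
    """
    problems = []
    if len(points) < 2:
        problems.append("need at least two points to draw a line")
    # Compare each point with its predecessor to catch duplicates and
    # out-of-order timestamps.
    for prev, cur in zip(points, points[1:]):
        if cur.timestamp <= prev.timestamp:
            problems.append(
                f"timestamp {cur.timestamp} does not come after {prev.timestamp}"
            )
    return problems
```

Returning a list of reasons instead of raising on the first one makes it easier to decide whether the data can be corrected (for example by sorting or deduplicating) or whether the expectation itself is wrong.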

Assume responsibilities

So, we integrated the two systems, and at some point something went wrong. This means that someone now has to fix it. In such cases, there must be a clear understanding of who is responsible for which part of the integration. The data will usually be owned by the other team, of course, but it is not always so clear when it comes to the format of exchange. This can be discussed once a problem occurs, although settling it beforehand saves time during an incident.

Set up monitoring

In case an incident does happen, timely alerts will help both your team and the counterpart team react to it quickly. Let's not dwell on this, since there are most likely plenty of articles written on the subject.

Conclusions

As you can see, something that seems like a trivial problem at first can have a lot of nuance to it. Let it be known that the internet of today is built on top of tiny little nuances like that, and it has worked quite well for us up until now. With some thought put into it, such problems can be solved to a good extent.