Backend Dev @ Shopify
My team recently decoupled one of the company's critical business domains. The initial integration had such a tight deadline that the only way to meet it was to add code to the monolith. And the feature that went from conception to production in three weeks ended up taking almost a year to decouple.
By decoupling, we paid back our tech debt in development time.
Debt is generally seen from two sides. Take credit card debt: we often notice two types of behaviour, people who always pay off their debt at the end of the month, and people who see it as free money and never pay it back. I would put decoupling somewhere in the middle.
Decoupling is the act of realising you are in debt and deciding it's time to work on it. You know it's going to be painful, and the pain is going to last much longer than it took to create the problem.
Was the pain worth it? 100% yes! After three weeks of development, the code we released generated consistent revenue, and with that revenue we paid for the time to fix the corners we cut.
We were new to this. We did some things right, but a lot more things wrong.
Firstly, when deciding to decouple your business domain, your team, with your stakeholders, should start asking some questions:
Since domain-driven design is a hot topic right now, you are probably already thinking about which domain to extract, or you might even know it already. It is very important that this domain is defined, but it is even more important that it is a long-term goal, and NOT the result of the first decoupled service in production. A separate data store involves data synchronisation, and that can take a lot more development time.
One of the biggest mistakes we made was treating the decoupling as an opportunity to learn new technologies. When decoupling, you are making the foundational decisions for a new application, and it is better to base them on past experience. When you are not an expert in the chosen technologies, you will do a lot of guessing, which increases the chance of failure. Be careful, or your team might end up building the legacy code that no one will want to touch in the future.
There is a time and place for learning new technologies, and it is not through decoupling.
The first version of the decoupled application should not be fully decoupled. If you will eventually need data from multiple applications that do not expose their data yet, approach the move by tackling one service at a time.
For example, take the common case of extracting a domain out of a monolith. If the monolith calls an external service and transforms the response to get to the end result, leave that external data processing inside the monolith and send the processed result directly to the new service. This saves you from making changes in too many places and losing focus in the process.
We often had trouble understanding what happens inside the monolith and what a change implies. This is where my love for documentation comes into the picture. While I am against writing code documentation, which can easily become outdated, I think diagrams are the easiest way to visualise flows and structures. So, to keep the team in sync, we held workshops, discussed the problems, and built diagrams of the final proposals for:
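Here is a minimal sketch of that idea. It assumes a hypothetical order domain where the monolith already calls an external exchange-rate service. The names `OrderSummary`, `fetch_exchange_rate`, `build_summary` and `publish` are illustrative, not from our codebase. The external call and the conversion stay in the monolith, and only the finished result crosses the boundary.

```python
import json
from dataclasses import asdict, dataclass
from typing import Callable


@dataclass
class OrderSummary:
    """Hypothetical payload the new service receives, already processed."""
    order_id: str
    total_in_usd: float


def fetch_exchange_rate(currency: str) -> float:
    """Stand-in for the monolith's existing call to an external service.

    Hypothetical: in a real monolith this would be an HTTP request.
    """
    rates = {"USD": 1.0, "EUR": 1.1}
    return rates[currency]


def build_summary(order_id: str, amount: float, currency: str) -> OrderSummary:
    # The external call and the transformation stay inside the monolith,
    # exactly where they already live today.
    rate = fetch_exchange_rate(currency)
    return OrderSummary(order_id=order_id, total_in_usd=round(amount * rate, 2))


def publish(summary: OrderSummary, send: Callable[[str], None]) -> None:
    # Only the end result is sent to the new service. `send` is whatever
    # transport you already use (HTTP client, event producer, ...).
    send(json.dumps(asdict(summary)))
```

With this split, the new service never learns that an exchange-rate service exists. Moving that dependency can be its own, later releasable piece.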
~ Cross service communication (REST and/or events)
~ Initial data flow
~ Expected final data flow
~ Database structure
This goes hand in hand with the other points, but I cannot stress enough how important it is. The stakeholders don't understand why you stopped releasing features (as fast). The product team doesn't know when the project blocking their features from going live is actually going to end. And even if they did, the other engineers have no idea what you are going through to make this happen. Some might even think you are lazy.
You will not be able to avoid this fully, but communicating how the work is going to be split into releasable pieces would definitely give your team more transparency.
By this point, your plans are all set. But you know it is just as easy to mess up during development. After all, the waterfall model has proved us wrong many times in the past. Yes, it is good to have plans, but, in my opinion, only high-level ones. Then, during development, you should be aware of the following topics:
How tricky the project structure is depends on the programming language you use and the size of the domain you are decoupling. For example, if you are extracting a whole domain (backend and frontend) with Ruby on Rails, the conventions will guide you. But if you are building a microservice in Go, you will have to configure everything yourself.
If the project structure is debatable, involve senior developers or architects in your company who have worked on or started projects in the language you are using. If you don't have the luxury of a second opinion in your community, find one online. Read some articles, research the necessary libraries, and check the code structure of similar open source applications. You could even ask for opinions on blogs; you'd be surprised how many people are willing to help if you ask.
Libraries
You know you should not implement everything yourself. At the same time, you don't want to add unnecessary dependencies. If your company already has projects in that language, check them and choose the libraries your coworkers use, where it makes sense. On the other hand, if you are the first one, you are setting the standards, so try to anticipate blockers by doing a bit (but not too much) of research.
As a rule of thumb, assume you are going to make some bad choices, so wrap your libraries wherever you can. Replacing a bad choice then means changing the wrapper rather than every caller.
I am one of those people who just despise infrastructure. However, it is vital to our profession. How is the app going to be used if it is not on a server? How many times will we forget to run the test suite if there is no continuous integration set up?
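A small sketch of such a wrapper follows. It uses the standard-library `json` module as the wrapped dependency. The names `Codec`, `JsonCodec` and `serialize_event` are hypothetical. Application code depends only on the two-method interface, so swapping the serialisation library later touches a single class.

```python
import json
from typing import Any, Protocol


class Codec(Protocol):
    """The small interface the rest of the application depends on."""

    def encode(self, value: Any) -> str: ...

    def decode(self, raw: str) -> Any: ...


class JsonCodec:
    """Wrapper around the standard-library json module.

    If json turns out to be the wrong choice (speed, type support), a new
    class with the same two methods replaces this one and no caller changes.
    """

    def encode(self, value: Any) -> str:
        # sort_keys makes the output deterministic, which helps tests and logs.
        return json.dumps(value, sort_keys=True)

    def decode(self, raw: str) -> Any:
        return json.loads(raw)


def serialize_event(codec: Codec, name: str, data: dict) -> str:
    # Application code only knows about Codec, never about json directly.
    return codec.encode({"name": name, "data": data})
```

The wrapper is deliberately thin. It exposes only what the application needs, so a replacement library has a small surface to match.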
Deployment was definitely one of the topics we handled worst while decoupling. Firstly, we decided to go with Kubernetes, even though no other service in the company used it. And since the service needed multiple interconnected resources, it was the worst possible project to trial Kubernetes with. We spent months working with the infrastructure team before we could deploy, as they were not ready to run a service this complex in production either. Instead, we could have gone with what we knew, deployed a few months earlier, and moved to Kubernetes when it made sense.
In the end, the only way we managed to justify the lost time was by creating extendable resources that the rest of the company can use.
An application is not production-ready without a production environment.
Even if your application is not production-ready, do not delay deployment. Once you have the environment, you can test real use case scenarios, even if nobody is using the application yet.
Before you release, make sure you have a way to access the production environment. Yes, people say you should not use it, but right after a release, production can behave unexpectedly, even compared to a staging environment. You should also prepare your system with the tools needed to identify what happened when something goes wrong:
~ Logs: since this is the fastest way to identify what happened, consider logging all cross-service communication (events and HTTP requests, in and out) and larger processing chunks.
~ SSH: maybe not everyone, but someone should have direct access to the production environment. Sometimes a deployment fails or some environment variables are not set, and with SSH access you can figure it out and fix it quickly enough that stakeholders might not even notice.
~ Console access: you should not modify the codebase in production, but as a last resort, you should provide a way to debug inside the production environment using the classes/modules you have created.
End to end testing sessions
My team did not have a dedicated QA person for the past year, but quality was very important, so we developers had to learn how to test in order to build a robust service.
Most developers think their responsibility stops at automated tests. This should not be the case.
One of the things we did right as a team was to set up dedicated end to end testing sessions. We found bugs in every one of these sessions, which helped us deliver a better-quality service overall. In every session, taking turns, each teammate had one of these responsibilities:
~ Doing the actions: clicks, API calls;
~ Monitoring each running service (every server the system needs to work as in production).
You have finished testing, fixed bugs, and released the MVP. This is great news, but there’s still a lot of work left!
Especially for more complex domains, you should not expect the MVP to be the final decoupled codebase. At this stage, figure out what the next releasable piece is, and do it all over again. Good thing you don't have to tackle deployment anymore!
Yes, logs are enough to debug, but monitors help you find issues much faster. To improve your service, you should set up monitors in two areas.
Especially when your application has customers around the world and is therefore used 24/7, developers should know when any component of their application is unhealthy. As a baseline, in our applications we look for:
~ Liveness check, every 5 minutes
~ Memory usage
~ Number of errors thrown
~ Number of API calls
~ API response time, with alerts triggering for any request durations over 300ms
~ Database CPU level, with alerts triggering over 60%
~ Number of database connections
~ Database read/write IOPS
~ Database read/write latency
~ Events: number of messages received, processed & not processed
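To make the alert thresholds from this list concrete, here is a hedged sketch of the rules written as plain code. In practice these usually live in your monitoring tool's alert configuration rather than in the application. The `Snapshot` shape and `alerts_for` function are assumptions for illustration. Only the 300ms and 60% values come from the list.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Snapshot:
    """Hypothetical point-in-time reading from your metrics backend."""
    slowest_request_ms: float  # slowest API request in the sampling window
    db_cpu_percent: float
    liveness_ok: bool


# Thresholds from the baseline list: requests over 300ms, database CPU over 60%.
MAX_RESPONSE_MS = 300
MAX_DB_CPU_PERCENT = 60


def alerts_for(snapshot: Snapshot) -> List[str]:
    """Return a human-readable alert for every rule the snapshot breaks."""
    alerts = []
    if not snapshot.liveness_ok:
        alerts.append("liveness check failed")
    if snapshot.slowest_request_ms > MAX_RESPONSE_MS:
        alerts.append(
            f"API response time {snapshot.slowest_request_ms}ms > {MAX_RESPONSE_MS}ms"
        )
    if snapshot.db_cpu_percent > MAX_DB_CPU_PERCENT:
        alerts.append(
            f"database CPU {snapshot.db_cpu_percent}% > {MAX_DB_CPU_PERCENT}%"
        )
    return alerts
```

Writing the rules down this explicitly, even if only in the monitoring tool, makes it easy for the team to review them and agree on the thresholds.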
I think it is very important to show the status of your application from a business perspective. At the end of the day, we build applications to increase revenue and, eventually, profit.
As developers, we can log actions that are not saved in the database. By adding business metrics on top of the code, we contribute to the business and increase our team's visibility. These metrics are specific to your company, so I would suggest collaborating with your product team on a business metrics dashboard.
Now that you know how your application impacts the business, it's time to show it. Most developers avoid this part because they think their job is done and/or they are uncomfortable speaking in public. However, when you work on a purely technical task that brings no product value in terms of features, you are the only one able to explain the long-term value of a better code architecture.
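One way to log such actions is one structured line per business event. A log-based dashboard can then count or sum those lines. The helper `record_business_event` and the event and attribute names in the test are hypothetical. The sketch only uses Python's standard `logging` and `json` modules, and you would still need to configure a handler that ships these logs to your dashboarding tool.

```python
import json
import logging
from datetime import datetime, timezone

logger = logging.getLogger("business_metrics")


def record_business_event(name: str, **attributes) -> str:
    """Emit one JSON log line for a business event and return it.

    Hypothetical helper: useful for actions the application never writes to
    its database (e.g. a quote that was viewed but not accepted).
    """
    entry = {
        "metric": name,
        "at": datetime.now(timezone.utc).isoformat(),
        **attributes,
    }
    line = json.dumps(entry, sort_keys=True)
    logger.info(line)
    return line
```

Keeping every business event in the same JSON shape means the product team can build new dashboard panels without asking for code changes each time.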
If no one knows why it happened, then the reward is the same as if it didn’t happen.
By presenting your work and emphasising the benefits of the change, you can:
~ Inspire other developers to follow the same path and make their systems better;
~ Show the product and business development teams how the resulting software is going to help reduce delivery time;
~ Show how much work you actually did, as most people do not realise how many problems we have to solve to deliver a technical product;
~ Allow others to appreciate you, and make your team feel appreciated.
That is a short summary of what decoupling a software application involves. Every decoupling brings different technical challenges, but to keep the quality of your application high, you will need to tackle many of these points.
Did you go through this process recently? What else did you learn? What do you wish you did differently? Share your thoughts with us in the comments!