Software debugging can often feel like a never-ending maze. Just when you think you're on the right track, you hit a dead-end. But, by employing the age-old technique of the process of elimination and using the analogy of the 'Tong Motion,' we can navigate this maze more effectively.
The process of elimination in debugging is straightforward in principle: continuously rule out non-problematic components until the root cause reveals itself. This can be achieved either by commenting out lines of code or by using debugger techniques such as 'force return', which bypasses specific code paths.
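As a rough sketch of the idea, assume a hypothetical three-step text pipeline in which one step has a bug. Skipping one step at a time has the same effect as commenting out the call or forcing an early return, and shows which step the symptom depends on. Every name below is invented for illustration.

```python
def strip_whitespace(text):
    return text.strip()

def normalize_case(text):
    # Hypothetical bug: this step also drops the last character.
    return text.lower()[:-1]

def collapse_spaces(text):
    return " ".join(text.split())

STEPS = [strip_whitespace, normalize_case, collapse_spaces]

def run_pipeline(text, skip=None):
    """Run every step except the one named in `skip`."""
    for step in STEPS:
        if step.__name__ == skip:
            continue  # same effect as commenting the call out
        text = step(text)
    return text

def is_broken(skip=None):
    # The symptom we are chasing: the output loses its final "d".
    result = run_pipeline("  Hello   World", skip=skip)
    return not result.lower().endswith("d")

def find_suspects():
    """Return the steps whose removal makes the symptom go away."""
    assert is_broken(), "reproduce the bug before eliminating anything"
    return [s.__name__ for s in STEPS if not is_broken(skip=s.__name__)]
```

Calling `find_suspects()` leaves only the step the symptom depends on. In a real code base the "steps" would be method calls or whole components, and the check would be whatever reliably shows the symptom.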
For front-end issues, replicating the problem with tools like curl or Postman is valuable. It tells us whether the bug is in the front-end code or elsewhere. This way, we can quickly narrow our focus and locate the actual bug instead of merely addressing the symptoms.
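The same replay can be scripted. The sketch below uses Python's standard `urllib` in place of curl or Postman; the endpoint URL and payload are assumptions standing in for whatever the browser's network tab shows for the failing request.

```python
import json
import urllib.error
import urllib.request

def build_replay_request(url, payload):
    """Build the POST the front end would send, without the front end."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def replay(request, timeout=10):
    """Send the request and return (status, body) for inspection."""
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status, response.read().decode("utf-8")
    except urllib.error.HTTPError as error:
        # 4xx/5xx responses still carry a body worth reading.
        return error.code, error.read().decode("utf-8")

# Hypothetical endpoint and payload copied from the failing page.
request = build_replay_request(
    "http://localhost:8080/api/orders", {"customerId": 42, "items": []}
)
```

If `replay(request)` shows the same wrong response, the front end is ruled out. If the response is correct, the bug is more likely in how the front end builds or handles the request.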
Unit tests are our best allies when it comes to debugging. By focusing on isolated units, they home in on potential problem areas.
Mocking frameworks like Mockito come in handy as they can simulate large parts of the application. This way, we can drill down on the exact problem, circumventing potential disturbances. Moreover, using mocks can prevent regression and make our test cases cleaner.
There are best practices about how much to mock in general. When debugging a specific problem, however, it's more pragmatic to mock as much as necessary to distill the problem to its essence.
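To make the mocking idea concrete, here is a minimal sketch that uses Python's built-in `unittest.mock` as a stand-in for Mockito. `InvoiceService` and its tax client are hypothetical. The remote dependency is replaced by a mock, so only the calculation under suspicion actually runs.

```python
from unittest.mock import Mock

class InvoiceService:
    """Hypothetical unit under test; depends on a remote tax service."""

    def __init__(self, tax_client):
        self.tax_client = tax_client

    def total(self, net_amount, country):
        rate = self.tax_client.rate_for(country)
        return round(net_amount * (1 + rate), 2)

def test_total_with_mocked_tax_client():
    # The mock replaces the network call with a fixed, known answer.
    tax_client = Mock()
    tax_client.rate_for.return_value = 0.2

    service = InvoiceService(tax_client)

    assert service.total(100.0, "DE") == 120.0
    # Also confirm the unit asked the dependency the right question.
    tax_client.rate_for.assert_called_once_with("DE")
```

If this test passes while the full application still misbehaves, the calculation is cleared and attention can move to the real tax client or to the caller.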
The elimination technique is less straightforward with flaky issues: bugs that appear irregularly or whose behavior changes as code is eliminated. The key strategy here is to treat negative results with suspicion. If the problem stops appearing after a certain block is removed, that doesn't automatically indict the block; the absence could simply be due to the bug's unpredictable nature. Hence, it's crucial to trust only the instances where the problem consistently reproduces.
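One way to encode that asymmetry is sketched below, under the assumption that a reproduction attempt can be wrapped in a function returning `True` when the bug shows up. The simulated bug, its 30% failure rate, and the block names are all invented for the example.

```python
import random

def verdict_after_removing(attempt, runs=50):
    """Classify a block after it was removed and the scenario re-run.

    A reproduction is hard evidence: the removed block is not needed
    for the bug. Silence is only weak evidence, so it is reported as
    inconclusive rather than as guilt.
    """
    failures = sum(1 for _ in range(runs) if attempt())
    return "cleared" if failures else "inconclusive"

def make_attempt(rng, buggy_block_present):
    """Simulated flaky bug: fires in roughly 30% of runs while the buggy block runs."""
    def attempt():
        return buggy_block_present and rng.random() < 0.3
    return attempt

rng = random.Random(0)  # seeded so the sketch is repeatable
logging_removed = verdict_after_removing(make_attempt(rng, True))
cache_removed = verdict_after_removing(make_attempt(rng, False))
```

Here `logging_removed` comes out as `"cleared"`, because the bug still reproduced without the logging block. `cache_removed` comes out as `"inconclusive"`: the bug went quiet, which is a reason to look closer at the cache block and re-run, not proof that it is at fault.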
Think of tongs. They grasp from both sides. Similarly, almost all software has at least two primary interfaces or points of input/output. In an enterprise web app, for instance, one side is the web tier and the other is the database.
One common pitfall is neglecting one prong of the tongs or misplacing the other. It's crucial to ensure both sides are positioned correctly; otherwise, the results may be skewed. If stuck, consider investigating from the opposite side and then return to the original side when needed.
In a real-world scenario, while tackling a server performance issue, I employed the 'Tong Motion' technique. By replacing web calls with curl requests, I shifted focus to the problematic area. At the same time, I enhanced database logging to monitor its output as problematic SQL was replicated through curl. This dual-sided approach helped unearth a bug in the Object Relational Mapping layer.
This concrete example consists of the following stages:
The tongs start by mocking the web tier with curl or Postman. This eliminates front-end-related issues.
The other side of the tong motion replaces the database with mock data.
If the issue can be reproduced, we can further squeeze the tongs by invoking the presentation tier method directly in a test case.
We can then eliminate the database entirely from the equation by mocking it in a test case.
Finally, we can invoke the business method directly, eliminating the presentation tier aspect.
We can mock its dependencies, which means we narrow down on a specific method that’s at fault while eliminating the rest of the application.
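The last few stages can be sketched as follows. This is an illustrative Python example using `unittest.mock`, not the code from the scenario above: the controller, service, repository, and the bug itself are all hypothetical.

```python
from unittest.mock import Mock

class OrderService:
    """Business tier (hypothetical)."""

    def __init__(self, repository):
        self.repository = repository

    def total_for(self, customer_id):
        rows = self.repository.find_orders(customer_id)
        # Hypothetical bug: cancelled orders are summed as well.
        return sum(row["amount"] for row in rows)

class OrderController:
    """Presentation tier (hypothetical)."""

    def __init__(self, service):
        self.service = service

    def get_total(self, params):
        customer_id = int(params["customerId"])
        return {"total": self.service.total_for(customer_id)}

def mocked_repository():
    # Stands in for the database side of the tongs.
    repository = Mock()
    repository.find_orders.return_value = [
        {"amount": 10, "status": "PAID"},
        {"amount": 5, "status": "CANCELLED"},
    ]
    return repository

# Squeeze 1: call the presentation tier directly, database mocked.
controller = OrderController(OrderService(mocked_repository()))
controller_result = controller.get_total({"customerId": "7"})

# Squeeze 2: skip the presentation tier and call the business method.
service_result = OrderService(mocked_repository()).total_for(7)
```

Both calls report 15 where 10 would be correct. Because the wrong number survives with the web tier gone, the database mocked, and the controller bypassed, the only code left between the tongs is `total_for`.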
Debugging can be a daunting process. However, with the right techniques, like the process of elimination and the 'Tong Motion' approach, it becomes a more manageable task. Always remember to tackle issues methodically and from all angles to find and fix the root cause effectively.
Abstract: Once we press the merge button, that code is no longer our responsibility. If it performs sub-optimally or has a bug, it is now the problem of the DevOps team, the SRE, etc. Unfortunately, those teams work with a different toolset. If my code uses up too much RAM, they will increase RAM. If the code runs slower, they will increase CPU. If the code crashes, they will increase concurrent instances.
If none of that helps, they will call you up at 2 a.m. A lot of these problems are visible before they become a disastrous middle-of-the-night call. Yes, DevOps should control production, but the information they gather from production is useful for all of us.
Also published here.