You’ll Never Fix It Later - How to Pull Your Team Out of the Quicksand

No startup starts with a tester.

Then 2 years later, that startup can't release a new feature for their biggest customer ever unless they rewrite the whole thing with a big bang. It's painful to watch this over and over again.

There are simple things you can do to keep your software healthy, so you don't end up in this situation.

First, embrace your fears. Engineers are brave people, and we won't readily admit we have concerns. But we do, and the sooner we admit and share those concerns, the sooner we'll prepare to face them.

How We Build Software Today

We're in so much of a hurry that we're always late. There's no time for testing. There's no time for unit testing. There's no time for refactoring.

Well, there will never be enough time to do it right while we keep sabotaging ourselves as engineering teams.

We're asked how much time it will take, and we keep excluding unit testing, manual testing, and even API testing and refactoring from our estimations.

So we build software like this:

Quick and dirty

Not quick, still dirty

Quick, with only the happy path tested

Everything else is considered an edge case. (There's no such thing as an edge case.)

Few of us are brave enough to say no to quick and dirty and build software that is both fast and stable.

What you should cut from your estimations is the scope, not the quality.

What do you know about the quality of that component you're working on?

You know it's fragile, but you don't want to touch it because it's very fragile.

You've done this long ago, you barely touch it, it should work.

You don't know it's fragile.

There are no tests and nobody to test and expose how fragile it actually is.

Are you happy dealing every day with this dirty software?

Are you happy making compromises with the quality of your work every day, knowing it's far from good? Coping with "we don't have time to clean" and still being late because it is dirty and you can't possibly move on without cleaning it up.

Imagine you're at your next job interview. What will you brag about in your current job? That you can perform under pressure and do late-night fixes? They will love you for that, but they will also ask you why you didn't do something about it. Maybe it wasn't your job.

There were team leads and engineering managers to make those decisions. If it were up to you, you would've done something. At every retro, you told them the code needs refactoring and that you need to plan time for the tech debt, and nobody listened.

Well, guess what - you don't need permission. The quality of your work depends on you only. Nobody can force you to write crappy code. Time pressure is a short-term excuse. Quick and dirty solutions delay the entire project and cost more than if you do it right the first time.

You Will Never Fix It Later

Are you willing to make that difference?

Then, here is what to do: get disciplined. I'm sorry, but there's no other way around it. And it should be at all levels.

What should be your first steps to building code with higher quality?

Implement Logging and Alerting

There are tons of tools for logging and alerting. This is the first thing to do. Your software is crashing, and you're completely unaware.

Find a solution that allows you to easily create tickets from the exception alerts and mark them as "known" or group them once they're reported.
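To make "group them" and "mark as known" concrete, here is a minimal sketch of the idea, not the API of any real alerting tool. The `fingerprint` function and the `AlertInbox` class are hypothetical names invented for illustration. The assumption is that two exceptions belong to the same alert when they share a type and were raised in the same function.

```python
import hashlib
import traceback
from collections import defaultdict


def fingerprint(exc: BaseException) -> str:
    """Build a stable key from the exception type and the function that raised it."""
    frames = traceback.extract_tb(exc.__traceback__)
    location = f"{frames[-1].filename}:{frames[-1].name}" if frames else "unknown"
    raw = f"{type(exc).__name__}|{location}"
    return hashlib.sha1(raw.encode()).hexdigest()[:12]


class AlertInbox:
    """Hypothetical in-memory stand-in for an alerting tool."""

    def __init__(self):
        self.counts = defaultdict(int)  # fingerprint -> number of occurrences
        self.known = set()              # fingerprints that were already triaged

    def record(self, exc: BaseException) -> str:
        key = fingerprint(exc)
        self.counts[key] += 1
        return key

    def mark_known(self, key: str) -> None:
        self.known.add(key)

    def needs_triage(self) -> list:
        return [key for key in self.counts if key not in self.known]


def parse_price(text):
    return float(text)


inbox = AlertInbox()
for value in ["abc", "xyz", "12.5"]:
    try:
        parse_price(value)
    except ValueError as exc:
        inbox.record(exc)  # both failures land in the same group
```

Whatever tool you pick, check that it does this grouping for you. Without it, one noisy bug floods the channel and the team stops looking.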

Build a Team Routine to Watch and Report the Alerts

If you think it's dull, let me ask you: Are you deep in the zone during your entire workday? After all the meetings and the heavy programming work, you need to ease your concentration a little. Well, browsing through the alerts channel is way better than browsing through any other channel.

Just click the "Report a ticket" button or the "Mark as known" button. And if you get intrigued, you'll probably fix one or two.

Here comes the biggest argument - we can fix a couple, but we're writing tests that someone needs to confirm, we need to deploy, and this generates more unplanned work for the team.

The PM will yell at us that we're not working on the high-priority items but gold-plating small alerts.

The Team with all the "M" roles in it agrees about that.

Publish a rule of thumb - "If it looks small and low-risk, and there is absolutely no other work in progress, just go for it and fix it. We can handle one or two small alerts fixed 'out of scope' per Sprint, along with the planned ones."

That's it, very simple.

Start Cleaning the Logs and Alerts One by One

In addition to the occasional fixes, plan the cleanup properly. Note that by planning, I mean allocating daily/weekly time for fixing them regardless of their severity or priority. You'll waste more time evaluating their priority than fixing the first 5. So, just start fixing.

Make sure there is one alert for each developer, or if you're mobbing most of the time, set a daily timeslot for alerts. And I would do it first thing in the morning because otherwise, you'll never do it.

Again, it's not a new feature, and it might seem as if it eats the time from your critical priorities, but your critical priorities are already late because of those hidden issues. You're already late!

Remove All Empty Try-Catch Statements and Let it Crash!

While alerts and logging will expose a lot, they won't be able to log what you hide. So stop sweeping your problems under the carpet.

What crashed 2 years ago, for reasons you never understood, is completely different today. Let it crash, and fix it. It probably won't crash the same way, or it won't crash at all, so one way or another, your code will be in better shape without those empty handlers.
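Here is a small before-and-after sketch of what removing an empty handler looks like. The function names and the settings-loading scenario are made up for illustration; the pattern is what matters.

```python
import json


def load_settings_hidden(raw: str) -> dict:
    """Before: the empty handler hides every failure and returns a silent default."""
    try:
        return json.loads(raw)
    except Exception:
        pass  # nobody will ever know the settings were malformed
    return {}


def load_settings(raw: str) -> dict:
    """After: no handler, so malformed input raises json.JSONDecodeError."""
    return json.loads(raw)
```

With the second version, broken input surfaces as an exception that your alerting can catch and group, instead of as a mysteriously empty configuration three layers away.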

Start Writing Unit Tests Right Now

I mean it. You're working on some code right now. Then nothing stops you from writing a unit test.

Oh, I see! There's no infrastructure ready for unit testing, is there? Come on! You're going to delay that wannabe feature anyway.

You don't need more than a day to set up the unit test framework and your CI to run the tests. Just do it.

Start TDD for Bug Fixes

"We don't know what's the problem and how we're going to fix it. We can't write tests for code that doesn't exist yet."

My dear developer, the purpose of testing is to check a hypothesis. That's valid for testing in general, not only for software testing. So what you need to write a test for is not the code. You need the expected behavior and result.

Now, if user stories are vague and unclear and you're not ready to start writing tests against the user stories, I have a solution for this too, but before that - start writing test-first code for bugs.

There is a component of the bug description called "Expected result", and the steps to reproduce are your test case. So you already have the test case in front of you. Now, code it first before you start coding the fix.

Test-Driven Development is writing a test that validates you've done the right job, not that you've done it right. Unit tests verify whether you've done it right.
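As an illustration, take a made-up bug report: "Steps: put one item priced 10.00 in the cart and apply a coupon worth 15.00. Expected result: the total is 0.00. Actual result: the total is -5.00." The function `cart_total` and the scenario are hypothetical. The test below was written straight from the report, and the fix came second.

```python
import unittest


def cart_total(prices, coupon):
    """Fixed version: the total is never allowed to drop below zero."""
    return max(sum(prices) - coupon, 0.0)


class CouponLargerThanCartTest(unittest.TestCase):
    def test_total_is_zero_when_coupon_exceeds_cart(self):
        # Steps to reproduce, copied from the bug report
        total = cart_total([10.0], coupon=15.0)
        # "Expected result", copied from the bug report
        self.assertEqual(total, 0.0)

    def test_regular_coupon_still_works(self):
        # Guard against the fix breaking the normal case
        self.assertEqual(cart_total([10.0, 5.0], coupon=5.0), 10.0)
```

Run it with `python -m unittest`. Against the old code the first test fails, which proves the test reproduces the bug. After the fix it passes, and it stays in the suite to catch the bug if it ever comes back.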

Draw a Coverage Roadmap

Test automation and tech debt have similar tragic destinies - they are never started because "we can never cover everything, we can never clean up everything, it's too much of an overhead. We don't have time for that."

Hear me out: you don't have to automate everything!

But you definitely must automate your mission-critical elements - high-priority use cases, critical infrastructure, and core functionalities. Call it what you like, but the business relies on 20% of your code and infrastructure, and your customers are using 20% of your features.

You see where I'm going with this.

You don't even need to start writing automated tests for these features right now. What you need to do first is prioritize them.

Get together as a team in front of your high- and low-level architecture diagrams. There aren't any, are they? Or the ones that exist are a photo of a whiteboard taken 2.5 years ago? I know.

Yes, you need to spend half a day and get these up to date. The good news is that there are fun tools to help you, so you won't need to maintain them manually. Do some research.

You're an R&D team, after all, right?!

Don't Aim for Full Coverage - Aim for Low Risk and High Confidence Instead

While diagramming your up-to-date architecture and infrastructure, add notes or circles, or bold or paint red all the places that are:

Mission-critical for your business and software

Desperately in need of refactoring, or you'll keep postponing that other mission-critical feature all the time.

Constantly breaking, so the time you spend fixing them is pretty much in vain because they'll break again in a couple of days.

When the drawings are ready, sit with your team and brainstorm what needs testing for all these circled components.

Don't fall into the trap of "this is too much! We must stop all other work to cover all of this!" You don't have to cover all of this at once. But you need a plan. Now, open another board, and start writing your Testing Goals. Examples:
- Cover all fetch-data API requests for success and failure so you know that you don't send bad requests, and if it fails, it fails because of the vendor.
- Outline the mission-critical user journey. Break it down into units. Write the unit tests. When the test pyramid was invented, a unit meant a component, not a method, so it covered functionality, not methods and classes.
- Identify the traffic jams, and decide how you are going to untangle them.
- Plan to do it right. Your quick and dirty code will remain dirty with dirty tests.
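The first goal on that list could look like the sketch below. Everything in it is an assumption made for illustration: the `fetch_rates` function, the two exception classes, the `vendor.example` URL, and the idea of passing the HTTP call in as a plain function so the tests need no network. The point is that the tests separate "we sent a bad request" from "the vendor failed".

```python
import json
import unittest


class BadRequestError(Exception):
    """We sent something invalid, or the vendor rejected it (4xx): our bug."""


class VendorError(Exception):
    """The vendor failed (5xx): not our bug."""


def fetch_rates(http_get, currency):
    """http_get is a hypothetical callable: url -> (status_code, body_text)."""
    if not (len(currency) == 3 and currency.isalpha()):
        raise BadRequestError(f"invalid currency code: {currency!r}")
    url = f"https://vendor.example/rates?base={currency.upper()}"
    status, body = http_get(url)
    if 400 <= status < 500:
        raise BadRequestError(f"vendor rejected request: {status}")
    if status >= 500:
        raise VendorError(f"vendor failure: {status}")
    return json.loads(body)


class FetchRatesTest(unittest.TestCase):
    def test_success_returns_parsed_body(self):
        fake_get = lambda url: (200, '{"EUR": 0.9}')
        self.assertEqual(fetch_rates(fake_get, "usd"), {"EUR": 0.9})

    def test_invalid_input_never_reaches_the_vendor(self):
        calls = []
        fake_get = lambda url: calls.append(url) or (200, "{}")
        with self.assertRaises(BadRequestError):
            fetch_rates(fake_get, "dollars")
        self.assertEqual(calls, [])

    def test_vendor_outage_is_reported_as_vendor_error(self):
        fake_get = lambda url: (503, "")
        with self.assertRaises(VendorError):
            fetch_rates(fake_get, "usd")
```

Three short tests, and the next time this integration fails you know within seconds whose side the problem is on.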

I intentionally used a coverage Map and not the testing Pyramid because... at this stage, when you don't have any tests in place, manual or automated, you're not ready for the Pyramid.

Many teams have the wrong impression that they first have to achieve 96% unit test coverage and only then move on to integration tests and so forth. The problem with modern unit test coverage is that we test Methods and Classes, not functionalities.

Even more teams start with UI automation, which is equally wrong.

That's actually the worst thing you can do, and it's doomed to fail.

So don't start from the top or from the bottom of the Pyramid, but build a risk map instead.

Some features might need extensive integration testing, other features might need extensive UI testing, and the next portion might indeed be a core feature that everything depends on (another red flag, though), and you have to cover it from top to bottom fully.

Or it's a data collection service, and you need to focus on the performance of the APIs, for example.

Build a map of your software hotspots that need heavy testing.

Then think about how you can automate that testing.
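If the whiteboard version of the map feels too fuzzy, you can also keep it as data. The sketch below is one possible way to do that, not a standard method. The `Component` class, the component names, and the scoring rule (one point per red flag) are all assumptions made for illustration. The three flags are the three reasons for circling a component listed earlier.

```python
from dataclasses import dataclass


@dataclass
class Component:
    name: str
    mission_critical: bool = False
    needs_refactoring: bool = False
    breaks_often: bool = False

    @property
    def risk(self) -> int:
        # One point per red flag; a deliberately naive score
        return self.mission_critical + self.needs_refactoring + self.breaks_often


def hotspots(components):
    """Return the flagged components, highest risk first."""
    flagged = [c for c in components if c.risk > 0]
    return sorted(flagged, key=lambda c: c.risk, reverse=True)


risk_map = [
    Component("billing", mission_critical=True, needs_refactoring=True, breaks_often=True),
    Component("avatar upload"),
    Component("report export", breaks_often=True),
    Component("login", mission_critical=True, breaks_often=True),
]

for component in hotspots(risk_map):
    print(component.risk, component.name)
```

The top of that list is where your first automated tests go. The components that never show up on it can wait.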

You Can't Progress in Quicksand and Only You Can Pull Yourself Out of There

Let’s wrap it up. If you spend half of your life working at a place where there is no testing and the code is in bad shape, you keep making compromises because “you don’t have time”. You’re not entirely happy, or at least you’re pretty indifferent – they pay, it’s their call, and it’s not your decision.

That’s just sad. I’m sure you’re not proud of your job. And you might even wait for the next unicorn to recruit you where you’ll love the job as well as the cause.

Well, things changed a lot last year, didn’t they?! We yet again learnt that unicorns don’t exist. You need to be an outstanding developer to remain hired. Companies figured out that they could throw away people regardless of their contribution.

Still, there are engineers that a company will do everything to keep. To be one of them, you need to drive change and constantly prove that the business depends on people like you.
Everyone can write crappy software, but the really good ones not only write excellent software with joy but also inspire others and spread the code-quality culture.

You can’t progress in quicksand, and only you can pull yourself out.