When building a company, you often have to make sacrifices. There is never enough time, money, or talent to get everything done. Sure, you’d like to build your product the right way, but you’ve got to get something built quickly so you can raise money, or get new features out the door so you can land that big client. There’s no time to plan! We have to move fast! I get it.
Then one day, you’ve finally got your feet under you. You’ve gained some traction, you’ve got revenue coming in, and you’re growing quickly. You’ve been bootstrapping and hustling and crushing it. You’ve arrived.
Now you’re starting to notice something is off. Your engineers used to be able to crank out new features quickly, but they’re now missing deadlines, and they’re having a hard time telling you when things are going to get done. Things are breaking more often in production, and you feel like you’re constantly in fire-fighting mode. The system seems to be slowing down, and your users are complaining about performance and bugs. Your employees seem burned out; maybe you’ve even lost a few people, and your new hires are having a hard time getting up to speed.
You are now bogged down with technical debt. All those corners you cut to get where you are now are killing your productivity, your morale, and your business. You avoided paying the true cost of software development, and now the debt is due, with interest.
Almost every company I consult with suffers from this problem to some degree. A certain amount of technical debt is excusable, but you need to start paying it down, or your productivity will grind to a halt. Sure, it will add overhead, but as your team gains momentum by clearing the roadblocks, you’ll start seeing improvements in every area of your business. This is unavoidable, so you might as well get started now. Let’s discuss some of the critical areas where you’re likely not paying the true cost of software development, and some important steps you can take to get on the right track.
“We do Scrum, sorta” is a phrase I hear with almost every client I meet. This typically means they use Jira to track tasks, and they time-box their work into sprints. Maybe at the end of the sprint they release to production. That’s about it.
Yet they are missing the single most critical aspect of agile development: the retrospective. Agile development is founded on reflection and continuous improvement. You could throw away every other aspect of Scrum, Kanban, or Mob programming, but if you aren’t doing a retrospective after every sprint, you are never going to improve. Yes, it takes time, but if you aren’t investing in your growth, you will never grow.
An effective retrospective is a process of reflection, communication, brainstorming, and experimentation. We discuss what went well and what didn’t during the last sprint. We communicate our frustrations to each other in a constructive and honest way. We brainstorm ideas for tweaking the process or communication to improve, and we commit to experimenting with a new process during the next sprint. The ideas that fail are discarded, and the ideas that succeed are baked into the process going forward.
Credit Scrum.org
There is often a language barrier between product management and engineering. Product managers typically think in terms of user benefits, while engineers think in terms of technical implementation. PMs think high level, while engineers are down in the weeds. This communication breakdown wastes a lot of time and effort as the two sides go back and forth to define the goal clearly and decide, in fine detail, how it should be implemented.
Your PMs should capture the vision of the feature in an effective user story. They must then account for all the “what if” scenarios of various use cases. I typically follow the user story with a series of if-then statements that account for as many edge cases as possible. You should also include acceptance criteria, ideally in the form of a test case. While you certainly want to leave room for the developers to be creative, there should be no ambiguity about what success looks like.
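To make "acceptance criteria in the form of a test case" concrete, here is a minimal sketch for a hypothetical story: "As a shopper, I can enter a promo code at checkout to get 10% off." The `apply_discount` function, the `SPRING10` code, and its expiry date are all invented for illustration. Each test mirrors one if-then statement from the story, so "done" means every test passes.

```python
import unittest
from datetime import date

# Hypothetical feature under test: a promo code that takes a percentage off a
# cart subtotal. Amounts are integer cents to avoid floating-point rounding.
PROMO_CODES = {"SPRING10": (10, date(2024, 6, 30))}  # code -> (percent off, expiry date)


def apply_discount(subtotal_cents: int, code: str, today: date) -> int:
    """Return the discounted subtotal, or raise ValueError if the code can't be used."""
    if subtotal_cents <= 0:
        raise ValueError("cart is empty")
    if code not in PROMO_CODES:
        raise ValueError("unknown code")
    percent, expires = PROMO_CODES[code]
    if today > expires:
        raise ValueError("code expired")
    return subtotal_cents - subtotal_cents * percent // 100


class PromoCodeAcceptance(unittest.TestCase):
    # If the code is valid and unexpired, then the subtotal drops by 10%.
    def test_valid_code_takes_ten_percent_off(self):
        self.assertEqual(apply_discount(5000, "SPRING10", date(2024, 6, 1)), 4500)

    # If the code has expired, then it is rejected.
    def test_expired_code_is_rejected(self):
        with self.assertRaises(ValueError):
            apply_discount(5000, "SPRING10", date(2024, 7, 1))

    # If the code doesn't exist, then it is rejected.
    def test_unknown_code_is_rejected(self):
        with self.assertRaises(ValueError):
            apply_discount(5000, "BOGUS", date(2024, 6, 1))

    # If the cart is empty, then no code can be applied.
    def test_empty_cart_is_rejected(self):
        with self.assertRaises(ValueError):
            apply_discount(0, "SPRING10", date(2024, 6, 1))
```

Saved as a module, this runs with `python -m unittest`. The point is less the framework than the habit: a PM and a developer can read the same file and agree on what success looks like.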
After the retrospective, the code review is the second most important ritual in software development. It’s not just about walking through code line by line; it’s about explaining the user story, validating that it has real success criteria, demonstrating that it works correctly, and ensuring the code is up to standard.
My process for a code review looks like this:
When I engage with new clients, one of the first places I look for disorganization is the code repository. I’ve seen hundreds of feature branches that haven’t been removed since they were created. I’ve seen commits with no commit message. I’ve seen developers commit directly to master without a code review. I’ve seen developers working against revisions that are out of sync with production. The horror... The horror!
For starters, use the Gitflow workflow. Every developer should create a new feature branch for each new story. When a feature is complete, it should be merged into a development branch. When a release is pushed to production, a release branch should be created as an artifact of that release. The feature branch should then be deleted after release.
Credit Atlassian
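Those abandoned feature branches are easy to clean up with a small script. This is a minimal sketch, assuming feature branches use a `feature/` prefix and that "released" means merged into `master`; adjust both to your own conventions. It relies only on the real `git branch --merged <base>` and `git branch -d <name>` commands, and defaults to a dry run.

```python
import subprocess


def merged_feature_branches(git_output: str, prefix: str = "feature/") -> list[str]:
    """Pick feature branches out of `git branch --merged` output."""
    branches = []
    for line in git_output.splitlines():
        # `git branch` marks the checked-out branch with "* "; strip it and the indentation.
        name = line.strip().lstrip("*").strip()
        if name.startswith(prefix):
            branches.append(name)
    return branches


def delete_released_features(base: str = "master", dry_run: bool = True) -> list[str]:
    """Delete local feature branches already merged into `base` (assumed release branch)."""
    output = subprocess.run(
        ["git", "branch", "--merged", base],
        capture_output=True, text=True, check=True,
    ).stdout
    stale = merged_feature_branches(output)
    for name in stale:
        if dry_run:
            print(f"would delete {name}")
        else:
            # -d (unlike -D) refuses to delete a branch Git doesn't consider fully merged.
            subprocess.run(["git", "branch", "-d", name], check=True)
    return stale
```

Running it with `dry_run=True` first lets someone eyeball the list before anything is deleted.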
Use pull requests to merge branches. Don’t commit straight into a main branch. No pull request should be approved without performing a code review first.
Create meaningful commit messages, and use descriptive names for branches. Your code history should be readable, just like your code.
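Conventions stick better when a machine enforces them. Here is a minimal sketch of a checker; the branch pattern (`feature/PROJ-123-short-description`), the list of vague subjects, and the 72-character limit are hypothetical house rules, not a standard.

```python
import re

# Hypothetical conventions; replace with your team's own.
BRANCH_PATTERN = re.compile(r"^(feature|bugfix|hotfix|release)/[A-Z]+-\d+-[a-z0-9-]+$")
VAGUE_SUBJECTS = {"fix", "fixes", "wip", "update", "changes", "stuff", "misc"}
MAX_SUBJECT_LENGTH = 72


def branch_problems(name: str) -> list[str]:
    """Return a list of problems with a branch name (empty if it's fine)."""
    if BRANCH_PATTERN.match(name):
        return []
    return [f"branch '{name}' should look like feature/PROJ-123-short-description"]


def commit_problems(message: str) -> list[str]:
    """Return a list of problems with a commit message (empty if it's fine)."""
    lines = message.strip().splitlines()
    subject = lines[0].strip() if lines else ""
    problems = []
    if not subject:
        problems.append("commit message is empty")
    elif subject.lower().rstrip(".") in VAGUE_SUBJECTS:
        problems.append(f"subject '{subject}' doesn't say what changed")
    if len(subject) > MAX_SUBJECT_LENGTH:
        problems.append(f"subject is longer than {MAX_SUBJECT_LENGTH} characters")
    return problems
```

Checks like these could run from a `commit-msg` hook (Git passes the hook the path of the file holding the proposed message) or as a CI step on every pull request.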
Imagine trying to build a house with no blueprint and no foundation. You just start slapping boards together, because you need to get the house built quickly. This might work for a tree house, but not for a skyscraper. The same goes for software.
The data model is the foundation of software. If there is one area to focus on early on, it is creating a clear, logical, scalable data model. Since every piece of code writes to or reads from the same set of data tables, it is incredibly difficult to alter your data model later on. I find that retrofitting a new data model to an existing codebase is the most expensive type of maintenance.
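What does "clear, logical, scalable" look like in practice? Here is a minimal sketch using SQLite for a hypothetical store; the table and column names are invented. It shows a few foundation-level habits: every table has a key, relationships are enforced with foreign keys, invalid values are rejected by constraints, and a many-to-many relationship gets its own table instead of a comma-separated column.

```python
import sqlite3

# Hypothetical e-commerce schema, for illustration only.
SCHEMA = """
CREATE TABLE customers (
    id         INTEGER PRIMARY KEY,
    email      TEXT NOT NULL UNIQUE,
    created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE products (
    id          INTEGER PRIMARY KEY,
    sku         TEXT NOT NULL UNIQUE,
    price_cents INTEGER NOT NULL CHECK (price_cents >= 0)
);
CREATE TABLE orders (
    id          INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(id),
    placed_at   TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
-- One row per product per order, rather than a list stuffed into one column.
CREATE TABLE order_items (
    order_id         INTEGER NOT NULL REFERENCES orders(id),
    product_id       INTEGER NOT NULL REFERENCES products(id),
    quantity         INTEGER NOT NULL CHECK (quantity > 0),
    unit_price_cents INTEGER NOT NULL,  -- price at time of sale, not today's price
    PRIMARY KEY (order_id, product_id)
);
"""


def create_database(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
    conn.executescript(SCHEMA)
    return conn
```

None of this is exotic, but each constraint is one less class of bad data you'll have to clean up later, when changing the model is far more expensive.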
QA does not mean kicking the tires on a new feature. It doesn’t mean going through the main workflow, using only obvious valid inputs. QA is about trying to break the system in a thorough, predetermined, methodical way.
First, you should create test cases with a proper template, such as:
Test Case ID: A numeric value, which will be referenced by the test plan document. It could be the ID of the task in Jira.
Title: A single descriptive sentence, usually prefixed by the component it tests, like Login or Purchase.
Description: A short paragraph giving context about the feature you’re testing and how it should behave.
Preconditions: Things that must be done to set up the test case, like “Must be logged in as administrator.”
Test Steps: The detailed steps to execute the test case. They should be written as clearly and concisely as possible, and should be “dummy-proof”, meaning anyone can execute them without any prior knowledge of the system.
Expected Results: What you should expect to see if the test passes. Again, it should be clearly written with zero ambiguity. If the expected results aren’t met, the test fails.
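If your test cases live in code or get generated from a tool, the template maps naturally onto a small record type. Here is a minimal sketch; the `ManualTestCase` class and the sample login case are hypothetical. It refuses to create a case without steps or expected results, since a test without a clear pass condition isn't really a test.

```python
from dataclasses import dataclass, field


@dataclass
class ManualTestCase:
    """One test case, with the fields from the template above."""
    case_id: str            # numeric ID or the Jira task ID
    title: str              # "Component: what is being tested"
    description: str
    steps: list[str]
    expected_results: str
    preconditions: list[str] = field(default_factory=list)

    def __post_init__(self):
        # Reject half-written cases: they need steps and an unambiguous pass condition.
        if not self.steps:
            raise ValueError(f"{self.case_id}: at least one test step is required")
        if not self.expected_results.strip():
            raise ValueError(f"{self.case_id}: expected results are required")

    def to_text(self) -> str:
        """Render the case as plain text a tester can follow."""
        lines = [f"[{self.case_id}] {self.title}", self.description, "Preconditions:"]
        lines += [f"  - {p}" for p in self.preconditions] or ["  (none)"]
        lines.append("Steps:")
        lines += [f"  {n}. {s}" for n, s in enumerate(self.steps, start=1)]
        lines.append(f"Expected: {self.expected_results}")
        return "\n".join(lines)
```

For example, `ManualTestCase("1042", "Login: account locks after five failed attempts", "...", steps=["Open the login page", "Enter a wrong password five times"], expected_results="An 'account locked' message is shown")` renders as a numbered, dummy-proof checklist.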
As you create new test cases, you should capture them in a test plan document. This is typically a spreadsheet that lists all the test cases which should be run with each QA cycle. I use a master template which I duplicate for every release. As we run through each test case, we set a Pass/Fail value on the sheet, and link to a bug report in the case of a failure. As a part of our deployment process, we make sure QA has delivered a completed test plan prior to release.
In the test plan, I specify whether each test case applies to a smoke test, a full regression test, and/or post-deployment validation, and whether it has been fully automated. I also include a link to a follow-up bug task in case the test fails.
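The "completed test plan before release" gate can be automated too. Here is a minimal sketch, assuming the spreadsheet is exported to CSV with the hypothetical columns shown in `SAMPLE_PLAN`. It lists every case in the chosen suite that hasn't been run, or that failed without a linked bug report.

```python
import csv
import io

# Hypothetical CSV export of the test plan spreadsheet.
SAMPLE_PLAN = """case_id,title,smoke,regression,post_deploy,automated,result,bug_link
101,Login: valid credentials,yes,yes,yes,yes,Pass,
102,Login: locked account,no,yes,no,no,Fail,https://jira.example.com/browse/SHOP-77
103,Purchase: expired card,no,yes,no,no,,
"""


def incomplete_cases(plan_csv: str, suite: str = "regression") -> list[str]:
    """Return reasons the plan isn't complete for a suite column (smoke, regression, post_deploy)."""
    problems = []
    for row in csv.DictReader(io.StringIO(plan_csv)):
        if row[suite].strip().lower() != "yes":
            continue  # this case isn't part of the suite being run
        result = row["result"].strip().lower()
        case = row["case_id"]
        if result not in ("pass", "fail"):
            problems.append(f"{case}: not run yet")
        elif result == "fail" and not row["bug_link"].strip():
            problems.append(f"{case}: failed with no linked bug report")
    return problems
```

A deployment script could refuse to proceed while `incomplete_cases(...)` returns anything. Whether a failing case with a linked bug should also block the release is a business call, so this sketch leaves that out.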
Manual deployments introduce an incredible amount of risk. If your developers, who are likely under pressure to release on time, forget one step in a long deployment checklist, it can wreak havoc. They’ve got to ensure the correct code branch is deployed, database schema changes are promoted, reference data are deployed, and any new infrastructure changes are deployed consistently across the environment. There is too much room for error.
All of your deployments and resource provisioning should be scripted and automated. Use tools like Chef, TeamCity, Jenkins, or one of the many other continuous deployment tools available. Use Docker, Heroku, Kubernetes, or Vagrant to automatically provision and scale your server infrastructure. For test automation, use Selenium, Mocha+Chai, PHPUnit, or the analog for your language of choice.
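A CI/CD server is the right home for this, but the core idea fits in a few lines: the checklist becomes an ordered list that a program runs, stopping at the first failure. Here is a minimal sketch; the step names mirror the checklist above, while the tag `v1.4.0` and the `./scripts/...` commands are hypothetical placeholders for your real tooling.

```python
import subprocess
from typing import Callable, Sequence

# Hypothetical release steps; each command stands in for real build/migration/provisioning tooling.
DEPLOY_STEPS = [
    ("check out the release tag", ["git", "checkout", "v1.4.0"]),
    ("apply database schema migrations", ["./scripts/migrate_db.sh"]),
    ("load reference data", ["./scripts/load_reference_data.sh"]),
    ("apply infrastructure changes", ["./scripts/provision.sh", "production"]),
    ("deploy the application", ["./scripts/deploy_app.sh", "production"]),
]

Step = tuple[str, Sequence[str]]


def run_pipeline(
    steps: Sequence[Step],
    runner: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> list[str]:
    """Run each step in order; stop at the first failure instead of carrying on."""
    completed = []
    for name, command in steps:
        print(f"==> {name}", flush=True)
        result = runner(list(command))
        if result.returncode != 0:
            raise RuntimeError(
                f"step '{name}' failed (exit code {result.returncode}); "
                f"completed so far: {completed or 'nothing'}"
            )
        completed.append(name)
    return completed
```

Nobody can forget step three when step three is a line in a file that runs the same way every time.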
When something does go wrong (and believe me, it will), how will you know? Will you wait for your clients to call and complain? If you get hacked or DDoSed, would you like to know right away, or after your platform is crushed?
You should set up robust application monitoring and alerting for your infrastructure resources, URLs and API endpoints, and error logs. For error logs, I am a big fan of the Elasticsearch/Logstash/Kibana (ELK) stack. If you’re on AWS, you can set this up as a turn-key service. This stack allows you to post logs in JSON format with tags and values, which you can then search and filter in Kibana. You can create dashboards, reports, and alerts based on log data. No more will you have to SSH into servers and scan huge text files to find out what’s wrong with your system.
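Getting logs into that shape starts in the application. Here is a minimal sketch using Python's standard `logging` module to emit one JSON object per line, assuming a shipper such as Logstash picks the lines up from stdout. The `fields` key passed through `extra` is a convention invented for this example.

```python
import json
import logging
import sys


class JsonFormatter(logging.Formatter):
    """Render each log record as a single line of JSON."""

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "timestamp": self.formatTime(record, "%Y-%m-%dT%H:%M:%S%z"),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Tags and values passed as extra={"fields": {...}} become searchable fields.
        entry.update(getattr(record, "fields", {}))
        if record.exc_info:
            entry["exception"] = self.formatException(record.exc_info)
        return json.dumps(entry)


def get_logger(name: str) -> logging.Logger:
    """Return a logger that writes JSON lines to stdout (handler added only once)."""
    logger = logging.getLogger(name)
    if not logger.handlers:
        handler = logging.StreamHandler(sys.stdout)
        handler.setFormatter(JsonFormatter())
        logger.addHandler(handler)
        logger.setLevel(logging.INFO)
    return logger
```

A call like `get_logger("checkout").error("payment failed", extra={"fields": {"order_id": 1234}})` produces a record you can filter on by `order_id` in Kibana instead of grepping for it.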
For endpoint testing, I use Runscope and New Relic. You can create simple tests to hit your key URLs and APIs and parse responses. You can then create alerts that will tell you if your system is not responding, or returning errors.
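Hosted tools add scheduling, history, and alert routing, but the check itself is simple. Here is a minimal sketch with the standard library: hit a URL, treat errors, missing JSON fields, and slow responses as failures, and return a reason. The `/health` endpoint, the `status` key, and the latency threshold are hypothetical.

```python
import json
import time
import urllib.error
import urllib.request
from typing import Optional, Tuple


def check_endpoint(
    url: str,
    required_key: Optional[str] = None,
    timeout: float = 5.0,
    max_latency: float = 2.0,
) -> Tuple[bool, str]:
    """Hit a URL and decide whether it looks healthy. Returns (ok, reason)."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            status = response.status
            body = response.read()
    except urllib.error.HTTPError as err:  # 4xx/5xx responses land here
        return False, f"HTTP {err.code}"
    except OSError as err:  # URLError, refused connections and timeouts are all OSError subclasses
        return False, f"unreachable: {err}"
    elapsed = time.monotonic() - start

    if required_key is not None:
        # Parse the body and make sure the field we care about is present.
        try:
            payload = json.loads(body)
        except ValueError:
            return False, "response is not valid JSON"
        if not isinstance(payload, dict) or required_key not in payload:
            return False, f"response is missing '{required_key}'"

    if elapsed > max_latency:
        return False, f"slow response: {elapsed:.2f}s"
    return True, f"HTTP {status} in {elapsed:.2f}s"
```

Run on a schedule, any `(False, reason)` result would be handed to whatever pages your on-call person.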
Security is the great bogeyman of software development. No one can ever be perfectly secure, and investing in security does hinder agility. It’s an inconvenience, but a necessary one. However, I’ve seen companies that completely disregard security, and can only be persuaded to invest in security with a carrot or a stick. Either they have to be HIPAA or PCI DSS compliant to avoid fines, or they can sell enhanced security as a differentiating factor.
I approach security as a set of business risks. Security risks largely fall into three categories: system availability, data loss, and data exposure. I create a risk register for my clients that includes everything that could go wrong given their level of security, and the impact each item would have on the business in each of these three areas. The business then prioritizes the risks, and we work to plug the holes. I always refer to the OWASP Top 10 guidelines for secure coding standards, and the PCI DSS standards for infrastructure, database, and physical security.
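A risk register doesn't need special software. Here is a minimal sketch of one as data, using the three categories above. The likelihood × impact scoring on a 1 to 5 scale is a common convention swapped in here to make prioritization concrete, not necessarily how any given business will rank its risks; the sample entries are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Category(Enum):
    AVAILABILITY = "system availability"
    DATA_LOSS = "data loss"
    DATA_EXPOSURE = "data exposure"


@dataclass
class Risk:
    description: str
    category: Category
    likelihood: int  # 1 (rare) to 5 (almost certain)
    impact: int      # 1 (negligible) to 5 (severe)
    mitigation: str = ""

    def __post_init__(self):
        for name in ("likelihood", "impact"):
            if not 1 <= getattr(self, name) <= 5:
                raise ValueError(f"{name} must be between 1 and 5")

    @property
    def score(self) -> int:
        return self.likelihood * self.impact


def prioritized(register: list[Risk]) -> list[Risk]:
    """Highest-scoring risks first."""
    return sorted(register, key=lambda r: r.score, reverse=True)


# Hypothetical entries for illustration.
REGISTER = [
    Risk("Single database server with no replica", Category.AVAILABILITY, 3, 5, "Add a standby replica"),
    Risk("Backups never test-restored", Category.DATA_LOSS, 2, 5, "Schedule restore drills"),
    Risk("Search builds SQL by string concatenation", Category.DATA_EXPOSURE, 4, 5, "Use parameterized queries"),
    Risk("Admin panel reachable from the public internet", Category.DATA_EXPOSURE, 3, 4, "Restrict by VPN or IP allowlist"),
]
```

Sorting by score puts the injection risk (4 × 5 = 20) at the top, which is exactly the conversation you want the business to have first.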
Ahh, documentation, the bane of every coder’s existence, right? We don’t want to create documentation that sits on a shelf, or becomes obsolete as soon as it’s written. We don’t have time to write docs, we need to push more features!
When your company grows, and you start hiring new people, how will you train them? How will they discover how the system works? How will they know what will break if they change some logic? If you don’t invest in documentation, you will have to use the time of your other developers to train every new person who joins the team. If you invest in documentation once, you can onboard many new people with the same investment.
I’m a champion of agility in all aspects of business, and documentation is no exception. I’m a big fan of wikis like Confluence for capturing bare-bones documentation that can be easily updated and referenced. Do anything you can to capture the information. Take screenshots, record screen-capture videos, do whatever it takes to get it down so others can refer to it later. Don’t have time to develop sophisticated architecture diagrams? Sketch them on a whiteboard and take a picture with your phone.
Good enough! (Credit Agilemodeling.com)
All platforms need maintenance. You may have scheduled tasks that fail, data cleanup tasks, and user support tasks that all need to read/write to the database. Any repetitive operational support tasks should be automated with a simple software tool. Most companies, however, avoid the cost of building the tools, but end up spending much more in the long run by pulling developers away from development to support the system.
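These tools can be tiny. Here is a minimal sketch of one for a hypothetical chore: purging expired rows from a `sessions` table with an `expires_at` column stored as ISO-8601 UTC strings (so they compare correctly as text). It defaults to a dry run, so support staff can see what would happen before anything is deleted.

```python
import argparse
import sqlite3
from datetime import datetime, timezone


def purge_expired_sessions(conn: sqlite3.Connection, now: str, dry_run: bool = True) -> int:
    """Count (and, unless dry_run, delete) sessions that expired before `now`."""
    (count,) = conn.execute(
        "SELECT COUNT(*) FROM sessions WHERE expires_at < ?", (now,)
    ).fetchone()
    if not dry_run and count:
        with conn:  # commits on success, rolls back if the DELETE fails
            conn.execute("DELETE FROM sessions WHERE expires_at < ?", (now,))
    return count


def main(argv=None) -> None:
    parser = argparse.ArgumentParser(description="Purge expired user sessions.")
    parser.add_argument("database", help="path to the SQLite database file")
    parser.add_argument("--execute", action="store_true", help="actually delete (default is a dry run)")
    args = parser.parse_args(argv)

    now = datetime.now(timezone.utc).isoformat()
    conn = sqlite3.connect(args.database)
    try:
        count = purge_expired_sessions(conn, now, dry_run=not args.execute)
    finally:
        conn.close()
    verb = "Deleted" if args.execute else "Would delete"
    print(f"{verb} {count} expired session(s)")
```

Hook `main()` up to a console entry point or an `if __name__ == "__main__":` guard, and a ticket that used to interrupt a developer becomes a one-line command anyone on the support team can run.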
Early in a company’s life cycle, you can get away with avoiding a lot of this overhead. You can scrape together an MVP and get to market quickly. You can scale to a certain point before the seams start to tear. Eventually, though, you will have to get with the program and start paying the true cost of software development. The longer you wait, the more expensive it will be to retrofit.
How much of this do you think applies to you? Come on, you know it, I know it, we both know it. Don’t let this be you:
Credit interwebs
I’ve built my practice around helping growing tech companies mature their product development process. It may seem daunting, but trust me, I can help you start making small, iterative, meaningful progress in the right direction. If you think you’re over-burdened with technical debt, reach out to me on my website and let’s get you on the right path!
If you’d like to read more of my articles on product development, leadership, and software development process, follow me on Medium. You can also find my conference talks and podcasts on my media page.