Is a Green Build and Fixing Flaky Tests Your #1 Priority?

Date: 2024-11-14

“Over my career, I’ve been in maybe 50 software development teams as an IC. In my current role, I get exposed to 10,000+. If I picked a random person across my entire sample set, they wouldn’t even know where to start with ‘doing the right thing.’ I see way more disengaged engineers who have a ‘give me a ticket and I’ll give you code’ mindset. How would you go about rebooting that?”





It’s a great question. I thought I’d write about how we’ve approached this in my last two companies. It’s easy to say that culture is the answer, but how do you build that culture? How do you get every member of the team to care about doing the right thing — and what is the right thing?

So Many Priority #1s

With so much emphasis on features and delivery, it’s no wonder engineers feel torn between addressing quality and stability issues and pushing out new features. So, how do you approach this? While the question of priority is nuanced and will vary from company to company, here’s a list of priorities based on my recent experiences.

1. Build a Culture of Ownership and Accountability

Building a culture of ownership takes time. Like any change, it requires repetition and reinforcement. Start immediately, as this will form the foundation for the following priorities.

2. Record, Prioritize, and Monitor Production Bugs and Defects

It’s surprising how often teams fail to record and prioritize production issues. At UberCarshare, we had a successful product but no way to track bug count or severity. Once you record your production issues, you can start measuring and reporting them. This opens up meaningful conversations with stakeholders about the trade-offs between quality, stability, and new features.
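
To make "measuring and reporting" concrete, here is a minimal sketch of the kind of summary you can produce once issues are recorded. It is not the tooling we used; the `Bug` record, the severity labels, and the `open_bugs_by_severity` helper are all hypothetical names chosen for illustration.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date
from typing import Iterable, Optional


@dataclass(frozen=True)
class Bug:
    """A recorded production issue (hypothetical shape, for illustration)."""
    title: str
    severity: str  # assumed scale: "critical", "major", "minor"
    opened: date
    resolved: Optional[date] = None  # None means the bug is still open


def open_bugs_by_severity(bugs: Iterable[Bug]) -> Counter:
    """Count the bugs that are still open, grouped by severity."""
    return Counter(bug.severity for bug in bugs if bug.resolved is None)


bugs = [
    Bug("Checkout times out", "critical", date(2024, 1, 1)),
    Bug("Typo on pricing page", "minor", date(2024, 1, 2), date(2024, 1, 3)),
    Bug("Booking emails not sent", "critical", date(2024, 1, 4)),
]
summary = open_bugs_by_severity(bugs)
```

Even a count this simple, tracked week over week, gives stakeholders something tangible to weigh against the feature backlog.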

3. Complete Root Cause Analysis (RCA) or Incident Reports for Significant Production Issues

Tech debt is the root cause of many issues. We can’t fix everything immediately, but completing RCAs and weighing their follow-ups against other work is a major step toward putting stability and quality ahead of new features.

4. Establish a Continuous Green Build in CI

Our approach was as follows:

Get the build consistently green and fast enough — exclude flaky tests as necessary! Address performance or timeout issues through sufficient resourcing and parallelization of tests.

Maintain zero tolerance for flaky tests: if tests become flaky, exclude them from the build until they are fixed.
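
One way to decide which tests to exclude is to look for tests that both passed and failed against the same commit. The sketch below assumes you can export CI results as `(test_name, commit_sha, passed)` tuples; that input format and the `find_flaky_tests` name are assumptions for illustration, not a description of our actual pipeline.

```python
from collections import defaultdict
from typing import Iterable, List, Tuple


def find_flaky_tests(runs: Iterable[Tuple[str, str, bool]]) -> List[str]:
    """Return tests that both passed and failed on the same commit.

    The code did not change between those runs, so the differing
    outcomes point at the test (or its environment), not the change.
    """
    outcomes = defaultdict(set)
    for test_name, commit_sha, passed in runs:
        outcomes[(test_name, commit_sha)].add(passed)
    flaky = {name for (name, _sha), seen in outcomes.items() if len(seen) == 2}
    return sorted(flaky)


runs = [
    ("test_login", "abc123", True),
    ("test_login", "abc123", False),   # same commit, different outcome
    ("test_payment", "abc123", False),
    ("test_payment", "abc123", False), # consistently failing, not flaky
    ("test_search", "abc123", True),
    ("test_search", "def456", False),  # different commits, not flaky
]
quarantine = find_flaky_tests(runs)
```

The resulting list becomes the quarantine: those tests stay out of the main build until someone fixes them. How you exclude them depends on your test runner; with pytest, for example, a custom marker deselected via `-m "not quarantine"` is one option.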

5. Monitor System Health and Actually Care About Your Logs

Agree on your tolerance thresholds with the team, ensure that alerts for exceeded thresholds are sent to a public channel, and set an expectation that no one in the team is allowed to walk by a problem. At Carshare, we established an expectation that no one should push an alert further into history by adding a new comment below without first ensuring someone was investigating the issue.
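
As a rough sketch of what "agreed thresholds" can look like in practice, the check below compares current metrics against limits the team has signed off on and produces the messages you would post to the public channel. The metric names, the limits, and the `exceeded_thresholds` helper are hypothetical; posting to your chat tool is left out because it depends on the tool.

```python
from typing import Dict, List


def exceeded_thresholds(
    metrics: Dict[str, float], thresholds: Dict[str, float]
) -> List[str]:
    """Return one alert message per metric that is above its agreed limit."""
    alerts = []
    for name, limit in thresholds.items():
        value = metrics.get(name)
        # Metrics with no current reading are skipped rather than alerted on.
        if value is not None and value > limit:
            alerts.append(
                f"{name} is {value}, above the agreed threshold of {limit}"
            )
    return alerts


# Hypothetical limits the team has agreed on.
thresholds = {"error_rate_percent": 1.0, "p95_latency_ms": 500}
metrics = {"error_rate_percent": 2.5, "p95_latency_ms": 300}
alerts = exceeded_thresholds(metrics, thresholds)
```

Keeping the thresholds in one reviewed place makes the tolerance an explicit team decision rather than something buried in a dashboard.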

Use These Principles to Drive a Culture of Autonomy and Accountability

Above all, reassure your engineers that fixing alerts, broken builds, or bugs really is their highest priority. Eventually, these issues will start to solve themselves, and productivity and output improvements will follow the noticeable improvement in quality and stability.

Rinse and Repeat: The Path to Continuous Improvement

writing efficient tests

improving build speed

reducing handoffs

shifting quality left

improving collaboration with partners such as product and design teams

measuring quality, uptime, and cycle time

