How to Develop a Bug Triage Process Efficiently

by Bugsnag, February 8th, 2022

Too Long; Didn't Read

This is the third video in a three-part series on how to approach software bug triage. So far we've covered several best practices for triaging effectively and efficiently within Bugsnag. To deliver the best possible software to your users, you need to incorporate those practices into your overall engineering process, so this video focuses on developing a bug triage process with your engineering team.

So far we've covered several best practices for triaging effectively and efficiently within Bugsnag. But just knowing about these best practices isn't necessarily going to be enough to deliver the best possible software to your users. In order to do that, you're going to need to find a way to incorporate these best practices into your overall engineering process. So let's talk about some ways to accomplish that. This is the third video in a three-part series on how to approach software bug triage.

Transcript

00:15 so far we've covered several best

00:17 practices for triaging effectively and

00:19 efficiently within Bugsnag but just

00:22 knowing about these best practices

00:24 isn't necessarily going to be enough to

00:26 deliver the best possible software

00:29 to your users in order to do that you're

00:32 going to need to find a way to

00:33 incorporate these best practices

00:35 into your overall engineering process so

00:38 let's talk about some ways to accomplish

00:40 that

00:42 it's really important to make sure that

00:44 consistent and accurate bug triage

00:46 is a performance target for your team

00:49 so tech leads engineering managers and

00:52 engineering leaders of all kinds

00:55 this one's for you each of your Bugsnag

00:57 projects and generally each one of

00:59 your code bases

01:00 should have an owner someone on the team

01:02 who's ultimately accountable

01:04 for the health and stability of that

01:06 given project

01:08 now they're also accountable for making

01:10 sure that their teams are following a

01:12 process

01:13 by which bugs are being regularly

01:14 triaged and also accurately triaged right

01:18 there are lots of workflow actions that

01:19 you can take in Bugsnag

01:21 so it's one thing to hit inbox zero and

01:23 it's

01:24 another degree of engineering excellence

01:27 to be making the right triage calls

01:30 time and time again so often the person

01:33 tasked with this is going to be someone

01:34 in an engineering leadership role but

01:37 they don't necessarily need to be the

01:38 person

01:39 doing the daily triaging work all the

01:41 time or fixing the bugs themselves

01:43 they're the one who is responsible for

01:46 creating a culture of regular triaging

01:50 and accurate bug remediation within

01:52 their team so again

01:54 make it a performance target make sure

01:55 that you're tracking the degree to which

01:57 bugs are being triaged

01:59 and the degree to which good triaging

02:01 decisions are being made
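One way to track the first of these targets is sketched below. It is a minimal illustration, not a Bugsnag feature: the `ErrorRecord` type, its fields, and the `triage_rate` function are all hypothetical names, and the one-day window in the usage note is an arbitrary choice.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class ErrorRecord:
    # Hypothetical record of one error; not a Bugsnag data type.
    first_seen: datetime
    triaged_at: Optional[datetime] = None  # None means still awaiting review


def triage_rate(errors: list[ErrorRecord], within: timedelta) -> float:
    """Fraction of errors that were triaged within `within` of first being seen."""
    if not errors:
        return 1.0  # nothing to triage counts as fully triaged
    on_time = sum(
        1
        for e in errors
        if e.triaged_at is not None and e.triaged_at - e.first_seen <= within
    )
    return on_time / len(errors)
```

Calling `triage_rate(errors, timedelta(days=1))` gives the share of errors reviewed within a day. Measuring whether the triage decisions were good ones needs a separate signal, which this sketch does not attempt.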

02:04 when it comes to developing a triaging

02:06 process within your engineering team

02:09 the fundamental question to be asking is

02:12 who

02:12 on the team is responsible for triaging

02:14 bugs today

02:16 now there are all kinds of valid answers

02:18 to this question

02:19 and what those answers look like depends

02:21 a lot on how your team is structured

02:23 how responsibilities are spread

02:25 throughout the team

02:26 and so on but some ideas to get you

02:30 started

02:31 things we've seen work really well is

02:33 coming up with a daily or

02:34 weekly rotation where an engineer on the

02:37 team is responsible

02:39 for handling periodic triage during that

02:41 work day or during that week

02:44 and by doing this it should be clear to

02:46 everybody on the team

02:48 who's responsible for looking at bugs

02:50 today who's responsible for doing

02:51 periodic triage today

02:53 but however you decide to answer this

02:56 question it's just

02:57 really important to check in with your

02:59 team and make sure that you have a good

03:01 answer that you all buy into

03:03 that it's clear to everyone who's

03:05 responsible for what

03:06 as it relates to triaging bugs in your

03:09 system
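A daily or weekly rotation like the one described above can be as simple as cycling through a list of names. The sketch below assumes a fixed team list and a hypothetical `on_triage` helper; it does not skip weekends or holidays, and weeks are taken to start on Monday.

```python
from datetime import date


def on_triage(engineers: list[str], day: date, weekly: bool = False) -> str:
    """Return whose turn it is on `day`, cycling through `engineers` in order."""
    if not engineers:
        raise ValueError("rotation needs at least one engineer")
    # toordinal() counts days from a fixed origin, so the schedule is stable
    # no matter when the function is called. Ordinal 1 was a Monday, so
    # (ordinal - 1) // 7 gives a week number that changes every Monday.
    ordinal = day.toordinal()
    slot = (ordinal - 1) // 7 if weekly else ordinal
    return engineers[slot % len(engineers)]
```

Because the answer depends only on the date and the team list, anyone on the team can run it and get the same name, which is the point: it should never be ambiguous who is looking at bugs today.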

03:11 so once you've thought about your

03:13 triaging rotation

03:15 how you spread the responsibility of

03:16 doing periodic daily bug triage across

03:19 your team

03:20 it's also important to think about how

03:21 you want to handle the situation where

03:24 someone on the team ships a new feature

03:26 and there's a bug that's newly

03:28 introduced in that code that that

03:30 engineer just sent out what if that

03:32 engineer isn't the one who is on

03:34 the triage rotation for the current day

03:36 how do you handle that

03:38 again as with the last point there are

03:40 lots of valid answers to this question

03:42 it really depends on

03:43 how you distribute responsibilities

03:46 throughout your team how you deliver

03:47 software and so on

03:48 but one thing that we do at Bugsnag and

03:52 we

03:53 have had a lot of success with this is

03:55 to make it the responsibility of the

03:57 engineer who is deploying new code to

04:00 perform any necessary reactive triage

04:03 which means addressing any bugs

04:05 introduced in their new code

04:07 and this tends to make a lot of sense

04:09 because it's usually true that the

04:10 person shipping the new code has

04:12 the most context on what that code is

04:15 trying to do how that code works how

04:16 that code fits into the larger system

04:19 and it doesn't necessarily make sense

04:20 for the person who is broadly

04:22 dealing with reactive triage of bugs

04:24 across the entire system

04:26 to look at that newly introduced bug

04:30 usually what you'll end up having is the

04:32 person who's doing the periodic triage

04:34 they'll be looking at bugs that are

04:36 newly introduced from

04:38 edge cases in legacy code or from things

04:41 that were previously snoozed that have

04:43 re-entered the for review state or

04:45 things that were fixed

04:47 and have happened again in new versions

04:49 of the code

04:51 but think about the distinction between

04:54 that set of responsibilities and who

04:56 looks at bugs coming from

04:58 code you've just released again

05:01 lots of valid answers to this question

05:03 but think about it

05:05 what's going to work best for your team
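The split between reactive and periodic triage described above can be written down as a small routing rule. This is only a sketch of that idea: `NewError`, its fields, and `pick_triager` are hypothetical names, and it assumes you can tell which release an error first appeared in.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class NewError:
    # Hypothetical fields for illustration; not a Bugsnag payload.
    message: str
    introduced_in_release: Optional[str]  # release where the error first appeared, if known


def pick_triager(
    error: NewError, latest_release: str, deployer: str, rotation_owner: str
) -> str:
    """Route errors from the newest release to its deployer, everything else to the rotation."""
    if error.introduced_in_release == latest_release:
        # Reactive triage: the person who shipped the code has the most context.
        return deployer
    # Periodic triage: legacy edge cases, un-snoozed errors, reopened errors.
    return rotation_owner
```

Other teams may prefer a different rule, for example always routing to the rotation owner and letting them pull in the deployer. The value is in agreeing on one rule, whatever it is.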

05:09 so before we get into the live Q&A i

05:11 want to address

05:12 one question we sometimes hear at this

05:15 point

05:16 and that is yeah this all sounds really

05:18 great this

05:19 daily triaging this getting to inbox

05:22 zero having an inbox that has

05:25 a high signal to noise ratio etc

05:29 allowing us to be efficient with our

05:30 triaging process that all sounds

05:31 wonderful

05:32 but we haven't been doing a great job of

05:35 triaging our bugs historically

05:36 and they've started piling up so we've

05:39 got a huge set of errors in our inbox

05:41 and it just feels daunting to start

05:43 reviewing them what can we do

05:45 to get things back on track so

05:49 if your team falls into this category if

05:52 this describes you

05:53 first of all i'd say for most teams with

05:56 a little bit

05:57 of focus it's possible to

06:00 plow through that backlog and get down

06:02 to inbox zero

06:04 and get things on track but if that's

06:07 still too much to take on

06:08 we've got a few best practices a few

06:10 tips really

06:12 to help you get your inbox under control

06:13 so you can get back on track

06:15 and start doing the recommended

06:18 daily triage workflow earlier we talked

06:21 about the case of

06:22 a monolithic code base and how using a

06:25 shared bookmark

06:26 could give a team a refined view

06:30 down to the set of for review errors

06:32 that applies just to their subset

06:34 or subsection of a monolithic code base

06:37 you can apply this same principle

06:39 if you're trying to get your set of

06:42 for review errors down to a more

06:43 manageable set

06:45 you can essentially create a filter

06:49 that reflects the highest priority or

06:52 the likely

06:53 most impactful bugs in your for review

06:56 set

06:56 and you can start triaging these bugs

06:59 down first

07:00 and then over time you can make your for

07:03 review filter

07:05 more permissive to include a larger and

07:09 larger set of the total for review bugs

07:11 and we'll take a look at what this would

07:14 look like inside the dashboard

07:16 okay so here we are looking at the inbox

07:19 for a project that has 31 for review

07:21 errors

07:22 you can see that right here and

07:25 let's say this is a few errors more than

07:28 it feels realistic for us to

07:32 get a handle on in terms of daily

07:33 triaging and we'd like to

07:35 follow this recommendation of starting

07:37 with a scoped-down

07:39 version of just the most high impact for

07:42 review errors

07:44 what we can do is we can start to

07:46 consider

07:47 some filters we can apply to zero in on

07:50 the

07:50 highest impact or highest priority of

07:53 these 31 errors

07:55 so first of all let's go ahead and limit

07:58 this search

07:59 to only those errors affecting the

08:01 production stage

08:05 so we'll apply that we say okay we've

08:07 gone down to 29 for review errors once

08:08 we apply that stage filter

08:10 let's continue and scope this down

08:14 instead of considering

08:15 error warning and info level severities

08:18 let's just consider

08:19 error and for the type rather than

08:23 including handled and unhandled errors

08:25 let's only look at

08:27 unhandled errors and as a point of

08:30 review

08:30 unhandled errors are errors that Bugsnag

08:32 automatically detects

08:34 and reports handled errors are

08:37 errors that you manually notify Bugsnag

08:40 about

08:41 so we'll apply these filters that's

08:44 taken it down to 11 for review errors

08:47 and rather than considering all time

08:50 let's start by just looking at errors

08:52 that have happened in the past

08:54 week so we apply that and now we're down

08:56 to

08:57 10 for review errors hopefully a much

09:00 more manageable number than the previous

09:02 31
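The narrowing steps in this walkthrough (production stage, error severity, unhandled only, seen in the past week) amount to a conjunction of simple predicates. The sketch below applies them to an in-memory list; the `ErrorSummary` shape and the `priority_for_review` function are hypothetical and are not how the Bugsnag dashboard or API represents errors.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class ErrorSummary:
    # Hypothetical shape for illustration only.
    stage: str        # e.g. "production" or "staging"
    severity: str     # "error", "warning" or "info"
    unhandled: bool   # True if detected automatically rather than reported manually
    last_seen: datetime


def priority_for_review(
    errors: list[ErrorSummary],
    now: datetime,
    stages: tuple[str, ...] = ("production",),
    severities: tuple[str, ...] = ("error",),
    unhandled_only: bool = True,
    max_age: timedelta = timedelta(days=7),
) -> list[ErrorSummary]:
    """Keep only errors that pass every filter; defaults mirror the walkthrough."""
    return [
        e
        for e in errors
        if e.stage in stages
        and e.severity in severities
        and (e.unhandled or not unhandled_only)
        and now - e.last_seen <= max_age
    ]
```

Each filter can only shrink the set, which is why the count in the walkthrough drops at every step.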

09:02 so from here we can start our triaging

09:05 workflow but before we do that

09:07 let's save this search

09:10 so we can bookmark the current search

09:13 and we can call this

09:16 priority for review now the two

09:19 options here set as my default view for

09:21 the project

09:23 let's go ahead and click that that means

09:25 anytime you

09:27 go to this project in Bugsnag these

09:29 filters will be automatically applied

09:32 by default and then let's also click

09:35 share this bookmark with my team

09:37 this means that everyone on your Bugsnag

09:39 team will also have access to this

09:41 filter from their inbox

09:42 and be able to triage down the same

09:46 set of high impact for review bugs

09:50 we can see that our bookmark was saved

09:52 it's highlighted to indicate that we're

09:53 currently searching by this

09:55 bookmark now anyone on our team can

09:58 click this

09:58 and get to the same filtered view of

10:00 their inbox and from here we can all

10:03 proceed with the triaging workflow we

10:05 already discussed

10:09 okay let's say a few days have gone by

10:11 now the team has been consistently using

10:13 this priority for review bookmark

10:16 to do daily triaging using this

10:19 scoped-down set of for review errors

10:22 we've been consistently hitting inbox

10:24 zero on a daily basis

10:26 now it's time to expand this saved

10:29 bookmark

10:30 to include a larger set of the

10:33 total for review errors so in order to

10:36 do that

10:37 let's just modify some of these filters

10:39 so let's say

10:40 instead of just considering production

10:42 errors let's include staging errors as

10:43 well

10:45 let's include warning level

10:49 uh severities apply that as well okay

10:51 now we're up to seven

10:53 and instead of just considering the past

10:55 seven days of history let's also look at

10:57 any bug that's occurred in the last 30

10:59 days what we can do here is go back to

11:02 this bookmark that we've already created

11:04 and saved with the team

11:05 and say update with current filters

11:09 update shared bookmark this is telling

11:10 us that this is a shared bookmark so

11:13 everyone on the team

11:14 will be able to use this updated version

11:16 of it we'll click

11:17 update so there we go

11:20 now we have a new and improved priority

11:23 for review filter that includes

11:25 a larger set of the total for review

11:27 errors

11:28 and following this process over time we

11:30 can get to a place where

11:32 eventually we don't need this special

11:33 bookmark anymore we can just use

11:36 plain old for review filter that

11:39 includes

11:40 the global set of for review errors for

11:43 this project
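The widening process can be thought of as stepping through an ordered list of filter presets, moving to the next one once the team is reliably hitting inbox zero. The sketch below is an assumption-laden illustration: `FilterPreset`, `PRESETS`, `next_preset_index`, and the three-day streak threshold are all hypothetical, and the two presets simply restate the filters used in the walkthrough.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FilterPreset:
    # Hypothetical description of a saved search; not a Bugsnag bookmark format.
    stages: tuple[str, ...]
    severities: tuple[str, ...]
    unhandled_only: bool
    max_age_days: int


# Ordered from narrowest to widest, mirroring the two bookmark versions above.
PRESETS = [
    FilterPreset(("production",), ("error",), True, 7),
    FilterPreset(("production", "staging"), ("error", "warning"), True, 30),
]


def next_preset_index(current: int, inbox_zero_streak: int, required_streak: int = 3) -> int:
    """Move to the next, wider preset once inbox zero has been hit often enough."""
    if inbox_zero_streak >= required_streak and current < len(PRESETS) - 1:
        return current + 1
    return current  # stay put: streak too short, or already at the widest preset
```

Once the widest preset is also under control, the special bookmark can be dropped in favor of the plain for review view.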

11:44 so again creating a custom bookmark

11:48 that zeroes in on the highest impact of

11:50 your for review errors

11:52 is a great way to iteratively work your

11:54 way toward triaging down

11:56 the full set of for review bugs in

11:58 some rare cases

12:00 the suggestions that we've covered so

12:02 far aren't going to go

12:04 far enough and in these cases what you

12:07 can choose to do is to

12:10 mark every error in your inbox as fixed

12:14 now this is a fairly aggressive tactic

12:17 but the effect that this will have

12:19 is that any bug that is continuing to be

12:22 ongoing

12:24 will re-enter the for review state as it

12:27 occurs

12:28 in a future release of your software and

12:32 this should give you enough breathing

12:34 room to

12:35 jump start your daily periodic triage

12:38 process

12:40 rather than trying to start that process

12:42 with

12:43 hundreds or thousands of errors to

12:45 review

12:46 you can temporarily get down to inbox

12:49 zero

12:50 and get on to a triaging workflow

12:54 when the number of errors to review is

12:56 relatively low
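To see why marking everything as fixed is safe for bugs that are still live, it helps to model the status change. The sketch below is a simplified model under stated assumptions, not Bugsnag's implementation: releases are plain increasing integers, and `TrackedError`, `mark_all_fixed`, and `record_occurrence` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TrackedError:
    # Hypothetical model of an error's triage state.
    name: str
    status: str = "for_review"      # "for_review" or "fixed"
    fixed_in: Optional[int] = None  # release number current when it was marked fixed


def mark_all_fixed(errors: list[TrackedError], current_release: int) -> None:
    """Clear the inbox by marking every error as fixed in the current release."""
    for e in errors:
        e.status = "fixed"
        e.fixed_in = current_release


def record_occurrence(error: TrackedError, release: int) -> None:
    """Reopen a fixed error only if it shows up in a release newer than the fix."""
    if error.status == "fixed" and error.fixed_in is not None and release > error.fixed_in:
        error.status = "for_review"
        error.fixed_in = None
```

In this model, errors that never recur stay out of the inbox, while anything still happening in a later release comes back for review on its own.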

12:59 so again this isn't something to do

13:01 frequently

13:02 but it can be very effective in the rare

13:04 cases where it's called

13:06 for so i hope you found these

13:09 recommendations helpful

13:10 and i hope that they've given you some

13:12 things to reflect on as you consider

13:14 your team's approach to

13:16 error triaging and your wider

13:18 engineering process

13:19 our ultimate goal here is to help you

13:21 focus on the most impactful work at the

13:23 current moment

13:24 whether that means fixing your most

13:26 critical bug or building your next big

13:32 feature
