How to Develop a Bug Triage Process Efficiently

Written by bugsnag | Published 2022/02/08
Tech Story Tags: bugsnag | software-development | bugs | bug-triaging | bug-triage | bug-triage-process | software-bugs | good-company

TLDR: This is the third video in a three-part series on how to approach software bug triage. We've covered several best practices for triaging effectively and efficiently within Bugsnag. To deliver the best possible software to your users, you need to incorporate these practices into your overall engineering process. This video focuses on developing a bug triage process with your engineering team.

So far we've covered several best practices for triaging effectively and efficiently within Bugsnag. But just knowing about these best practices isn't necessarily going to be enough to deliver the best possible software to your users. In order to do that, you're going to need to find a way to incorporate these best practices into your overall engineering process. So let's talk about some ways to accomplish that. This is the third video in a three-part series on how to approach software bug triage.

Transcript

00:15 so far we've covered several best
00:17 practices for triaging effectively and
00:19 efficiently within Bugsnag but just
00:22 knowing about these best practices
00:24 isn't necessarily going to be enough to
00:26 deliver the best possible software
00:29 to your users in order to do that you're
00:32 going to need to find a way to
00:33 incorporate these best practices
00:35 into your overall engineering process so
00:38 let's talk about some ways to accomplish
00:40 that
00:42 it's really important to make sure that
00:44 consistent and accurate bug triage
00:46 is a performance target for your team
00:49 so tech leads engineering managers and
00:52 engineering leaders of all kinds
00:55 this one's for you each of your Bugsnag
00:57 projects and generally each one of
00:59 your code bases
01:00 should have an owner someone on the team
01:02 who's ultimately accountable
01:04 for the health and stability of that
01:06 given project
01:08 now they're also accountable for making
01:10 sure that their teams are following a
01:12 process
01:13 by which bugs are being regularly
01:14 triaged and also accurately triaged right
01:18 there are lots of workflow actions that
01:19 you can take in Bugsnag
01:21 so it's one thing to hit inbox zero and
01:23 it's
01:24 another degree of engineering excellence
01:27 to be making the right triaging calls
01:30 time and time again so often the person
01:33 tasked with this is going to be someone
01:34 in an engineering leadership role but
01:37 they don't necessarily need to be the
01:38 person
01:39 doing the daily triaging work all the
01:41 time or fixing the bugs themselves
01:43 they're the one who is responsible for
01:46 creating a culture of regular triaging
01:50 and accurate bug remediation within
01:52 their team so again
01:54 make it a performance target make sure
01:55 that you're tracking the degree to which
01:57 bugs are being triaged
01:59 and the degree to which good triaging
02:01 decisions are being made
02:04 when it comes to developing a triaging
02:06 process within your engineering team
02:09 the fundamental question to be asking is
02:12 who
02:12 on the team is responsible for triaging
02:14 bugs today
02:16 now there are all kinds of valid answers
02:18 to this question
02:19 and what those answers look like depends
02:21 a lot on how your team is structured
02:23 how responsibilities are spread
02:25 throughout the team
02:26 and so on but some ideas to get you
02:30 started
02:31 things we've seen work really well is
02:33 coming up with a daily or
02:34 weekly rotation where an engineer on the
02:37 team is responsible
02:39 for handling periodic triage during that
02:41 work day or during that week
02:44 and by doing this it should be clear to
02:46 everybody on the team
02:48 who's responsible for looking at bugs
02:50 today who's responsible for doing
02:51 periodic triage today
02:53 but however you decide to answer this
02:56 question it's just
02:57 really important to check in with your
02:59 team and make sure that you have a good
03:01 answer that you all buy into
03:03 that it's clear to everyone who's
03:05 responsible for what
03:06 as it relates to triaging bugs in your
03:09 system
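As a rough illustration of the rotation idea above, here is a minimal Python sketch that maps a date to the engineer on periodic triage duty. The roster names, the `triager_for` helper, and the choice of a fixed start date are all assumptions made for illustration; none of this is a Bugsnag feature.

```python
from datetime import date

# Hypothetical team roster; the order defines the rotation.
ROTATION = ["alice", "bikram", "chen", "dana"]

# Assumed anchor date for the rotation (a Monday, so weekly slots
# line up with calendar weeks).
ROTATION_START = date(2022, 1, 3)


def triager_for(day: date, weekly: bool = True) -> str:
    """Return who handles periodic triage on the given day."""
    elapsed_days = (day - ROTATION_START).days
    if elapsed_days < 0:
        raise ValueError("day is before the rotation started")
    # A weekly rotation advances every 7 days; a daily one advances every day.
    slot = elapsed_days // 7 if weekly else elapsed_days
    return ROTATION[slot % len(ROTATION)]
```

Publishing the result of something like `triager_for(date.today())` in a team channel is one way to keep it clear to everyone who is responsible today. Note that the daily variant counts weekends too; a real schedule would likely skip them.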
03:11 so once you've thought about your
03:13 triaging rotation
03:15 how you spread the responsibility of
03:16 doing periodic daily bug triage across
03:19 your team
03:20 it's also important to think about how
03:21 you want to handle the situation where
03:24 someone on the team ships a new feature
03:26 and there's a bug that's newly
03:28 introduced in that code that that
03:30 engineer just sent out what if that
03:32 engineer isn't the one who is on
03:34 the triage rotation for the current day
03:36 how do you handle that
03:38 again as with the last point there are
03:40 lots of valid answers to this question
03:42 it really depends on
03:43 how you distribute responsibilities
03:46 throughout your team how you deliver
03:47 software and so on
03:48 but one thing that we do at Bugsnag and
03:52 we
03:53 have had a lot of success with this is
03:55 to make it the responsibility of the
03:57 engineer who is deploying new code to
04:00 perform any necessary reactive triage
04:03 which means addressing any bugs
04:05 introduced in their new code
04:07 and this tends to make a lot of sense
04:09 because it's usually true that the
04:10 person shipping the new code has
04:12 the most context on what that code is
04:15 trying to do how that code works how
04:16 that code fits into the larger system
04:19 and it doesn't necessarily make sense
04:20 for the person who is broadly
04:22 dealing with reactive triage of bugs
04:24 across the entire system
04:26 to look at that newly introduced bug
04:30 usually what you'll end up having is the
04:32 person who's doing the periodic triage
04:34 they'll be looking at bugs that are
04:36 newly introduced from
04:38 edge cases in legacy code or from things
04:41 that were previously snoozed that have
04:43 re-entered the for review state or
04:45 things that were fixed
04:47 and have happened again in new versions
04:49 of the code
04:51 but think about the distinction between
04:54 that set of responsibilities and who
04:56 looks at bugs coming from
04:58 code you've just released again
05:01 lots of valid answers to this question
05:03 but think about it
05:05 what's going to work best for your team
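The split between reactive and periodic triage described above can be sketched as a small routing rule. This is only an illustration under stated assumptions: `ErrorEvent`, its `introduced_in_release` field, and `assign_triager` are hypothetical names, and a real team would derive this information from its own deploy and error-tracking data.

```python
from dataclasses import dataclass


@dataclass
class ErrorEvent:
    """Hypothetical, simplified view of a newly reported error."""
    error_class: str
    introduced_in_release: str  # release where the error was first seen


def assign_triager(event: ErrorEvent, latest_release: str,
                   deployer: str, rotation_engineer: str) -> str:
    """Pick who should look at an error first.

    Errors first seen in the release that just shipped go to the engineer
    who deployed it (reactive triage). Everything else, such as edge cases
    in legacy code or errors that were snoozed or fixed and came back,
    goes to whoever is on the periodic triage rotation.
    """
    if event.introduced_in_release == latest_release:
        return deployer
    return rotation_engineer
```

The design choice here mirrors the reasoning in the talk: the person who shipped the code usually has the most context on it, so they are the cheapest person to make the first triage call.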
05:09 so before we get into the live Q&A I
05:11 want to address
05:12 one question we sometimes hear at this
05:15 point
05:16 and that is yeah this all sounds really
05:18 great this
05:19 daily triaging this getting to inbox
05:22 zero having an inbox that has
05:25 a high signal to noise ratio etc
05:29 allowing us to be efficient with our
05:30 triaging process that all sounds
05:31 wonderful
05:32 but we haven't been doing a great job of
05:35 triaging our bugs historically
05:36 and they've started piling up so we've
05:39 got a huge set of errors in our inbox
05:41 and it just feels daunting to start
05:43 reviewing them what can we do
05:45 to get things back on track so
05:49 if your team falls into this category if
05:52 this describes you
05:53 first of all i'd say for most teams with
05:56 a little bit
05:57 of focus it's possible to
06:00 plow through that backlog and get down
06:02 to inbox zero
06:04 and get things on track but if that's
06:07 still too much to take on
06:08 we've got a few best practices a few
06:10 tips really
06:12 to help you get your inbox under control
06:13 so you can get back on track
06:15 and start doing the recommended
06:18 daily triage workflow earlier we talked
06:21 about the case of
06:22 a monolithic code base and how using a
06:25 shared bookmark
06:26 could give a team a refined view
06:30 down to the set of for review errors
06:32 that applies just to their subset
06:34 or subsection of a monolithic code base
06:37 you can apply this same principle
06:39 if you're trying to get your set of
06:42 for review errors down to a more
06:43 manageable set
06:45 you can essentially create a filter
06:49 that reflects the highest priority or
06:52 the likely
06:53 most impactful bugs in your for review
06:56 set
06:56 and you can start triaging these bugs
06:59 down first
07:00 and then over time you can make your for
07:03 review filter
07:05 more permissive to include a larger and
07:09 larger set of the total for review bugs
07:11 and we'll take a look at what this would
07:14 look like inside the dashboard
07:16 okay so here we are looking at the inbox
07:19 for a project that has 31 for review
07:21 errors
07:22 you can see that right here and
07:25 let's say this is a few errors more than
07:28 it feels realistic for us to
07:32 get a handle on in terms of daily
07:33 triaging and we'd like to
07:35 follow this recommendation of starting
07:37 with a scoped-down
07:39 version of just the most high impact for
07:42 review errors
07:44 what we can do is we can start to
07:46 consider
07:47 some filters we can apply to zero in on
07:50 the
07:50 highest impact or highest priority of
07:53 these 31 errors
07:55 so first of all let's go ahead and limit
07:58 this search
07:59 to only those errors affecting the
08:01 production stage
08:05 so we'll apply that we say okay we've
08:07 gone down to 29 for review errors once
08:08 we apply that stage filter
08:10 let's continue and scope this down
08:14 instead of considering
08:15 error warning and info level severities
08:18 let's just consider
08:19 error and for the type rather than
08:23 including handled and unhandled errors
08:25 let's only look at
08:27 unhandled errors and as a point of
08:30 review
08:30 unhandled errors are errors that Bugsnag
08:32 automatically detects
08:34 and reports handled errors are
08:37 errors that you manually notify Bugsnag
08:40 about
08:41 so we'll apply these filters that's
08:44 taken it down to 11 for review errors
08:47 and rather than considering all time
08:50 let's start by just looking at errors
08:52 that have happened in the past
08:54 week so we apply that and now we're down
08:56 to
08:57 10 for review errors hopefully a much
09:00 more manageable number than the previous
09:02 31
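To make the combined effect of those four filters concrete, here is a minimal Python sketch that applies the same conditions to an in-memory list. `ErrorRecord` and its fields are hypothetical stand-ins for what the dashboard shows; this does not call or model any Bugsnag API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class ErrorRecord:
    """Hypothetical stand-in for one error in the inbox."""
    message: str
    status: str        # e.g. "for_review", "snoozed", "fixed"
    stage: str         # e.g. "production", "staging"
    severity: str      # "error", "warning", or "info"
    unhandled: bool
    last_seen: datetime


def priority_for_review(errors, now, max_age=timedelta(days=7)):
    """Keep only for-review errors that match the narrow filter."""
    return [
        e for e in errors
        if e.status == "for_review"
        and e.stage == "production"        # stage filter
        and e.severity == "error"          # drop warning and info
        and e.unhandled                    # drop handled errors
        and now - e.last_seen <= max_age   # seen in the past week
    ]
```

Each condition only ever removes errors, which is why the count in the walkthrough shrinks at every step (31, then 29, then 11, then 10).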
09:02 so from here we can start our triaging
09:05 workflow but before we do that
09:07 let's save this search
09:10 so we can bookmark the current search
09:13 and we can call this
09:16 priority for review now the two
09:19 options here set as my default view for
09:21 the project
09:23 let's go ahead and click that that means
09:25 anytime you
09:27 go to this project in Bugsnag these
09:29 filters will be automatically applied
09:32 by default and then let's also click
09:35 share this bookmark with my team
09:37 this means that everyone on your Bugsnag
09:39 team will also have access to this
09:41 filter from their inbox
09:42 and be able to triage down the same
09:46 set of high impact for review bugs
09:50 we can see that our bookmark was saved
09:52 it's highlighted to indicate that we're
09:53 currently searching by
09:55 this bookmark now anyone on our team can
09:58 click this
09:58 and get to the same filtered view of
10:00 their inbox and from here we can all
10:03 proceed with the triaging workflow we
10:05 already discussed
10:09 okay let's say a few days have gone by
10:11 now the team has been consistently using
10:13 this priority for review bookmark
10:16 to do daily triaging using this
10:19 scoped-down set of for review errors
10:22 we've been consistently hitting inbox
10:24 zero on a daily basis
10:26 now it's time to expand this saved
10:29 bookmark
10:30 to include a larger set of the
10:33 total for review errors so in order to
10:36 do that
10:37 let's just modify some of these filters
10:39 so let's say
10:40 instead of just considering production
10:42 errors let's include staging errors as
10:43 well
10:45 let's include warning level
10:49 severities apply that as well okay
10:51 now we're up to seven
10:53 and instead of just considering the past
10:55 seven days of history let's also look at
10:57 any bug that's occurred in the last 30
10:59 days what we can do here is go back to
11:02 this bookmark that we've already created
11:04 and saved with the team
11:05 and say update with current filters
11:09 update shared bookmark this is telling
11:10 us that this is a shared bookmark so
11:13 everyone on the team
11:14 will be able to use this updated version
11:16 of it we'll click
11:17 update so there we go
11:20 now we have a new and improved priority
11:23 for review filter that includes
11:25 a larger set of the total for review
11:27 errors
11:28 and following this process over time we
11:30 can get to a place where
11:32 eventually we don't need this special
11:33 bookmark anymore we can just use
11:36 plain old for review filter that
11:39 includes
11:40 the global set of for review errors for
11:43 this project
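One way to think about this gradual widening is as an ordered list of filter tiers, moving to the next tier once the current one is consistently at inbox zero. The sketch below is an assumption-laden illustration: the tier definitions mirror the filters used in the walkthrough, but `FILTER_TIERS`, `matches`, and `next_tier` are hypothetical names, and errors are plain dictionaries rather than anything returned by Bugsnag.

```python
from datetime import datetime, timedelta

# Hypothetical filter tiers, narrowest first. None means "no restriction".
FILTER_TIERS = [
    {"stages": {"production"}, "severities": {"error"},
     "unhandled_only": True, "max_age": timedelta(days=7)},
    {"stages": {"production", "staging"}, "severities": {"error", "warning"},
     "unhandled_only": True, "max_age": timedelta(days=30)},
    # Final tier: the plain "for review" view with no extra restrictions.
    {"stages": None, "severities": None,
     "unhandled_only": False, "max_age": None},
]


def matches(error: dict, tier: dict, now: datetime) -> bool:
    """Return True if a for-review error falls inside the given tier."""
    if error["status"] != "for_review":
        return False
    if tier["stages"] is not None and error["stage"] not in tier["stages"]:
        return False
    if tier["severities"] is not None and error["severity"] not in tier["severities"]:
        return False
    if tier["unhandled_only"] and not error["unhandled"]:
        return False
    if tier["max_age"] is not None and now - error["last_seen"] > tier["max_age"]:
        return False
    return True


def next_tier(current: int, open_count: int) -> int:
    """Advance to a wider tier once the current view is at inbox zero."""
    if open_count == 0 and current < len(FILTER_TIERS) - 1:
        return current + 1
    return current
```

In practice the decision to widen is a team judgment call (for example, after several days of hitting inbox zero), so `next_tier` is deliberately simplistic.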
11:44 so again creating a custom bookmark
11:48 that zeroes in on the highest impact of
11:50 your for review errors
11:52 is a great way to iteratively work your
11:54 way toward triaging down
11:56 the full set of for review bugs in
11:58 some rare cases
12:00 the suggestions that we've covered so
12:02 far aren't going to go
12:04 far enough and in these cases what you
12:07 can choose to do is to
12:10 mark every error in your inbox as fixed
12:14 now this is a fairly aggressive tactic
12:17 but the effect that this will have
12:19 is that any bug that is continuing to be
12:22 ongoing
12:24 will re-enter the for review state as it
12:27 occurs
12:28 in a future release of your software and
12:32 this should give you enough breathing
12:34 room to
12:35 jump start your daily periodic triage
12:38 process
12:40 rather than trying to start that process
12:42 with
12:43 hundreds or thousands of errors to
12:45 review
12:46 you can temporarily get down to inbox
12:49 zero
12:50 and get on to a triaging workflow
12:54 when the number of errors to review is
12:56 relatively low
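The reason this aggressive reset is safe can be shown with a toy state model: an error marked as fixed comes back to for review only if it occurs again in a later release. The sketch below assumes integer release numbers and uses hypothetical names (`TrackedError`, `mark_all_fixed`, `record_occurrence`); it models the behavior described in the talk and is not Bugsnag's implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TrackedError:
    """Hypothetical error with a workflow status."""
    name: str
    status: str = "for_review"
    fixed_in_release: Optional[int] = None  # release current when marked fixed


def mark_all_fixed(errors, current_release: int) -> None:
    """Bulk-mark every for-review error as fixed as of the current release."""
    for e in errors:
        if e.status == "for_review":
            e.status = "fixed"
            e.fixed_in_release = current_release


def record_occurrence(error: TrackedError, release: int) -> None:
    """Reopen a fixed error if it shows up in a later release."""
    if (error.status == "fixed"
            and error.fixed_in_release is not None
            and release > error.fixed_in_release):
        error.status = "for_review"
        error.fixed_in_release = None
```

Errors that never recur stay out of the inbox, while anything still ongoing resurfaces on its own after the next release, so nothing that is actively affecting users is lost for long.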
12:59 so again this isn't something to do
13:01 frequently
13:02 but it can be very effective in the rare
13:04 cases where it's called
13:06 for so i hope you found these
13:09 recommendations helpful
13:10 and i hope that they've given you some
13:12 things to reflect on as you consider
13:14 your team's approach to
13:16 error triaging and your wider
13:18 engineering process
13:19 our ultimate goal here is to help you
13:21 focus on the most impactful work at the
13:23 current moment
13:24 whether that means fixing your most
13:26 critical bug or building your next big
13:32 feature

Written by bugsnag | The leading application stability management solution trusted by over 6,000 engineering teams worldwide.
Published by HackerNoon on 2022/02/08