How Shopify Uses BugSnag to Service Over 377,000 Online Stores

by Bugsnag, June 16th, 2023


Challenge: Reliability is Shopify’s top priority for its 300,000-plus daily users. Without the ability to gauge the impact of errors, Shopify engineers had limited visibility into user experience, and customers often contacted them with problems before the errors had been discovered.


Results: Using Bugsnag, Shopify engineers are aware of errors significantly faster, can more easily replicate errors, and can proactively see the effects of their code. Trend analysis helps engineers understand common causes of errors and mitigate future problems.


Before error monitoring, ensuring reliability for Shopify’s worldwide customer base was tedious, time-consuming, and inefficient.


Shopify develops e-commerce software for more than 377,500 online stores and retail point-of-sale systems around the world. For Shopify’s engineering team, the top priority is to maintain reliability for each customer worldwide.


However, maintaining that reliability used to be time-consuming and tedious for Shopify’s software engineers.


“Troubleshooting used to be slow and unpleasant, requiring us to invest a lot of time digging through logs and piecing together information from multiple sources,” states Blake Mesdag, Sr. Developer for Shopify’s CI Infrastructure team.


Trend analysis can also help mitigate potential exceptions, but before error monitoring was in place, it was difficult to build a complete picture of errors over time.


Without spike notifications (alerts that indicate a sudden increase in the occurrence of an error), bugs would escalate and affect customers, who then had to contact Shopify before the problems could be fixed.



Preempting Support Calls With Increased Visibility Into Errors

It soon became apparent that Shopify needed a more proactive way of addressing errors across its entire development process.


“Without error monitoring, you’re flying blind - there’s almost no point if you can’t see what you’re doing right and wrong, and you can’t make any informed decisions,” says Mesdag.


After considering several options, Shopify chose Bugsnag for automatic error monitoring of their entire tech stack. Shopify’s developers, across 20-30 teams, can now proactively see the effects of their code and identify and troubleshoot errors before they impact merchants.


“Overall, Bugsnag has helped me become more confident in the code I ship, which in turn helps me ship faster.”

— Blake Mesdag, Sr. Developer


Before Bugsnag, Shopify software engineers could not proactively see what errors were occurring and had to wait for customers to reach out with problems. They would then dig through logs to find an exception ID and use that ID to look up the error in the exception tracker.


“Bugsnag surfaces all of this information in one place, and it tells us exactly what caused the error,” states Mesdag. “There is a lot less search involved with Bugsnag, and the overall process of troubleshooting bugs is significantly more pleasant.”



Using Trend Analysis to Mitigate Future Errors

With Bugsnag, Shopify developers can also analyze trends on exceptions. This helps them understand common factors that cause failures so they can take a proactive approach to mitigate these in the future.
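
To make the idea of finding common factors concrete, here is a minimal sketch of one way to tally error events by a shared attribute. It is an illustration only, not Bugsnag's implementation or API: the event records, the field names (`error_class`, `host`, `release`), and the `top_factors` helper are all hypothetical.

```python
from collections import Counter

# Hypothetical error events. The field names are assumptions made for
# this sketch and do not reflect Bugsnag's actual event schema.
events = [
    {"error_class": "TimeoutError", "host": "ci-worker-2", "release": "v41"},
    {"error_class": "TimeoutError", "host": "ci-worker-2", "release": "v42"},
    {"error_class": "KeyError", "host": "ci-worker-1", "release": "v42"},
    {"error_class": "TimeoutError", "host": "ci-worker-2", "release": "v42"},
    {"error_class": "KeyError", "host": "ci-worker-3", "release": "v42"},
]


def top_factors(events, field, n=3):
    """Return the n most frequent values of `field` across the events."""
    return Counter(event[field] for event in events).most_common(n)


if __name__ == "__main__":
    # Tally the same events along several dimensions to see which
    # attribute values the failures have in common.
    for field in ("error_class", "host", "release"):
        print(field, top_factors(events, field))
```

In this made-up data, most errors share one host and one release, which is the kind of pattern a trend view is meant to surface.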


Trend analysis is especially useful for the infrastructure team, which can use it to diagnose whether specific machines are causing more issues than others. In addition, spike notifications make it much easier to pinpoint errors that are escalating quickly, prioritize them, and get them fixed much faster.
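
As a rough illustration of what a spike notification checks for, the sketch below flags a sudden increase by comparing the latest interval's error count with the average of the preceding intervals. This is a simplified assumption about how such a check could work, not a description of Bugsnag's actual detection logic; the `is_spike` function and its `factor` and `min_events` thresholds are hypothetical.

```python
from statistics import mean


def is_spike(counts, factor=3.0, min_events=10):
    """Decide whether the latest interval looks like a spike.

    counts: per-interval error counts, oldest first; the last entry is
    the current interval. The thresholds are illustrative assumptions.
    """
    if len(counts) < 2:
        return False  # no history to compare against
    *history, current = counts
    baseline = mean(history)
    # Require a minimum volume so that tiny counts do not trigger alerts,
    # and floor the baseline at 1 to handle a history of all zeros.
    return current >= min_events and current > factor * max(baseline, 1.0)


if __name__ == "__main__":
    print(is_spike([2, 3, 2, 4, 30]))   # sharp jump over a low baseline
    print(is_spike([20, 22, 19, 25]))   # steady, high-volume error
```

The minimum-volume guard reflects one design choice: an error going from one occurrence to four is a large relative change but rarely worth interrupting someone for.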