Chaos has become a symptom of the tech world. Every day, thousands of developers are putting out fires at work and getting caught up in one crisis after another. The better part of those fires has been lit by the rise of microservices and distributed cloud architectures. The popularity of those advancements is at an all-time high, yet failures continue to be prominent and complex.

Downtime Jitters

According to an IHS Markit survey, the cost of downtime for 400 companies hit a collective $700 billion per year. This is a staggering figure.

In March 2015, a 12-hour Apple store outage cost the company $25M.

In May 2017, one outage stranded tens of thousands of British Airways (BA) passengers and resulted in a $102.19M loss.

In December 2020, a large-scale outage took down YouTube, Gmail, and Google Assistant for around an hour.

That month burned a hole in a lot of pockets. We all need a magic pill to alleviate this headache, because waiting for your service to crash is a bleak option. Let’s do it the Netflix way and chill during deployment.

Play Destroy

Welcome to chaos engineering, a place where mistakes are intentional and failures are embraced. Its history dates back to 2010, when the Netflix Eng Tools team created Chaos Monkey to test the resilience of its IT infrastructure. Today, chaos engineering is ‘celebrating failure’ to help engineers and systems build muscle memory and maintain more resilient complex systems.

Vaccinate Against Downtime

In layman’s terms, chaos engineering is the process of hacking things on purpose. Just like a vaccination, you inject latency or CPU failure to trigger an immune response within the system. In this case, our main goal lies in identifying hidden problems that may wreck production. As a chaos engineer, you test the system's ability to handle real-world problems (server errors, traffic jumps, corrupted messages) in a series of controlled experiments.
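To make the vaccination idea concrete, here is a minimal sketch of a controlled latency-injection experiment. It is a simulation only, not how Chaos Monkey or any real tool works: `handle_request`, `run_experiment`, the 20-60 ms baseline, the 300 ms injected delay, and the 200 ms p95 budget are all made-up assumptions for illustration.

```python
import random
import statistics


def handle_request(rng, injected_delay_ms=0.0):
    """Hypothetical service call: returns a simulated response time in ms."""
    base_ms = rng.uniform(20.0, 60.0)  # assumed "healthy" latency range
    return base_ms + injected_delay_ms


def run_experiment(requests=1000, affected_share=0.1, delay_ms=300.0,
                   p95_budget_ms=200.0, seed=42):
    """Inject extra latency into a share of requests and check a p95 budget."""
    rng = random.Random(seed)  # seeded so the simulated run is repeatable
    timings = []
    for _ in range(requests):
        # Only `affected_share` of the requests receives the injected fault.
        hit = rng.random() < affected_share
        timings.append(handle_request(rng, delay_ms if hit else 0.0))
    # quantiles(n=100) returns 99 cut points; index 94 is the 95th percentile.
    p95_ms = statistics.quantiles(timings, n=100)[94]
    return {"p95_ms": p95_ms, "within_budget": p95_ms <= p95_budget_ms}


if __name__ == "__main__":
    print("no fault:   ", run_experiment(affected_share=0.0))
    print("10% delayed:", run_experiment(affected_share=0.1))
```

In a real experiment the timings would come from live monitoring rather than a random number generator, and the fault would be injected by a dedicated tool. The shape stays the same, though: pick a measurable signal, inject a fault into a limited share of traffic, and compare the result against what you expected.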
Break Things Strategically

To stress your system out, you need to follow a four-step process:

Define the steady-state of the system. Develop a profound understanding of a system so that you are aware of what it looks like during normal functioning. This state will serve as a measurable variable.

Build a hypothesis around steady-state. Choose the damaging action you want to enact. Simulate realistic scenarios. Replicate real-life problems that have previously occurred in your system. For example, if traffic spikes caused havoc a few months ago, opt for bugs that mimic those effects.

Measure the impact. Keep tabs on your system while the bug is attacking it. Focus on key metrics, but don’t forget to assess the entire system. Minimize the blast radius: safeguard the infrastructure by coordinating developer teams and business units. Furthermore, you should start small and build up as you gain confidence in a system.

Invalidate your hypothesis. Finally, you’ll have one of two outcomes. You either confirm the resilience of the system, or you find a weak point to eliminate.

Pro tip: Run chaos experiments in production to replicate the real state of things. If you perform chaos testing during staging or integration, you won’t build a real vision of how the system reacts in production.

Embrace the Art of Chaos

Awesome! We’ve successfully shattered your application using controlled chaos and demystified the concept of chaos engineering. Next, you would want to right the wrongs to make your system invincible.

Credit for the above piece goes to Tatsiana Isakova, Hang Ngo, and Ellen Stevens.

Subscribe to HackerNoon’s thematic newsletters via our subscribe form in the footer.

Netflix and Smash
