Service Level Objectives (SLOs) are powerful decision-making tools far beyond the team coalface, while still providing value there. SLOs as Code in Reliably, the reliability automation platform for developers, provide executable, versionable artifacts that help you capture, frame, collaborate on, and enable essential reliability conversations at any point in a system’s evolution.

Why are SLOs so powerful?

I have a confession: I love Service Level Objectives (SLOs). In my experience, SLOs have risen to be one of the most important parts of Site Reliability Engineering (SRE) adoption. Time and again, I’ve seen huge value in having SLOs even if you are not planning to apply all the aspects of SRE.

SLOs tell us what we care about and what good looks like for a system’s users. For this reason, SLOs can be incredible decision-making tools way beyond the team coalface (while providing value at the coalface as well!). While Service Level Indicators (SLIs) tell you what can be measured, SLOs tell you what matters (primarily, what matters to the system’s users).

This is why SLOs are the first concept that has been defined in code as part of Reliably, the new reliability toolkit for developers. In this article, I’m going to talk about why “SLOs as Code” is such an important step on our journey towards “Reliability as Code” (#reliabilityascode).

SLOs Are Valuable Conversation Enablers

Firstly, SLOs are great conversation starters. Even before one line of code has been written, it’s possible to talk about how facets of the future system should behave to deliver the right reliability experience to the system’s future users. Many systems die in early implementation because reliability is an afterthought. By bringing the SLO conversation to the forefront early, everyone gets an opportunity to collaborate. Even more importantly, SLOs help everyone understand what the users will care about and how reliable the system needs to be.
This doesn’t mean that SLOs only enable valuable conversations for new, greenfield systems. SLOs can encourage the same conversations for pretty much any system, whether it is a greenfield system or a slightly muddy “heritage” one (I prefer “heritage” to “legacy”, as for some reason we sometimes look down on legacy systems). SLOs can encourage everyone involved to ask, “What do we care about?”, “What’s the right level of reliability we need?”, “What does reliable look like to our users?” or even, “How do we balance cost and reliability?”. Regardless of when these SLO conversations happen, they can add huge value by bringing reliability to the top table in the architecture and design process.

Reliably’s SLO code artifact captures, frames, and supports these conversations. Using the SLO artifact, you can develop and evolve your SLOs even before you have any means of measuring them for real with Service Level Indicators (SLIs):

```yaml
services:
- name: website
  service-levels:
  - name: 95th of requests response time under 100ms
    type: latency
    criteria:
      threshold: 100ms
    slo: 95
    sli: []
    window: PT1H
  - name: 99th of requests response time under 500ms
    type: latency
    criteria:
      threshold: 500ms
    slo: 99
    sli: []
    window: PT1H
  - name: 99th of requests responses not 5xx
    type: availability
    slo: 99
    sli: []
    window: PT1H
```

In the above code snippet, we’ve described three SLOs for a simple website service.

NOTE: You can create your own SLO definitions using the Reliably SLO init command. More information is available in the Reliably docs.

Freeing the SLOs and Treating SLOs as Code

SLOs are frequently defined and captured in the monitoring and observability tools on the market. There’s nothing wrong with this. It just often means that the SLOs are not as visible to all the different collaborators involved as they could be, especially across an organization where several different monitoring and observability systems may be in play.
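Treating SLOs as code also means they can be checked like any other code artifact, even while every sli list is still empty. The following is a minimal sketch in Python, and it is not Reliably’s own validation: it mirrors the first snippet as a dictionary (to avoid a YAML dependency), and the helpers parse_window, error_budget and check_service_levels, along with the rules they enforce (a latency objective needs criteria.threshold, and slo must lie strictly between 0 and 100), are hypothetical. parse_window only understands the simple ISO 8601 durations used in this article, such as PT1H, PT24H and P7D, and the error budget is read here as the share of events in a window that may be bad.

```python
import re

# Hypothetical Python mirror of the first YAML snippet (Reliably itself reads YAML).
SLO_DEFINITIONS = {
    "services": [
        {
            "name": "website",
            "service-levels": [
                {"name": "95th of requests response time under 100ms", "type": "latency",
                 "criteria": {"threshold": "100ms"}, "slo": 95, "sli": [], "window": "PT1H"},
                {"name": "99th of requests response time under 500ms", "type": "latency",
                 "criteria": {"threshold": "500ms"}, "slo": 99, "sli": [], "window": "PT1H"},
                {"name": "99th of requests responses not 5xx", "type": "availability",
                 "slo": 99, "sli": [], "window": "PT1H"},
            ],
        }
    ]
}

# Only a subset of ISO 8601 durations: days, hours, minutes and seconds.
_DURATION = re.compile(
    r"^P(?:(?P<days>\d+)D)?"
    r"(?:T(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?(?:(?P<seconds>\d+)S)?)?$"
)


def parse_window(window: str) -> int:
    """Return the length of an SLO window such as PT1H or P7D in seconds."""
    match = _DURATION.match(window)
    # Reject non-matches, empty durations ("P", "PT") and a dangling "T" ("P1DT").
    if match is None or not any(match.groupdict().values()) or window.endswith("T"):
        raise ValueError(f"unsupported window: {window!r}")
    days, hours, minutes, seconds = (
        int(match.group(name) or 0) for name in ("days", "hours", "minutes", "seconds")
    )
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


def error_budget(slo_percent: float) -> float:
    """Fraction of events in a window that may be bad for a percentage objective."""
    if not 0 < slo_percent < 100:
        raise ValueError("slo must be strictly between 0 and 100")
    return (100 - slo_percent) / 100


def check_service_levels(definitions: dict) -> list[str]:
    """Return a list of problems found in the SLO definitions (empty if none)."""
    problems = []
    for service in definitions.get("services", []):
        for level in service.get("service-levels", []):
            label = f"{service.get('name')}/{level.get('name')}"
            # Assumed rule: latency objectives must say what "fast enough" means.
            if level.get("type") == "latency" and "threshold" not in level.get("criteria", {}):
                problems.append(f"{label}: latency objective needs criteria.threshold")
            try:
                error_budget(level["slo"])
                parse_window(level["window"])
            except (KeyError, ValueError) as exc:
                problems.append(f"{label}: {exc}")
    return problems


if __name__ == "__main__":
    print(check_service_levels(SLO_DEFINITIONS))  # prints []
    for level in SLO_DEFINITIONS["services"][0]["service-levels"]:
        budget = error_budget(level["slo"])
        seconds = parse_window(level["window"])
        print(f"{level['name']}: up to {budget:.0%} bad events per {seconds}s window")
```

A check like this could run in CI next to other tests, so a malformed objective is caught in review before it ever reaches a reporting tool.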
It’s also common for SLOs to go through a lifecycle that includes versioning and releasing, while remaining open to collaboration. Sound familiar? It should! This is the exact set of requirements we have for working with code generally, and it is another reason why Reliably has codified SLOs as code artifacts that can be created, managed, versioned, and collaborated on using the same (or similar) processes you use for working with other system-critical artifacts.

Executable SLOs as Code

Over time, you can enrich your SLOs with Service Level Indicators (SLIs), as shown in this snippet:

```yaml
services:
- name: website
  service-levels:
  - name: 95th of requests response time under 100ms
    type: latency
    criteria:
      threshold: 100ms
    slo: 95
    sli:
    - id: myprojectid/google-cloud-load-balancers/myloadbalancer-name
      provider: gcp
    window: PT24H
  - name: 99th of requests response time under 500ms
    type: latency
    criteria:
      threshold: 500ms
    slo: 99
    sli:
    - id: myprojectid/google-cloud-load-balancers/myloadbalancer-name
      provider: gcp
    window: PT24H
  - name: 99th of requests responses not 5xx
    type: availability
    slo: 99
    sli:
    - id: myprojectid/google-cloud-load-balancers/myloadbalancer-name
      provider: gcp
    window: P7D
```

SLIs are measurements that, collected over a given window, give you “good” and “bad” events. These roll up into the overall calculation of whether the SLO is still being met, is trending dangerously close to not being met, or has been broken completely.

SLOs coded using Reliably, once they include some SLIs, can be reported against at any time by anyone with the right permissions, using the SLO report command:

```shell
$ reliably slo report
```

You can even watch your SLOs with live updates using the --watch switch:

```shell
$ reliably slo report --watch
```

There’s much more to dig into with the reliably slo report command; check out the docs for more.

Summary

In this article, I’ve shared why SLOs are a powerful concept in SRE and beyond.
SLOs are a crucial conversation enabler for what matters in terms of reliability in a given system. This is why they are the first concept captured in code using Reliably as part of our #ReliabilityAsCode mission.

Also published here.
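To make the roll-up of “good” and “bad” events described under Executable SLOs as Code more concrete, here is a minimal Python sketch of one way such a calculation can work. It is an illustration under stated assumptions, not how Reliably computes its report: the names SloStatus, WindowCounts and evaluate are hypothetical, compliance is taken as the percentage of good events in the window, and the “trending dangerously close” state is approximated by a made-up rule that flags an SLO once more than 80% of its error budget has been consumed.

```python
from dataclasses import dataclass
from enum import Enum


class SloStatus(Enum):
    MET = "met"
    AT_RISK = "at risk"
    BROKEN = "broken"


# Hypothetical rule: call an SLO "at risk" once over 80% of its error budget is spent.
AT_RISK_BUDGET_CONSUMED = 0.8


@dataclass
class WindowCounts:
    """Good and bad events that an SLI reported over one SLO window."""
    good: int
    bad: int

    @property
    def total(self) -> int:
        return self.good + self.bad


def evaluate(slo_percent: float, counts: WindowCounts) -> tuple[float, float, SloStatus]:
    """Return (compliance %, fraction of error budget consumed, status) for one window."""
    if counts.total == 0:
        # No events observed: nothing has gone wrong yet, so treat the SLO as met.
        return 100.0, 0.0, SloStatus.MET
    compliance = 100 * counts.good / counts.total
    # Number of bad events the objective tolerates in this window.
    allowed_bad = (100 - slo_percent) * counts.total / 100
    if allowed_bad == 0:
        consumed = 0.0 if counts.bad == 0 else float("inf")
    else:
        consumed = counts.bad / allowed_bad
    if compliance < slo_percent:
        status = SloStatus.BROKEN
    elif consumed > AT_RISK_BUDGET_CONSUMED:
        status = SloStatus.AT_RISK
    else:
        status = SloStatus.MET
    return compliance, consumed, status


if __name__ == "__main__":
    # Illustrative counts only, not real measurements.
    for good, bad in [(9990, 10), (9910, 90), (9800, 200)]:
        compliance, consumed, status = evaluate(99, WindowCounts(good, bad))
        print(f"{compliance:.1f}% good, {consumed:.0%} of budget used: {status.value}")
```

With a 99 objective, the three illustrative windows land in the three states in turn: comfortably met, met but with most of the budget spent, and broken. In practice the counts would come from the SLIs attached to each objective rather than from hard-coded numbers.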

How to Define Service Level Objectives as Code to Enhance SRE
