
Why OpenTelemetry Should Matter to Network and Systems Admins

by Leon Adato, February 18th, 2022

Too Long; Didn't Read

"The network" isn't just YOUR network anymore; it also includes every cloud-based service you use. The need for monitoring, observing, or grokking is the same, and OpenTelemetry is the industry's answer to that need. To maintain a solid grip on the environment, you need metrics (however you collect them); events (regardless of which layer of the OSI model they travel on); logs (individual or aggregated); and traces (yes, traces). In other words: "MELT".


OpenTelemetry is young, even by internet standards. Born out of the merger of OpenTracing and OpenCensus projects at the Cloud Native Computing Foundation (CNCF), its lofty goal was (and still is) to put an "open" spin on observability — philosophically the ability to understand the internal state of a system from its externally consumable outputs, but functionally the insight that can be captured and aggregated from metrics, events, logs, and traces (MELT).

But for IT practitioners whose day-to-day job is to implement, care for, and feed infrastructure, the question in all this is "why should I care?"

"The network" isn't just YOUR network anymore

As a fellow old-school sysadmin and network engineer, I hate to be the one to break the news to you, but it's probably better coming from a colleague, if not a friend.

Sure, all the on-prem stuff you've lovingly purchased and provisioned, racked and stacked, cabled and configured (the technical world you know and love) isn't going anywhere. But gone are the days when "the network" ended at the demarc.

It probably comes as no surprise that your sphere of responsibility extends to the cloud now, too — from the networking that connects your office to AWS, Azure, GCP and beyond to the custom containers, systems, and services your business runs there.

Furthermore, "the network" now includes, from a functional standpoint, the network of any cloud-based service you use. Whether it's a custom application that includes a call to an external API, or a full-blown business-critical software-as-a-service, from your user's perspective, that's "the network" too.

And it's all yours. And since it's all yours, you need to have a way of understanding it. It doesn't matter whether you want to call that monitoring, observing, or even grokking — the need remains the same. OpenTelemetry is the industry's answer to that need.

Your need to know goes beyond the basics

Understanding modern infrastructure — which includes servers and routers and wireless but extends from there into containers and cloud and beyond — requires modern techniques.

SNMP, WMI, and even johnny-come-lately techniques like NetFlow aren't enough. To maintain a solid grip on the environment, you need metrics (however you collect them); events (regardless of which layer of the OSI model they travel on); logs (individual or aggregated, in all their many-faceted layouts and formats); and traces (yes, traces; they're not just for devs anymore). In other words: "MELT".

While I'm sure you're probably already thinking about the different tools you could use to collect each of those, I'm equally sure you already know that throwing each of those tools onto a set of screens and performing non-stop swivel-chair integration isn't going to cut it. You need a cohesive framework able to both ingest those disparate data types and also normalize them so they can be coherently displayed. Once again, that's OpenTelemetry.
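To make "ingest and normalize" a little more concrete, here is a minimal Python sketch of the idea. It is not the OpenTelemetry SDK or its data model; the `Signal` record, the parser functions, and the simplified syslog layout are all hypothetical names and assumptions chosen for illustration. The point is only that once a polled counter, a log line, and a trace span share one shape, they can sit on a single timeline.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Signal:
    """Hypothetical common record (not an OpenTelemetry SDK type)."""
    kind: str         # "metric", "event", "log", or "trace"
    source: str       # device or service that emitted the signal
    timestamp: float  # seconds since the Unix epoch
    name: str
    attributes: Dict[str, Any] = field(default_factory=dict)


def from_snmp_poll(host: str, counter: str, value: float, ts: float) -> Signal:
    # A polled counter becomes a metric signal.
    return Signal("metric", host, ts, counter, {"value": value})


def from_syslog(line: str, ts: float) -> Signal:
    # Assumes a simplified "<host> <program>: <message>" layout.
    host, rest = line.split(" ", 1)
    program, message = rest.split(": ", 1)
    return Signal("log", host, ts, program, {"message": message})


def from_span(service: str, operation: str, start: float, end: float,
              trace_id: str) -> Signal:
    # A trace span keeps its trace ID and gains a duration in milliseconds.
    return Signal("trace", service, start, operation,
                  {"trace_id": trace_id, "duration_ms": (end - start) * 1000})


def timeline(signals: List[Signal]) -> List[Signal]:
    # One ordered view across all signal types.
    return sorted(signals, key=lambda s: s.timestamp)


signals = [
    from_span("checkout-api", "GET /cart", 100.0, 100.25, "abc123"),
    from_syslog("core-sw1 kernel: eth0 link down", 99.5),
    from_snmp_poll("edge-rtr1", "ifInErrors", 42.0, 99.0),
]
for s in timeline(signals):
    print(s.timestamp, s.kind, s.source, s.name, s.attributes)
```

In this made-up example the interface errors and the link-down log line land on the timeline just before the slow request, which is exactly the correlation that swivel-chair integration makes hard to see.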

It's never the network, but it's still your problem

It's just us old warhorses here, right? So I'll be honest. Sometimes it is the network. But not often, and not for long. But just because you can use some built-in-the-'90s tool to prove it wasn't your box doesn't mean you can drop off that sev1 call. Besides, for some of you reading this, there'd be nobody to hand it off to in any case because you're essentially an army of one.

But even for those working in large distributed teams, the days of siloed responsibility are long gone (and good riddance, in my opinion). The person who matters in this equation is the consumer, whether that's a colleague within the company or a customer hitting the site or service from the outside.

And they don't experience "the network" as separate from "the server," which is itself separate from "the database," not to mention "the software."

It's all one unified experience to them, and your stuff, regardless of which piece that is, is part of the whole. As long as there's a problem, it's very likely you're expected to stick around and fix it. Not only that, but you probably have to prove it's fixed, and go the extra step to explain how you'll protect against it breaking again in the future.
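This is where traces earn their keep for infrastructure folks. The sketch below is a rough Python illustration, not OpenTelemetry's actual span model: the `Span` class, the component names, and the timings are all invented, and it assumes child spans run one after another without overlapping. It shows how one request's spans, tied together by a shared trace ID, let you work out which component actually consumed the time.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Span:
    """Hypothetical, simplified span (not the OpenTelemetry span type)."""
    trace_id: str
    span_id: str
    parent_id: Optional[str]  # None for the root span
    component: str
    start_ms: float
    end_ms: float

    @property
    def duration_ms(self) -> float:
        return self.end_ms - self.start_ms


def self_time_ms(span: Span, spans: List[Span]) -> float:
    # Time spent in this span itself, excluding its direct children.
    # Assumes children do not overlap each other.
    children = [s for s in spans if s.parent_id == span.span_id]
    return span.duration_ms - sum(c.duration_ms for c in children)


def slowest_component(spans: List[Span]) -> str:
    # The component with the largest self time is the first suspect.
    return max(spans, key=lambda s: self_time_ms(s, spans)).component


# One user request, seen end to end (all values are made up).
request = [
    Span("t1", "a", None, "load balancer", 0.0, 480.0),
    Span("t1", "b", "a", "web server", 10.0, 470.0),
    Span("t1", "c", "b", "database", 40.0, 440.0),
]
for span in request:
    print(span.component, self_time_ms(span, request))
print("slowest:", slowest_component(request))
```

With these invented numbers the load balancer accounts for 20 ms, the web server for 60 ms, and the database for 400 ms, so the conversation on the sev1 call can move from "prove it wasn't your box" to "here is where the time went."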

Why the extra pressure? Because the experience is what customers associate with your business now. At Cisco Live Europe 2020, Cisco VP/GM Daniel Winokur said, “…applications over the past decade have moved from this role that they used to have of supporting our business to now playing a role where they actually are our business.” Taking it a step further, Winokur pointed out how loyalty to the application and experience has replaced loyalty to a brand.

This reality, more than any other, moves IT from being a function that supports the business to one which is central to the success of the business overall. And the thing that will provide the sum-of-all-parts understanding, which is in itself a key differentiator, is OpenTelemetry.