Photo by Vivaan Trivedii on Unsplash

APIs are like blood vessels to a digital business. As data flows through, energy is delivered to activate new opportunities. Oftentimes, we focus on specialized components, the vital organs of our software systems. What can we learn by tapping into the connectors, themselves, pulling insights from the streams? Here's a quick overview on bootstrapping an observability strategy for APIs.

Discover the Signals

Data lives everywhere. When it comes to measuring success, anticipating problems, or looking for our next opportunity, we instinctively scrape, scrub, polish, and analyze information to the best of our ability. Finding signals in the noise has been a natural activity for living beings since the dawn of vigilance.

As time has progressed, we've applied these data-crunching instincts to our digital assets, as well. For modern business, this practice is one for survival.

With so much of our business being driven by APIs, are we searching for the right signals? The acronym MELT defines our starting point.

- M - Metrics, numeric measurements collected and tracked over time.
- E - Events, snapshots of significant state changes.
- L - Logs, a detailed transcript of system behavior.
- T - Traces, a route of interactions between components, coupled with an associated context.

The process of communicating and recording these signals is called telemetry.

Further reading: MELT 101

Inspect the Plumbing

Most interactions we have with APIs over the network are fairly high-level. We send a blob of JSON. We receive a blob of JSON. Profit! 💰

What signals can we acquire from what lies below?

- When was there a significant shift in traffic?
- How much time is spent receiving data from external sources?
- What's the trend in connection errors over time?

The association of signal trackers to our systems is called instrumentation.

When it comes to the lower-level components, we can often take advantage of automatic instrumentation. This can surface in the form of wrapping components within a standard library or adding listeners along connection paths.

Examples:

- OpenTracing Node.js HTTP auto-instrumentation
- Envoy metrics, Istio Telemetry API

Remember the Domain

Today, we strive to capture more signals than ever before. We see both virtual machine metrics and application logs being shipped to storage and analysis tools. But what about all the business-y stuff in-between? How are we capturing measurements for business Key Performance Indicators (KPIs)? We look to instrument the domain.

Domain events are the result of applying a command in the business domain to a specific context. Whether captured or not, these events are happening all the time. What kind of questions may we ask of these insights?

- What's the average length of time between discount codes being offered and being applied at checkout?
- What's the correlation between in-app product announcements and newsletter sign-ups?
- When appointments are canceled, what behavior directly precedes this action?

As the questions we ask evolve, so too must our methods of collecting these signals.

Further reading: Domain-Oriented Observability

Find the Waypoints

"When we blew up the monolith into many services, we lost the ability to step through our code with a debugger: it now hops the network. Our tools are still coming to grips with this seismic shift."
Charity Majors, "Observability - a 3-Year Retrospective"

To reap the benefits of a distributed system, we sacrifice the convenience of having one-stop inspection. It wasn't always this way, and that's one aspect which makes upgrading our observability strategy difficult.

Let's take a look at how the observability tooling landscape has evolved.

Close to the Metal

Logging and monitoring solutions started when we were writing code close to the metal. The open source stack that used to dominate the landscape was a combination of these tools:

- Graylog - full log management system.
- Nagios - systems, network, and application monitoring and alerting.
- StatsD - metrics collection and forwarding.
- Carbon - metrics ingestion for Graphite, stored in the Whisper database.
- Graphite - metrics querying, visualization, and alerting.

What many observability articles tend to ignore is that this stack is still heavily deployed and in-use today. Some of us are still here, and that's okay.

Abstracting the Machine

As virtual machines, and eventually cloud infrastructure, gained traction over running on bare metal servers, we saw a shift in how we approach signal-gathering. This gave rise to two prominent stacks in the observability space:

ELK stack

- ElasticSearch - full-text search engine for log storage and querying
- Logstash - log ingestion
- Kibana - log visualization and alerting

TICK stack

- Telegraf - metrics ingestion
- InfluxDB - a time-series database for metric storage and querying
- Chronograf - metrics visualization
- Kapacitor - metrics processing and alerting

It is common to run these stacks, or some combination thereof, in parallel. Many organizations are still here, and it makes sense. For the most part, they're incredibly robust and mature solutions.

However, we are in the midst of yet another sea change. There's one last stop on the map.

The Land of Containers

There has been a dramatic shift to cloud-native infrastructure. And for systems running in self-managed data centers, containers are beginning to take over as the atomic unit of application deployment. On top of this, Kubernetes has grown to be the dominant container orchestrator (Note: in the Kubernetes world, an atomic unit is known as a Pod and consists of one or more related containers).

What tools do we use to capture signals in this new world?

- Prometheus - metrics collection, querying, and alerting.
- Grafana - Metrics visualization and alerting.
- EFK stack: ElasticSearch, Fluentd, Kibana
- Jaeger or Zipkin - trace querying and visualization

Why so many options?
This world is still maturing, and it has become significantly more complex. The mere existence of OpenTelemetry, discussed more in the next section, shows that the number of options in this space is growing at a fast pace.

No matter where businesses are in their journey today, observability of containers and their interactions is likely to become an important initiative.

Using OpenTelemetry

OpenTelemetry is a tool-agnostic observability framework for communicating telemetry. OpenTelemetry.io defines the project as:

"OpenTelemetry is a collection of tools, APIs, and SDKs. Use it to instrument, generate, collect, and export telemetry data (metrics, logs, and traces) to help you analyze your software’s performance and behavior."

Many Application Performance Monitoring (APM) tools are adding support for OpenTelemetry, as well. Check the OpenTelemetry Registry for more information.

Here's an example of adding auto-instrumentation to a Node.js application.

/* tracing.js */

// Require dependencies
const opentelemetry = require("@opentelemetry/sdk-node");
const {
  getNodeAutoInstrumentations,
} = require("@opentelemetry/auto-instrumentations-node");

const sdk = new opentelemetry.NodeSDK({
  traceExporter: new opentelemetry.tracing.ConsoleSpanExporter(),
  instrumentations: [getNodeAutoInstrumentations()],
});

sdk.start();

This gives us a jump start on acquiring signals with little effort.

Distributed Traces offer a superpower for API observability. They allow us to track requests through our distributed systems, and we can even include domain events in our context propagation.

A trace has a few components worth noting.

Parent (t1)
└── Trace (t2)
    ├── Span (s1)
    │   ├── Event (e1)
    │   └── Event (e2)
    └── Span (s2)

Traces can be nested, creating tree-based observability structures.

Spans log segments of a trace. Here's an example of a server receiving an HTTP request:

// This server is artificial and for example only

import { SemanticAttributes } from "@opentelemetry/semantic-conventions";
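import { trace, SpanKind, SpanStatusCode } from "@opentelemetry/api";

// Acquire a tracer; the name "example-server" is a placeholder for this sketch
const tracer = trace.getTracer("example-server");

async function onGet(request, response) {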
  const span = tracer.startSpan(`GET /applicants/:id`, {
    attributes: {
      [SemanticAttributes.HTTP_METHOD]: "GET",
      [SemanticAttributes.HTTP_FLAVOR]: "1.1",
      [SemanticAttributes.HTTP_URL]: request.url,
      [SemanticAttributes.NET_PEER_IP]: "192.0.2.5",
    },
    kind: SpanKind.SERVER,
  });

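  try {
    const user = await getUser();

    response.send(user.toJson());
    span.setStatus({
      code: SpanStatusCode.OK,
    });
  } catch (error) {
    // Record the failure on the span, then let the server handle the error
    span.recordException(error);
    span.setStatus({ code: SpanStatusCode.ERROR, message: error.message });
    throw error;
  } finally {
    // Always end the span, even when the handler fails
    span.end();
  }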
}

server.on("GET", "/applicants/:id", onGet);

Span events are a special type of structured logging. They can be associated with a trace, giving insight into what domain events are happening in the broader context of a full interaction.

An example of adding events to a span:

// Get the current span
// (assumes @opentelemetry/api 1.x, where `trace` is imported from "@opentelemetry/api")
const span = trace.getActiveSpan();

// Perform the action
applicant.adopt(pet)

// Record the action
span.addEvent("applicant.adoption.request", {
  "applicant.id": applicant.id,
  "pet.id": pet.id,
  "applicant.eligibilityScore": applicant.eligibilityScore,
})

This is only a high-level overview. Check out the OpenTelemetry docs for more details!

Read more:

- OpenTelemetry Specification Overview
- W3C Trace Context
- Propagation format for distributed trace context: Baggage

Increase Observability for APIs

A significant driver of containerization is a shift in architectural trends to break apart monolithic applications. Containers and microservices have a symbiotic relationship. A co-evolution is occurring in this space.

In the VMware State of Observability Report 2021, the findings state the following reasons for a rise in the complexity of managing cloud applications:

- Cross-team adoption of polyglot microservices frameworks
- Application requests traversing many third-party APIs and technologies
- Varying approaches in application security across different vendors

Legacy telemetry strategies are not enough. How do we start pushing forward an initiative to improve?

- Evaluate how observability fits into an ongoing API strategy. Talk to stakeholders. Research the impact. Make a case. Follow the guidelines presented in 5 Developer Tips for Surviving API-First.
- Educate and experiment. There are many links in this post with a few references at the end, as well. Make space for trial and error. Start small, perhaps with a greenfield project.
- Delegate responsibilities. Team Topologies describes an approach that can help share the load of instrumenting applications and managing observability infrastructure. Read more at: Scale API Teams with Platform Ops. 🚀

Observability creates a window into the organic flow of information that moves through our systems. It allows us to ask the important questions that impact our business. The good news is we almost certainly have familiarity with some of the practices involved. As the landscape continues to grow, it takes a lot of effort to stay ahead of the curve. That's expected. Following an observability initiative is a long-term approach to ensuring survival.
Evolution, as we know, requires patience. 🧘

Additional reading:

- A Three-Phased Approach to Observability by New Relic
- Distributed Systems Observability by Cindy Sridharan
- Observing is not Debugging (and other misnomers) by Kislay Verma
- Splunk's State of Observability 2021

Social photo by Emiliano Vittoriosi on Unsplash
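Appendix: a quick look at trace context on the wire

The W3C Trace Context reference listed earlier is what lets a trace "hop the network" between services: each request carries a traceparent header with a version, a trace ID, a parent (span) ID, and a flags byte. Here's a minimal sketch of reading and re-emitting that header in plain JavaScript. The helper names parseTraceparent and formatTraceparent are made up for this post and are not part of any OpenTelemetry package; in a real service the OpenTelemetry propagators would normally handle this for us. The sketch only covers the version 00 layout.

```javascript
// Minimal sketch of the W3C Trace Context `traceparent` header (version 00 layout).
// Helper names are hypothetical, for illustration only.

// version (2 hex) - trace-id (32 hex) - parent-id (16 hex) - flags (2 hex), lowercase
const TRACEPARENT_PATTERN =
  /^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/;

// Parse a traceparent header value; returns null when it is malformed.
function parseTraceparent(header) {
  const match = TRACEPARENT_PATTERN.exec(String(header).trim());
  if (!match) return null;

  const [, version, traceId, parentId, flags] = match;

  // Version "ff" is invalid, and all-zero trace or parent ids are not allowed.
  if (version === "ff" || /^0+$/.test(traceId) || /^0+$/.test(parentId)) {
    return null;
  }

  return {
    version,
    traceId,
    parentId,
    // Bit 0 of the flags byte is the "sampled" flag.
    sampled: (parseInt(flags, 16) & 0x01) === 0x01,
  };
}

// Build the header for an outgoing request: same trace id, new parent (span) id.
function formatTraceparent({ traceId, parentId, sampled }) {
  return `00-${traceId}-${parentId}-${sampled ? "01" : "00"}`;
}

// Example value in the shape used by the W3C specification
const incoming = "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01";
const parsed = parseTraceparent(incoming);

// The downstream call keeps the trace id but swaps in this service's span id
const outgoing = formatTraceparent({ ...parsed, parentId: "b7ad6b7169203331" });

console.log(parsed);
console.log(outgoing);
```

The trace ID staying constant while the parent ID changes at every hop is what allows a backend such as Jaeger or Zipkin to stitch spans from different services back into one tree.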

