
Setup Monitoring Using Apache Zookeeper and OpenTelemetry

by Deepa Ramachandra, October 11th, 2022

Too Long; Didn't Read

Monitoring Zookeeper applications helps to ensure that the data sets are distributed as expected across the cluster. The configuration detailed in this post uses observIQ’s distribution of the OpenTelemetry collector. The following metrics categories are monitored using this configuration: Znodes, Latency and Throughput. The metrics are exported to New Relic using the OTLP exporter. The resource detection processor helps with filtering metrics from specific Zookeeper hosts in the monitoring tool.



In this article, I’ll show you a simplified way to configure monitoring for a critical open-source component, Zookeeper. Monitoring Zookeeper applications helps to ensure that the data sets are distributed as expected across the cluster. Although Zookeeper is considered very resilient to network mishaps, you will still want to monitor the server. To do this, I’ll be using the Zookeeper receiver from OpenTelemetry.


The configuration detailed in this post uses observIQ’s distribution of the OpenTelemetry collector, which simplifies the use of OpenTelemetry for all users. You can take a look at the details of this support in the repo.


You can use this receiver with any OTel collector, including the OpenTelemetry Collector and observIQ’s distribution of the collector.


Monitoring performance metrics for Zookeeper is necessary to ensure that all the jobs are running as expected and the clusters are humming. The following metrics categories are monitored using this configuration:


Znodes

You can automatically discover Zookeeper clusters, monitor memory (heap and non-heap) on the Znode, and get alerts on changes in resource consumption. You can also automatically collect, graph, and get alerts on garbage collection iterations, heap size and usage, and threads. Zookeeper hosts are deployed in a cluster, and as long as a majority of hosts are up, the service will be available. Note that you must ensure the total node count inside the Zookeeper tree is consistent.


Latency and Throughput

These metrics provide a consistent view of the performance of your servers. Regardless of whether a server changes roles from Follower to Leader or back, you will get a meaningful view of its history.


Configuring the Zookeeper Receiver

After the installation, the config file for the collector can be found at:

  • C:\Program Files\observIQ OpenTelemetry Collector\config.yaml (Windows)

  • /opt/observiq-otel-collector/config.yaml (Linux)


Receiver Configuration:

  1. Configure the collection_interval attribute. It is set to 30 seconds in this sample configuration.
  2. Set up the endpoint attribute as the system that is running the Zookeeper instance.
receivers:
  zookeeper:
    collection_interval: 30s
    endpoint: localhost:2181
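
The receiver gathers its data by sending Zookeeper's mntr four-letter-word command to the endpoint, so that command has to be enabled on the server (for example through the 4lw.commands.whitelist setting). Before starting the collector, it can help to confirm that the endpoint answers. The following is a minimal Python sketch for that check and is not part of the collector; fetch_mntr and parse_mntr are hypothetical helper names, and the host and port are assumed to match the endpoint configured above.

```python
import socket


def fetch_mntr(host="localhost", port=2181, timeout=5.0):
    """Send the 'mntr' four-letter-word command and return the raw reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(b"mntr")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection after replying
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_mntr(raw):
    """Turn tab-separated 'key<TAB>value' lines into a dict of strings."""
    stats = {}
    for line in raw.splitlines():
        key, sep, value = line.partition("\t")
        if sep:  # skip lines without a tab, such as a "not in the whitelist" notice
            stats[key.strip()] = value.strip()
    return stats
```

Calling parse_mntr(fetch_mntr()) against a running server should return keys such as zk_znode_count. An empty result usually means the command is not whitelisted or the endpoint is wrong.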


Processor Configuration:

  1. The resource detection processor is used to create a distinction between metrics received from multiple Zookeeper systems. This helps with filtering metrics from specific Zookeeper hosts in the monitoring tool, in this case, New Relic.

  2. Add the batch processor to bundle the metrics from multiple receivers. We highly recommend using this processor in the configuration, especially for the benefit of the logging component of the collector. To learn more about this processor check the documentation.


processors:
  resourcedetection:
    detectors: ["system"]
    system:
      hostname_sources: ["os"]

  batch:


Exporter Configuration:

In this example, the metrics are exported to New Relic using the OTLP exporter. If you would like to forward your metrics to a different destination, check the destinations that OpenTelemetry supports at this time, here.


exporters:
  otlp:
    endpoint: https://otlp.nr-data.net:443
    headers:
      api-key: 00000-00000-00000
    tls:
      insecure: false


Set up the pipeline:

service:
  pipelines:
    metrics:
      receivers:
      - zookeeper
      processors:
      - resourcedetection
      - batch
      exporters:
      - otlp
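
Every name listed in a pipeline has to match a component defined in the receivers, processors, or exporters sections; otherwise the collector will typically refuse to start. As an illustration of that rule, here is a small Python sketch that checks the references in a config written out as a plain dict. The function name find_undefined_components is hypothetical, and the dict simply mirrors the configuration from this post (the api-key header is left out).

```python
def find_undefined_components(config):
    """Return (pipeline, kind, name) for every pipeline entry with no definition."""
    missing = []
    pipelines = config.get("service", {}).get("pipelines", {})
    for pipeline_name, pipeline in pipelines.items():
        for kind in ("receivers", "processors", "exporters"):
            defined = config.get(kind) or {}
            for name in pipeline.get(kind) or []:
                if name not in defined:
                    missing.append((pipeline_name, kind, name))
    return missing


# The configuration from this post, written out as a plain dict.
config = {
    "receivers": {
        "zookeeper": {"collection_interval": "30s", "endpoint": "localhost:2181"},
    },
    "processors": {
        "resourcedetection": {
            "detectors": ["system"],
            "system": {"hostname_sources": ["os"]},
        },
        "batch": None,  # defined with default settings
    },
    "exporters": {"otlp": {"endpoint": "https://otlp.nr-data.net:443"}},
    "service": {
        "pipelines": {
            "metrics": {
                "receivers": ["zookeeper"],
                "processors": ["resourcedetection", "batch"],
                "exporters": ["otlp"],
            }
        }
    },
}

print(find_undefined_components(config))  # prints [] because every reference resolves
```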

Viewing the Metrics

All the metrics the Zookeeper receiver scrapes are listed below.

| Metric | Description |
|---|---|
| zookeeper.connection.active | The number of active connections. |
| zookeeper.data_tree…hemeral_node.count | The number of ephemeral nodes. |
| zookeeper.data_tree.size | The size of the data tree. |
| zookeeper.file_descriptor.limit | The limit set for the file descriptor. |
| zookeeper.file_descriptor.open | The number of open file descriptors. |
| zookeeper.latency.max | The maximum latency. |
| zookeeper.latency.min | The minimum latency. |
| zookeeper.packet.count | The packet count. |
| zookeeper.request.active | The number of active requests. |
| zookeeper.watch.count | The watch count. |
| zookeeper.znode.count | The total number of znodes. |

Alerting

Now that you have the metrics gathered and exported to the destination of your choice, you may want to explore how to configure alerts for these metrics effectively. Here are some alerting possibilities for ZooKeeper:

| Alert | Severity |
|---|---|
| ZooKeeper server is down | critical |
| Too many znodes created | warning |
| Too many connections created | warning |
| Memory occupied by znode is too large | warning |
| Too many watches set | warning |
| Too many files open | warning |
| Average latency is too high | warning |
| JVM memory almost full | warning |
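
In practice these alerts are defined in the monitoring destination, but the mapping from alert to metric can be sketched in a few lines of Python. The example below covers only the alerts that can be derived from the metrics listed in this post. Every threshold value is a hypothetical placeholder to tune for your own cluster, and evaluate_alerts is an illustrative name, not part of any tool mentioned here.

```python
# Hypothetical thresholds; tune them for your own cluster.
THRESHOLDS = {
    "zookeeper.znode.count": 1_000_000,
    "zookeeper.connection.active": 60,
    "zookeeper.watch.count": 10_000,
}
FD_USAGE_RATIO = 0.85  # hypothetical: warn when 85% of file descriptors are in use


def evaluate_alerts(sample):
    """Return a list of (severity, message) tuples for one metrics sample.

    `sample` maps metric names to numeric values; None means no data arrived.
    """
    if sample is None:
        return [("critical", "ZooKeeper server is down (no metrics received)")]

    alerts = []
    for metric, limit in THRESHOLDS.items():
        value = sample.get(metric)
        if value is not None and value > limit:
            alerts.append(("warning", f"{metric} is {value}, above {limit}"))

    fd_open = sample.get("zookeeper.file_descriptor.open")
    fd_limit = sample.get("zookeeper.file_descriptor.limit")
    if fd_open is not None and fd_limit:
        if fd_open / fd_limit > FD_USAGE_RATIO:
            alerts.append(("warning", f"{fd_open} of {fd_limit} file descriptors open"))
    return alerts


# Made-up sample values, for illustration only.
sample = {
    "zookeeper.znode.count": 1_200_000,
    "zookeeper.connection.active": 12,
    "zookeeper.watch.count": 300,
    "zookeeper.file_descriptor.open": 900,
    "zookeeper.file_descriptor.limit": 1024,
}
for severity, message in evaluate_alerts(sample):
    print(severity, message)
```

With the made-up sample above, the sketch reports two warnings: one for the znode count and one for open file descriptors.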


As you can see, this is a simple way to implement the OpenTelemetry standards. Additionally, the observIQ distribution provides a single-line installer and an integrated pool of receivers, exporters, and processors, which makes working with this collector an easy task.


Also Published here