Telemetry, in simple terms, is like having a conversation with machines or systems located far away. It's about collecting data from these distant sources to better understand how they're performing. Think of it as a health check-up, but for machines or systems instead of people.
This technology plays a crucial role in a variety of fields. In agriculture, it helps monitor crop conditions; in healthcare, it keeps tabs on patient metrics like blood pressure and blood sugar levels; and in weather forecasting, it's indispensable for predicting Mother Nature's next move.
So, it's a bit like having a network of digital spies scattered across different areas, each sending back valuable information that lets us peek into the workings of systems we can't physically reach.
In the world of technology and complex software, telemetry is crucial for software optimization. It involves collecting data from various software deployments to understand how the software performs in different environments. This helps in quickly identifying and fixing issues, leading to better, more adaptable software. For example, analyzing logs from different deployments can reveal specific areas for improvement, and tracing data can show how users engage with the software.
Although telemetry and monitoring are often used interchangeably, there is a slight hierarchical difference between them.
We can say that monitoring is a subset of telemetry: telemetry supplies the system metrics and the deeper visibility that monitoring builds on.
Imagine having an enterprise-level network. It becomes crucial for the admin to constantly monitor the data flow and network traffic passing through the endpoints. This is where network telemetry comes in.
Network telemetry is a subset of telemetry. It involves gathering data from sources like routers, switches, servers, and applications, and observing how data moves through them.
One way to achieve this is to capture network traffic and analyze it. Tools that do this provide information about network bandwidth, latency, packet loss, and other performance metrics.
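As a rough illustration of how such metrics are derived, the sketch below computes packet loss, latency, and throughput from raw measurements. The function names (summarize_probes, throughput_mbps) and the sample numbers are made up for this example; a real tool would collect the measurements from the network rather than from a hard-coded list.

```python
from statistics import mean


def summarize_probes(rtts_ms):
    """Summarize probe results; None marks a probe that never came back."""
    sent = len(rtts_ms)
    received = [rtt for rtt in rtts_ms if rtt is not None]
    lost = sent - len(received)
    return {
        "sent": sent,
        "lost": lost,
        "loss_pct": 100.0 * lost / sent if sent else 0.0,
        "avg_latency_ms": mean(received) if received else None,
        "max_latency_ms": max(received) if received else None,
    }


def throughput_mbps(bytes_transferred, seconds):
    """Average throughput in megabits per second over an interval."""
    return bytes_transferred * 8 / seconds / 1_000_000


# Hypothetical round-trip times in milliseconds; two probes were lost.
probes = [12.0, 15.0, None, 13.0, None]
print(summarize_probes(probes))
print(throughput_mbps(1_250_000, 1.0))
```

Keeping lost probes in the list as None, rather than dropping them, is what lets the same sample feed both the loss and the latency figures.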
Capturing network traffic is a fundamental aspect of network telemetry. It involves monitoring and recording the data packets that travel across a network. This data can include packet headers, payload, source and destination IP addresses, port numbers, protocol types, and more. Analyzing this captured data helps in identifying traffic patterns, detecting anomalies, diagnosing network issues, and ensuring network security.
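To make the listed fields concrete, here is a minimal sketch that pulls the addresses, ports, and protocol number out of a raw IPv4 packet using Python's standard struct and socket modules. The function name parse_ipv4_tcp and the hand-built sample packet are assumptions for illustration; a real capture would supply the bytes, and would also need to handle IPv6, link-layer headers, and truncated packets.

```python
import socket
import struct


def parse_ipv4_tcp(packet: bytes) -> dict:
    """Return addressing fields from a raw IPv4 packet (no link-layer header)."""
    # Fixed 20-byte IPv4 header, all fields in network byte order.
    (version_ihl, _tos, total_length, _ident, _frag,
     ttl, protocol, _checksum, src, dst) = struct.unpack(
        "!BBHHHBBH4s4s", packet[:20])
    header_len = (version_ihl & 0x0F) * 4  # IHL counts 32-bit words
    fields = {
        "version": version_ihl >> 4,
        "ttl": ttl,
        "protocol": protocol,
        "total_length": total_length,
        "src": socket.inet_ntoa(src),
        "dst": socket.inet_ntoa(dst),
    }
    if protocol == socket.IPPROTO_TCP:
        # TCP source and destination ports are the first four bytes
        # after the IP header.
        fields["sport"], fields["dport"] = struct.unpack(
            "!HH", packet[header_len:header_len + 4])
    return fields


# Hand-built sample: a 20-byte IPv4 header followed by a 20-byte TCP header
# whose remaining fields are zeroed. Addresses and ports are made up.
sample_packet = (
    struct.pack("!BBHHHBBH4s4s", 0x45, 0, 40, 1, 0, 64,
                socket.IPPROTO_TCP, 0,
                socket.inet_aton("10.0.0.5"),
                socket.inet_aton("192.0.2.10"))
    + struct.pack("!HH", 51000, 443)
    + bytes(16)
)
print(parse_ipv4_tcp(sample_packet))
```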
Before eBPF, we had tcpdump, which captured and recorded network packets. The problem was that it did so across multiple sessions that we had to arrange ourselves, which was tedious. This is precisely where eBPF came in. Introduced in the Linux kernel in 2014, eBPF was a paradigm shift in network telemetry. It allows programs to run inside the Linux kernel without altering the kernel's own code. More specifically, it allows packet filtering and traffic capture that automatically discard irrelevant data, so you can process and filter your application traffic with only a limited effect on performance, even when throughput is high.
eBPF lets you attach hooks to kernel system calls (syscalls) and other kernel functions, giving you visibility into traffic payloads and function results (success/failure).
As a result, with eBPF, you can enable complex functionality and processing of network traffic, including layer-7 filtering, independently of the application sending data to the kernel.
Numerous tools make it easier for developers to write, load, and interact with eBPF code, and several programming languages can be used to write or interface with it. Let's look at some of these tools:
Enough talk! Let's code to make sense of everything above!
For this demonstration, we will utilize an example sourced from the official BCC GitHub repository.
This script is a Python program that uses BCC (BPF Compiler Collection) to trace TCP IPv4 connection attempts on Linux using eBPF (extended Berkeley Packet Filter) with embedded C code. It reports every IPv4 connection attempt for which the kernel sends a SYN, even if the connection ultimately fails; attempts where tcp_v4_connect itself returns an error are skipped.
Install build dependencies:
For Bionic (18.04 LTS):
sudo apt-get -y install zip bison build-essential cmake flex git libedit-dev \
  libllvm6.0 llvm-6.0-dev libclang-6.0-dev python zlib1g-dev libelf-dev libfl-dev python3-setuptools \
  liblzma-dev arping netperf iperf
For Focal (20.04.1 LTS):
sudo apt install -y zip bison build-essential cmake flex git libedit-dev \
  libllvm12 llvm-12-dev libclang-12-dev python zlib1g-dev libelf-dev libfl-dev python3-setuptools \
  liblzma-dev arping netperf iperf
For Hirsute (21.04) or Impish (21.10):
sudo apt install -y zip bison build-essential cmake flex git libedit-dev \
  libllvm11 llvm-11-dev libclang-11-dev python3 zlib1g-dev libelf-dev libfl-dev python3-setuptools \
  liblzma-dev arping netperf iperf
For Jammy (22.04):
sudo apt install -y zip bison build-essential cmake flex git libedit-dev \
  libllvm14 llvm-14-dev libclang-14-dev python3 zlib1g-dev libelf-dev libfl-dev python3-setuptools \
  liblzma-dev libdebuginfod-dev arping netperf iperf
For Lunar Lobster (23.04):
sudo apt install -y zip bison build-essential cmake flex git libedit-dev \
  libllvm15 llvm-15-dev libclang-15-dev python3 zlib1g-dev libelf-dev libfl-dev python3-setuptools \
  liblzma-dev libdebuginfod-dev arping netperf iperf libpolly-15-dev
For other versions:
sudo apt-get -y install zip bison build-essential cmake flex git libedit-dev \
  libllvm3.7 llvm-3.7-dev libclang-3.7-dev python zlib1g-dev libelf-dev python3-setuptools \
  liblzma-dev arping netperf iperf
For Lua support:
sudo apt-get -y install luajit luajit-5.1-dev
Install and compile BCC:
git clone https://github.com/iovisor/bcc.git
mkdir bcc/build; cd bcc/build
cmake ..
make
sudo make install
cmake -DPYTHON_CMD=python3 .. # build python3 binding
pushd src/python/
make
sudo make install
popd
Code:
#!/usr/bin/python
from __future__ import print_function
from bcc import BPF
from bcc.utils import printb

# define BPF program
bpf_text = """
#include <uapi/linux/ptrace.h>
#include <net/sock.h>
#include <bcc/proto.h>

BPF_HASH(currsock, u32, struct sock *);

int kprobe__tcp_v4_connect(struct pt_regs *ctx, struct sock *sk)
{
    u32 pid = bpf_get_current_pid_tgid();

    // stash the sock ptr for lookup on return
    currsock.update(&pid, &sk);

    return 0;
};

int kretprobe__tcp_v4_connect(struct pt_regs *ctx)
{
    int ret = PT_REGS_RC(ctx);
    u32 pid = bpf_get_current_pid_tgid();

    struct sock **skpp;
    skpp = currsock.lookup(&pid);
    if (skpp == 0) {
        return 0; // missed entry
    }

    if (ret != 0) {
        // failed to send SYN packet, may not have populated
        // socket __sk_common.{skc_rcv_saddr, ...}
        currsock.delete(&pid);
        return 0;
    }

    // pull in details
    struct sock *skp = *skpp;
    u32 saddr = skp->__sk_common.skc_rcv_saddr;
    u32 daddr = skp->__sk_common.skc_daddr;
    u16 dport = skp->__sk_common.skc_dport;

    // output
    bpf_trace_printk("trace_tcp4connect %x %x %d\\n", saddr, daddr, ntohs(dport));

    currsock.delete(&pid);

    return 0;
}
"""

# initialize BPF
b = BPF(text=bpf_text)

# header
print("%-6s %-12s %-16s %-16s %-4s" % ("PID", "COMM", "SADDR", "DADDR",
    "DPORT"))

def inet_ntoa(addr):
    # convert a 32-bit address to dotted-quad bytes, lowest byte first
    dq = b''
    for i in range(0, 4):
        dq = dq + str(addr & 0xff).encode()
        if (i != 3):
            dq = dq + b'.'
        addr = addr >> 8
    return dq

# filter and format output
while 1:
    # Read messages from kernel pipe
    try:
        (task, pid, cpu, flags, ts, msg) = b.trace_fields()
        (_tag, saddr_hs, daddr_hs, dport_s) = msg.split(b" ")
    except ValueError:
        # Ignore messages from other tracers
        continue
    except KeyboardInterrupt:
        exit()

    # Ignore messages from other tracers
    if _tag.decode() != "trace_tcp4connect":
        continue

    printb(b"%-6d %-12.12s %-16s %-16s %-4s" % (pid, task,
        inet_ntoa(int(saddr_hs, 16)),
        inet_ntoa(int(daddr_hs, 16)),
        dport_s))
Run the program:
sudo python tcpv4connect.py
Code Breakdown:
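The kernel-side half of the script pairs an entry probe with a return probe: kprobe__tcp_v4_connect stashes the socket pointer in the currsock hash, keyed by the calling thread's ID (the variable the script names pid), and kretprobe__tcp_v4_connect looks it up again once the return value is known. The following is a plain-Python sketch of that bookkeeping only. It does not touch the kernel, and the function names, the dictionary standing in for the socket, and the sample addresses are all hypothetical.

```python
currsock = {}  # plays the role of BPF_HASH(currsock, u32, struct sock *)
events = []    # plays the role of the bpf_trace_printk output


def on_connect_entry(tid, sock):
    """Entry probe: stash the socket so the return probe can find it."""
    currsock[tid] = sock


def on_connect_return(tid, ret):
    """Return probe: report the connection only if the call succeeded."""
    sock = currsock.pop(tid, None)  # lookup and delete in one step
    if sock is None:
        return  # missed entry
    if ret != 0:
        return  # SYN was not sent; socket fields may be unset
    events.append((tid, sock["saddr"], sock["daddr"], sock["dport"]))


# A successful connect, a failed one, and a return with no matching entry.
on_connect_entry(101, {"saddr": "10.0.0.5", "daddr": "192.0.2.10", "dport": 443})
on_connect_return(101, 0)
on_connect_entry(102, {"saddr": "10.0.0.5", "daddr": "192.0.2.20", "dport": 80})
on_connect_return(102, -1)  # hypothetical nonzero error return
on_connect_return(103, 0)
print(events)
```

Only the first call pair produces an event, and the map is empty afterwards, which mirrors why the BPF code deletes its entry on every return path that follows a successful lookup.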
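In the rest of the Python script, an infinite loop reads output messages from the BPF program (via b.trace_fields()), splits each message into its tag, source address, destination address, and destination port, and prints one formatted line per connection.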
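The parsing step in that loop can be tried without a kernel by feeding it a hand-written message. The sketch below reuses the script's inet_ntoa logic; the wrapper parse_trace_message and the sample message are assumptions for illustration. The hex values assume a little-endian host, where the network-order bytes of 10.0.0.5 read back as the integer 0x0500000a.

```python
def inet_ntoa(addr):
    """Same conversion as the script: emit the lowest byte first."""
    dq = b''
    for i in range(4):
        dq += str(addr & 0xff).encode()
        if i != 3:
            dq += b'.'
        addr >>= 8
    return dq


def parse_trace_message(msg):
    """Split one trace message into (saddr, daddr, dport), or None if foreign."""
    tag, saddr_hs, daddr_hs, dport_s = msg.split(b" ")
    if tag != b"trace_tcp4connect":
        return None  # message from some other tracer
    return (inet_ntoa(int(saddr_hs, 16)),
            inet_ntoa(int(daddr_hs, 16)),
            int(dport_s))


# Hypothetical message in the format produced by bpf_trace_printk in the script.
sample = b"trace_tcp4connect 500000a a0200c0 443"
print(parse_trace_message(sample))
```

Because the kernel prints the address as an integer holding network-order bytes, reading the bytes from least significant to most significant recovers the dotted quad in its usual order on such a host.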
A DPORT of 443, which is commonly used for HTTPS traffic, indicates a secure web connection.
Network telemetry functions as a network sleuth, collecting information and insights regarding the network's operation and efficiency. This tool equips network managers with the necessary resources to ensure their network infrastructure remains strong and effective.
Thus, for both seasoned network experts and those new to networking, network telemetry is an essential tool for overseeing and enhancing your network.
Telemetry involves collecting data from remote sources to monitor network performance, aiding in proactive issue detection and optimization.
Network telemetry offers deeper insights than traditional monitoring by capturing detailed data on traffic patterns, latency, and packet loss in real-time.
eBPF enables efficient network telemetry by allowing code execution within the Linux Kernel, facilitating advanced packet filtering and traffic analysis.
Challenges in network telemetry implementation include ensuring compatibility with diverse network environments and managing the overhead of data collection without impacting performance.