1. What are Metrics and Why They're Important

Metrics are numerical measurements collected at regular intervals that provide insight into your application's behavior, performance, and health. Unlike logs, which capture discrete events, metrics track values that change over time, letting you observe patterns, trends, and anomalies.

2. Metrics in Prometheus Format and Why Use Prometheus

Prometheus has become the de facto standard for metrics collection in cloud-native environments. It uses a simple text-based exposition format that is both human-readable and machine-parsable:

```
# HELP http_requests_total Total number of HTTP requests
# TYPE http_requests_total counter
http_requests_total{method="post",code="200"} 1027
http_requests_total{method="post",code="400"} 3
```

Each metric has a name, optional labels (key-value pairs in curly braces), and a value. Prometheus offers several compelling advantages:

- Pull-based architecture: Prometheus pulls metrics from your applications rather than having applications push metrics to a central server. This is more resilient and easier to debug.
- Dimensional data model: Labels allow for multi-dimensional data representation and powerful querying.
- Powerful query language (PromQL): Enables complex aggregations and computations across metrics (see the example query below).
- Service discovery: Automatically discovers targets to scrape metrics from.
- Ecosystem integration: Works seamlessly with Kubernetes, cloud providers, and many other tools.
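To make the PromQL point concrete, here is one small query against the sample metric shown above. It is only an illustration; the range window and grouping labels depend on your data:

```promql
# Per-second request rate over the last 5 minutes, summed per status code
sum by (code) (rate(http_requests_total[5m]))
```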
3. How to Export Metrics in a Sample Go Application

Let's implement metrics in a Go application using the official Prometheus client library.

First, install the required packages:

```bash
go get github.com/prometheus/client_golang/prometheus
go get github.com/prometheus/client_golang/prometheus/promauto
go get github.com/prometheus/client_golang/prometheus/promhttp
```

Now, let's create a simple HTTP server that demonstrates the different metric types. Note that this listing is intentionally incomplete: it won't compile until configureMetrics() and recordMetrics() are implemented in the following steps.

```go
package main

import (
    "log"
    "math/rand"
    "net/http"
    "time"

    "github.com/prometheus/client_golang/prometheus"
    "github.com/prometheus/client_golang/prometheus/promauto"
    "github.com/prometheus/client_golang/prometheus/promhttp"
)

func main() {
    // Configure metrics (we'll define these below)
    configureMetrics()

    // Set up HTTP server with two endpoints
    http.HandleFunc("/", handleRequest)
    http.Handle("/metrics", promhttp.Handler()) // Exposes metrics in Prometheus format

    // Start server
    log.Fatal(http.ListenAndServe(":8080", nil))
}

func handleRequest(w http.ResponseWriter, r *http.Request) {
    // Simulate processing time between 10-100ms
    processingTime := time.Duration(10+rand.Intn(90)) * time.Millisecond
    time.Sleep(processingTime)

    // Update metrics based on this request (we'll implement this below)
    recordMetrics(r, processingTime)

    w.Write([]byte("Hello, world!"))
}

// We'll implement configureMetrics() and recordMetrics() next
```
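A quick note on registration: the examples below use the promauto package, which registers each metric with the default registry as soon as it is created. If you prefer explicit registration (for example, against a custom registry), the equivalent with the core prometheus package looks roughly like this sketch, using the same imports as above:

```go
// Sketch: explicit registration instead of promauto. This is the equivalent of the
// counter defined in the next step, not an additional metric.
var httpRequestsTotal = prometheus.NewCounterVec(
    prometheus.CounterOpts{
        Name: "http_requests_total",
        Help: "Total number of HTTP requests",
    },
    []string{"method", "path"},
)

func init() {
    // MustRegister panics if the collector is invalid or already registered.
    prometheus.MustRegister(httpRequestsTotal)
}
```

Either style works; promauto simply saves the explicit registration step.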
Now, let's define and implement each metric type.

a. Counter

A counter is a cumulative metric that only increases or resets to zero. It's perfect for counting events like requests, errors, or completed tasks.

```go
var (
    // Counter for total HTTP requests
    httpRequestsTotal = promauto.NewCounterVec(
        prometheus.CounterOpts{
            Name: "http_requests_total",
            Help: "Total number of HTTP requests",
        },
        []string{"method", "path"}, // Labels
    )
)

func configureMetrics() {
    // Other metric configurations will be added here
}

func recordMetrics(r *http.Request, duration time.Duration) {
    // Increment the request counter with appropriate labels
    httpRequestsTotal.WithLabelValues(r.Method, r.URL.Path).Inc()

    // Other metric recording will be added here
}
```

b. Gauge

A gauge represents a value that can go up and down, like temperature, memory usage, or concurrent connections.

```go
var (
    // Counter defined earlier...

    // Gauge for active requests
    activeRequests = promauto.NewGauge(
        prometheus.GaugeOpts{
            Name: "http_active_requests",
            Help: "Number of active HTTP requests",
        },
    )

    // Gauge for system memory usage
    systemMemoryUsage = promauto.NewGauge(
        prometheus.GaugeOpts{
            Name: "system_memory_usage_mb",
            Help: "Current memory usage in MB",
        },
    )
)

func configureMetrics() {
    // Start a goroutine that periodically updates the memory gauge
    go func() {
        for {
            // Simulate memory usage between 100-200MB
            memoryUsageMB := 100 + rand.Float64()*100
            systemMemoryUsage.Set(memoryUsageMB)
            time.Sleep(1 * time.Second)
        }
    }()
}
```

The active-requests gauge should increase when a request starts and decrease when it finishes, so it belongs in handleRequest rather than in recordMetrics, which only runs after processing has completed:

```go
func handleRequest(w http.ResponseWriter, r *http.Request) {
    // Track in-flight requests: increase on entry, decrease when the handler returns
    activeRequests.Inc()
    defer activeRequests.Dec()

    // ... processing and recordMetrics call as before ...
}
```
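In a real service you would usually report actual numbers rather than simulated ones. One convenient option, sketched below, is a GaugeFunc, whose value is computed each time /metrics is scraped, so no background goroutine is needed. It requires the runtime package in addition to the imports already used, and the metric name is just an illustrative choice (the client library's default Go collector already exports similar go_memstats_* series):

```go
// Sketch: a GaugeFunc is evaluated on every scrape of /metrics.
// Here it reports the Go heap currently in use, read from runtime.MemStats.
var heapInUse = promauto.NewGaugeFunc(
    prometheus.GaugeOpts{
        Name: "myapp_heap_inuse_bytes",
        Help: "Heap memory currently in use, in bytes",
    },
    func() float64 {
        var m runtime.MemStats
        runtime.ReadMemStats(&m)
        return float64(m.HeapInuse)
    },
)
```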
c. Histogram

A histogram samples observations and counts them in configurable buckets; quantiles are then calculated on the Prometheus server at query time (for example with histogram_quantile).

```go
var (
    // Previously defined metrics...

    // Histogram for request duration
    requestDuration = promauto.NewHistogramVec(
        prometheus.HistogramOpts{
            Name:    "http_request_duration_seconds",
            Help:    "Duration of HTTP requests in seconds",
            Buckets: prometheus.LinearBuckets(0.01, 0.01, 10), // 10 buckets, each 10ms wide, starting at 10ms
        },
        []string{"method", "path"},
    )
)

func recordMetrics(r *http.Request, duration time.Duration) {
    // Previous metric recording...

    // Record request duration
    requestDuration.WithLabelValues(r.Method, r.URL.Path).Observe(duration.Seconds())
}
```

d. Summary

A summary is similar to a histogram but calculates quantiles in the client application rather than on the server.

```go
var (
    // Previously defined metrics...

    // Summary for request size
    requestSize = promauto.NewSummaryVec(
        prometheus.SummaryOpts{
            Name:       "http_request_size_bytes",
            Help:       "Size of HTTP requests in bytes",
            Objectives: map[float64]float64{0.5: 0.05, 0.9: 0.01, 0.99: 0.001}, // 50th, 90th, 99th percentiles
        },
        []string{"method"},
    )
)

func recordMetrics(r *http.Request, duration time.Duration) {
    // Previous metric recording...

    // Record request size (Content-Length if available)
    if r.ContentLength > 0 {
        requestSize.WithLabelValues(r.Method).Observe(float64(r.ContentLength))
    }
}
```
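As an aside, client_golang also ships ready-made HTTP middleware in the promhttp package, so you don't have to hand-roll recordMetrics for the common request signals. The sketch below is only one possible setup: the metric names are illustrative, and the label names are restricted to "code" and "method", which the middleware fills in automatically. It uses the same imports as the examples above.

```go
// Sketch: chaining promhttp's instrumentation helpers around any http.Handler.
var (
    inFlightRequests = promauto.NewGauge(prometheus.GaugeOpts{
        Name: "http_in_flight_requests",
        Help: "Number of HTTP requests currently being served",
    })

    requestsTotal = promauto.NewCounterVec(prometheus.CounterOpts{
        Name: "http_middleware_requests_total",
        Help: "Total HTTP requests, by status code and method",
    }, []string{"code", "method"})

    requestLatency = promauto.NewHistogramVec(prometheus.HistogramOpts{
        Name:    "http_middleware_request_duration_seconds",
        Help:    "Request duration in seconds",
        Buckets: prometheus.DefBuckets,
    }, []string{"code", "method"})
)

// instrument wraps a handler with in-flight, count, and duration instrumentation.
func instrument(next http.Handler) http.Handler {
    return promhttp.InstrumentHandlerInFlight(inFlightRequests,
        promhttp.InstrumentHandlerCounter(requestsTotal,
            promhttp.InstrumentHandlerDuration(requestLatency, next)))
}

// In main: http.Handle("/", instrument(http.HandlerFunc(handleRequest)))
```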
Complete Example

Here's the complete example combining all metric types:

```go
package main

import (
    "log"
    "math/rand"
    "net/http"
    "time"

    "github.com/prometheus/client_golang/prometheus"
    "github.com/prometheus/client_golang/prometheus/promauto"
    "github.com/prometheus/client_golang/prometheus/promhttp"
)

var (
    // Counter for total HTTP requests
    httpRequestsTotal = promauto.NewCounterVec(
        prometheus.CounterOpts{
            Name: "http_requests_total",
            Help: "Total number of HTTP requests",
        },
        []string{"method", "path"},
    )

    // Gauge for active requests
    activeRequests = promauto.NewGauge(
        prometheus.GaugeOpts{
            Name: "http_active_requests",
            Help: "Number of active HTTP requests",
        },
    )

    // Gauge for system memory usage
    systemMemoryUsage = promauto.NewGauge(
        prometheus.GaugeOpts{
            Name: "system_memory_usage_mb",
            Help: "Current memory usage in MB",
        },
    )

    // Histogram for request duration
    requestDuration = promauto.NewHistogramVec(
        prometheus.HistogramOpts{
            Name:    "http_request_duration_seconds",
            Help:    "Duration of HTTP requests in seconds",
            Buckets: prometheus.LinearBuckets(0.01, 0.01, 10), // 10 buckets, each 10ms wide
        },
        []string{"method", "path"},
    )

    // Summary for request size
    requestSize = promauto.NewSummaryVec(
        prometheus.SummaryOpts{
            Name:       "http_request_size_bytes",
            Help:       "Size of HTTP requests in bytes",
            Objectives: map[float64]float64{0.5: 0.05, 0.9: 0.01, 0.99: 0.001},
        },
        []string{"method"},
    )
)

func configureMetrics() {
    // Simulate changing system metrics
    go func() {
        for {
            // Simulate memory usage between 100-200MB
            memoryUsageMB := 100 + rand.Float64()*100
            systemMemoryUsage.Set(memoryUsageMB)
            time.Sleep(1 * time.Second)
        }
    }()
}

func recordMetrics(r *http.Request, duration time.Duration) {
    // Increment the request counter
    httpRequestsTotal.WithLabelValues(r.Method, r.URL.Path).Inc()

    // Record request duration
    requestDuration.WithLabelValues(r.Method, r.URL.Path).Observe(duration.Seconds())

    // Record request size if available
    if r.ContentLength > 0 {
        requestSize.WithLabelValues(r.Method).Observe(float64(r.ContentLength))
    }
}

func handleRequest(w http.ResponseWriter, r *http.Request) {
    // Track in-flight requests: increase on entry, decrease when the handler returns
    activeRequests.Inc()
    defer activeRequests.Dec()

    // Simulate processing time between 10-100ms
    processingTime := time.Duration(10+rand.Intn(90)) * time.Millisecond
    time.Sleep(processingTime)

    // Update metrics
    recordMetrics(r, processingTime)

    w.Write([]byte("Hello, world!"))
}

func main() {
    // Configure metrics
    configureMetrics()

    // Set up HTTP server
    http.HandleFunc("/", handleRequest)
    http.Handle("/metrics", promhttp.Handler())

    // Start server
    log.Fatal(http.ListenAndServe(":8080", nil))
}
```

Once running, you can access the /metrics endpoint to see all exported metrics in Prometheus format.
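To have some data worth graphing in the next section, you can generate a little traffic against the server. A throwaway helper like this is enough; it assumes the application above is listening on localhost:8080:

```go
package main

import (
    "log"
    "net/http"
    "time"
)

func main() {
    // Send a request every 200ms until interrupted.
    for {
        resp, err := http.Get("http://localhost:8080/")
        if err != nil {
            log.Println("request failed:", err)
        } else {
            resp.Body.Close()
        }
        time.Sleep(200 * time.Millisecond)
    }
}
```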
4. How to Visualize Metrics in Grafana

After implementing metrics in your Go application, the next step is to visualize them using Grafana.
Setting Up Prometheus and Grafana

The easiest way to get started is using Docker:

```bash
# Create a Prometheus configuration file
cat > prometheus.yml << EOF
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: 'go-app'
    static_configs:
      - targets: ['host.docker.internal:8080']
EOF

# Start Prometheus
docker run -d --name prometheus -p 9090:9090 -v $(pwd)/prometheus.yml:/etc/prometheus/prometheus.yml prom/prometheus

# Start Grafana
docker run -d --name grafana -p 3000:3000 grafana/grafana
```

Configuring Grafana with Prometheus Data Source

1. Access Grafana at http://localhost:3000 (default credentials: admin/admin)
2. Go to Configuration > Data Sources > Add data source
3. Select Prometheus
4. Set the URL to http://host.docker.internal:9090 (or the appropriate Prometheus URL)
5. Click "Save & Test"
6. Query the metrics created above using this data source and build dashboards and alerts
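When building panels, a few starter queries go a long way. The expressions below are only suggestions based on the metrics defined earlier; adjust the time windows and labels to fit your dashboards:

```promql
# Request throughput per path (requests per second, averaged over 5 minutes)
sum by (path) (rate(http_requests_total[5m]))

# Estimated 95th-percentile request latency from the histogram buckets
histogram_quantile(0.95, sum by (le) (rate(http_request_duration_seconds_bucket[5m])))

# Requests currently being handled (current gauge value)
http_active_requests
```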
5. Best Practices to Consider

- Choose meaningful metric names and labels - Clearly indicate the metric's purpose (e.g., http_requests_total, not just requests).
- Don't create too many unique label combinations (cardinality explosion) - Be cautious with high-cardinality labels (e.g., user IDs, timestamps).
- Focus on metrics that provide actionable insights - Don't measure everything; measure what you can act upon or need for troubleshooting.
- Use histograms for service level objectives (SLOs) - Histograms facilitate percentile calculations for latency, which is critical for SLO monitoring.
- Implement alerting for critical metrics - Define clear thresholds that indicate degraded system states, and alert on symptoms (e.g., increased latency or error rate), not just root causes.
- Document your metrics clearly - Ensure each metric includes a meaningful HELP description.
- Be consistent in units and naming conventions - Always use base units (seconds, bytes) and reflect them in metric names (duration_seconds, memory_bytes).

From here, you can extend your observability setup further. Consider adding more metrics to your application for deeper analysis. For production environments, consider scaling Prometheus with long-term storage solutions like Thanos or Mimir.

By exporting custom metrics and visualizing them, you've equipped your Go application with introspection superpowers. This not only helps with immediate debugging and tuning, but also builds a foundation for reliable, data-driven operations. Happy monitoring!