Scaling PHP Symfony Metrics at 200k RPM: 50+ Servers, Zero Overhead with UDP + Telegraf

Written by yakovlef | Published 2025/09/02
Tech Story Tags: php | symfony | metrics | telegraf | prometheus | grafana | web-development

TL;DR: At scale, PHP metrics collection with Redis + Prometheus quickly breaks down: timeouts, memory spikes, and unscalable scrapes. This article shows how switching from a pull to a push model with UDP and Telegraf transformed monitoring in a high-traffic Symfony application. The approach eliminated Redis overhead, enabled linear scaling across 50+ servers, cut latency by 60×, and introduced antifragility into the system. Alongside implementation details, real business use cases, pitfalls, and alternatives like VictoriaMetrics are covered.

“Redis dies at 200k RPM, Prometheus can’t scrape 50 servers in time, and the business demands real-time dashboards. Sound familiar?”


Friday, 6:00 PM. Grafana shows timeouts while scraping metrics. Redis, used by prometheus_client_php, eats 8GB of RAM and 100% CPU. Prometheus fails to scrape all 50+ servers within the 15-second window. And Black Friday launches on Monday…


This article is about how we switched from a pull to a push model for PHP monitoring in a highload project, why we chose UDP + Telegraf over the classical approach, and how we now collect metrics from 50+ servers without a single timeout.


Architecture: Pull vs Push for PHP Metrics



Why Prometheus PHP Client Doesn’t Always Work for Highload


A typical scenario: you run a PHP Symfony application and need metrics. The first idea is prometheus_client_php. It is a great library, but it comes with caveats:

use Prometheus\CollectorRegistry;
use Prometheus\Storage\Redis;

// Classic prometheus_client_php usage: every increment goes through Redis
$registry = new CollectorRegistry(new Redis(['host' => '127.0.0.1']));
$counter = $registry->getOrRegisterCounter('app', 'requests_total', 'Total requests', ['method', 'endpoint']);
$counter->inc(['GET', '/api/users']); // label values in registration order


What happens under the hood:


  1. Each metric is stored in Redis/APC/in-memory storage
  2. Prometheus periodically scrapes the /metrics endpoint
  3. On scrape, all metrics are read from storage
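
To make steps 2 and 3 concrete, the scrape target is usually a tiny /metrics endpoint on the PHP side. A minimal sketch using prometheus_client_php’s text renderer (the controller and route are illustrative):

use Prometheus\CollectorRegistry;
use Prometheus\RenderTextFormat;
use Symfony\Component\HttpFoundation\Response;

class MetricsController
{
    public function __construct(private CollectorRegistry $registry) {}

    // Prometheus scrapes this route; every scrape reads ALL metrics back out of storage
    public function metrics(): Response
    {
        $renderer = new RenderTextFormat();
        $body = $renderer->render($this->registry->getMetricFamilySamples());

        return new Response($body, 200, ['Content-Type' => RenderTextFormat::MIME_TYPE]);
    }
}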


Where problems begin:


  • Scaling: With 50+ servers, Prometheus must scrape each. This becomes a bottleneck.
  • Storage: Redis adds latency; APC works only per server; in-memory dies on FPM restarts.
  • Configuration: You must set up service discovery for all servers.
  • Performance: At 200k RPM, every counter increment is an extra Redis round trip, and that overhead adds up.


The Solution: Push Model with UDP for PHP Highload Monitoring


Instead, we send metrics via UDP to Telegraf, which then forwards them to Prometheus, InfluxDB, or others.
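
The resulting data path is a single one-way flow: PHP application → UDP (InfluxDB line protocol) → Telegraf → Prometheus / InfluxDB / VictoriaMetrics.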


Why UDP?


  1. Fire & forget: No waiting for responses, no timeouts.
  2. Minimal overhead: Microsecond delivery.
  3. Fault tolerance: If Telegraf crashes, the app keeps running.
  4. Simplicity: No connection pools, retries, or circuit breakers.


Important: UDP may lose packets, but losing 0.01% of metrics won’t distort dashboards.
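
To make “fire & forget” concrete, here is a minimal hand-rolled UDP send in plain PHP. The bundle does this for you via the InfluxDB UDP writer; the host, port, and metric line here are illustrative:

// Hypothetical raw UDP send: stream_socket_client() returns immediately and
// fwrite() on a UDP socket never waits for an acknowledgement, so nothing can time out
$socket = @stream_socket_client('udp://127.0.0.1:8089', $errno, $errstr, 0.1);
if ($socket !== false) {
    // One InfluxDB line protocol record: measurement,tags fields
    @fwrite($socket, "my_app_api_request,endpoint=/api/users,method=GET count=1i\n");
    fclose($socket);
}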


TelegrafMetricsBundle: Implementation


All of this is packaged in a Symfony bundle — TelegrafMetricsBundle — for sending metrics over UDP.


Installation

composer require yakovlef/telegraf-metrics-bundle


Config (config/packages/telegraf_metrics.yaml):

telegraf_metrics:
    namespace: 'my_app'
    client:
        url: 'http://localhost:8086'
        udpPort: 8089


Bundle Architecture

Three core components:

// MetricsCollectorInterface - DI contract
interface MetricsCollectorInterface
{
    public function collect(string $name, array $fields, array $tags = []): void;
}

// Implementation backed by the InfluxDB UDP writer
class MetricsCollector implements MetricsCollectorInterface
{
    private UdpWriter $writer;

    public function __construct(Client $client, private string $namespace)
    {
        $this->writer = $client->createUdpWriter();
    }

    public function collect(string $name, array $fields, array $tags = []): void
    {
        // Send the metric to Telegraf as an InfluxDB point over UDP
        $this->writer->write(
            new Point("{$this->namespace}_$name", $tags, $fields)
        );
    }
}
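
On the wire, each collect() call becomes one small InfluxDB line protocol datagram, roughly like this (field values are illustrative):

my_app_api_request,endpoint=/api/users,method=GET,status=200 response_time=12.5,count=1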


DI integration:

services:
    Yakovlef\TelegrafMetricsBundle\Collector\MetricsCollectorInterface: 
        '@telegraf_metrics.collector'


Practical Use Cases


1. API Endpoint Monitoring

class ApiController
{
    public function __construct(
        private MetricsCollectorInterface $metrics,
        private UserRepository $userRepository
    ) {}

    public function getUsers(): JsonResponse
    {
        $startTime = microtime(true);
        
        try {
            $users = $this->userRepository->findAll();
            $responseTime = (microtime(true) - $startTime) * 1000;
            
            $this->metrics->collect('api_request', [
                'response_time' => $responseTime,
                'count' => 1
            ], [
                'endpoint' => '/api/users',
                'method' => 'GET',
                'status' => '200'
            ]);
            
            return new JsonResponse($users);
            
        } catch (\Exception $e) {
            $this->metrics->collect('api_error', ['count' => 1], [
                'endpoint' => '/api/users',
                'error_type' => get_class($e),
                'status' => '500'
            ]);
            throw $e;
        }
    }
}
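
If you don’t want to repeat this in every controller, the same metric can be collected globally with a kernel event subscriber. A hedged sketch (the subscriber class and attribute key are illustrative; the Symfony kernel events are standard):

use Symfony\Component\EventDispatcher\EventSubscriberInterface;
use Symfony\Component\HttpKernel\Event\RequestEvent;
use Symfony\Component\HttpKernel\Event\TerminateEvent;
use Symfony\Component\HttpKernel\KernelEvents;

final class RequestMetricsSubscriber implements EventSubscriberInterface
{
    public function __construct(private MetricsCollectorInterface $metrics) {}

    public static function getSubscribedEvents(): array
    {
        return [
            KernelEvents::REQUEST => 'onRequest',
            KernelEvents::TERMINATE => 'onTerminate', // runs after the response is sent
        ];
    }

    public function onRequest(RequestEvent $event): void
    {
        if ($event->isMainRequest()) {
            $event->getRequest()->attributes->set('_metrics_start', microtime(true));
        }
    }

    public function onTerminate(TerminateEvent $event): void
    {
        $start = $event->getRequest()->attributes->get('_metrics_start');
        if ($start === null) {
            return;
        }

        $this->metrics->collect('api_request', [
            'response_time' => (microtime(true) - $start) * 1000,
            'count' => 1,
        ], [
            'endpoint' => (string) $event->getRequest()->attributes->get('_route', 'unknown'),
            'method' => $event->getRequest()->getMethod(),
            'status' => (string) $event->getResponse()->getStatusCode(),
        ]);
    }
}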


2. Business Metrics in E-commerce

class OrderService
{
    // Dependencies injected via constructor (type names are illustrative)
    public function __construct(
        private EntityManagerInterface $em,
        private MetricsCollectorInterface $metrics,
        private PaymentGatewayInterface $paymentGateway
    ) {}

    public function createOrder(OrderDto $dto): Order
    {
        $order = new Order($dto);
        $this->em->persist($order);
        $this->em->flush();
        
        $this->metrics->collect('order_created', [
            'amount' => $order->getTotalAmount(),
            'items_count' => $order->getItemsCount(),
            'count' => 1
        ], [
            'payment_method' => $order->getPaymentMethod(),
            'currency' => $order->getCurrency(),
            'user_type' => $order->getUser()->getType()
        ]);
        
        return $order;
    }
    
    public function processPayment(Order $order): void
    {
        $startTime = microtime(true);
        
        try {
            $result = $this->paymentGateway->charge($order);
            
            $this->metrics->collect('payment_processed', [
                'amount' => $order->getTotalAmount(),
                'processing_time' => (microtime(true) - $startTime) * 1000,
                'count' => 1
            ], [
                'gateway' => $this->paymentGateway->getName(),
                'status' => 'success'
            ]);
            
        } catch (PaymentException $e) {
            $this->metrics->collect('payment_failed', [
                'amount' => $order->getTotalAmount(),
                'count' => 1
            ], [
                'gateway' => $this->paymentGateway->getName(),
                'error_code' => $e->getCode()
            ]);
            throw $e;
        }
    }
}


3. Background Job Monitoring

#[AsMessageHandler]
class EmailConsumer
{
    public function __construct(
        private MailerInterface $mailer,
        private MetricsCollectorInterface $metrics
    ) {}

    public function __invoke(SendEmailMessage $message): void
    {
        $startTime = microtime(true);
        
        try {
            $this->mailer->send($message->getEmail());
            
            $this->metrics->collect('consumer_processed', [
                'processing_time' => (microtime(true) - $startTime) * 1000,
                'count' => 1
            ], [
                'consumer' => 'email',
                'status' => 'success',
                'priority' => $message->getPriority()
            ]);
            
        } catch (\Exception $e) {
            $this->metrics->collect('consumer_failed', ['count' => 1], [
                'consumer' => 'email',
                'error' => get_class($e)
            ]);
            throw $e;
        }
    }
}


4. Circuit Breaker Pattern

class ExternalApiClient
{
    private int $failures = 0;
    private bool $isOpen = false;

    public function __construct(
        private HttpClientInterface $httpClient,
        private MetricsCollectorInterface $metrics
    ) {}
    
    public function call(string $endpoint): array
    {
        if ($this->isOpen) {
            $this->metrics->collect('circuit_breaker', ['count' => 1], [
                'service' => 'external_api',
                'state' => 'open',
                'action' => 'rejected'
            ]);
            throw new CircuitBreakerOpenException();
        }
        
        try {
            $response = $this->httpClient->request('GET', $endpoint);
            
            $this->failures = 0;
            $this->metrics->collect('circuit_breaker', ['count' => 1], [
                'service' => 'external_api',
                'state' => 'closed',
                'action' => 'success'
            ]);
            
            return $response->toArray();
            
        } catch (\Exception $e) {
            $this->failures++;
            
            if ($this->failures >= 5) {
                $this->isOpen = true;
                $this->metrics->collect('circuit_breaker', ['count' => 1], [
                    'service' => 'external_api',
                    'state' => 'open',
                    'action' => 'opened'
                ]);
            }
            
            throw $e;
        }
    }
}
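
One caveat: under PHP-FPM every request gets a fresh object, so the in-memory $failures counter above resets on each request. In practice the breaker state needs shared storage. A minimal hedged sketch using APCu (key names and thresholds are illustrative):

// Persist breaker state across requests with APCu (requires the apcu extension)
private function recordFailure(): void
{
    $failures = (int) apcu_fetch('external_api_failures') + 1;
    apcu_store('external_api_failures', $failures, 60); // counter expires after 60s

    if ($failures >= 5) {
        apcu_store('external_api_open', true, 30); // keep the breaker open for 30s
    }
}

private function isOpen(): bool
{
    return (bool) apcu_fetch('external_api_open');
}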


Aggregation in Telegraf


Telegraf’s killer feature is built-in aggregation (the basicstats plugin). Instead of raw data flooding Prometheus, aggregation happens directly in Telegraf.


Metric | Description                 | Use case
count  | Number of values per period | Requests, errors, registrations
sum    | Sum of values               | Total revenue, processing time
mean   | Arithmetic mean             | Avg response time, avg basket size
min    | Minimum                     | Min response time, smallest order
max    | Maximum                     | Peak load, max response time
stdev  | Standard deviation          | Response time variability
s2     | Variance                    | More sensitive variability metric


Example telegraf.conf

[[inputs.socket_listener]]
  service_address = "udp://:8089"
  data_format = "influx"

[[aggregators.basicstats]]
  period = "10s"
  drop_original = false
  stats = ["count", "mean", "sum", "min", "max", "stdev"]
  namepass = ["my_app_api_*"]

[[outputs.prometheus_client]]
  listen = ":9273"
  metric_version = 2
  path = "/metrics"
  metric_batch_size = 1000
  metric_buffer_limit = 10000
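
With this config, the series that end up in Prometheus should look roughly like the lines below: basicstats suffixes each field with the statistic, and the prometheus_client output joins measurement and field names. Exact names depend on your Telegraf version and metric_version setting:

my_app_api_request_response_time_mean{endpoint="/api/users",method="GET",status="200",host="web-01"} 42.7
my_app_api_request_response_time_max{endpoint="/api/users",method="GET",status="200",host="web-01"} 310.2
my_app_api_request_count_sum{endpoint="/api/users",method="GET",status="200",host="web-01"} 1843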


Pitfalls and How to Avoid Them


UDP Packet Loss — and Why It’s Fine

Problem: At high load, packet loss may occur.

Solution: Monitor Telegraf’s own metrics. If losses are critical — increase UDP buffers or add batching in the application.

Remember: losing 0.01% of metrics is better than the whole app going down because of Redis.
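
If Telegraf does report drops, the listener’s receive buffer can be raised, and Telegraf’s own internal metrics make the loss visible. A hedged sketch (the buffer size is illustrative and only takes effect if the host’s net.core.rmem_max allows it):

# Larger kernel receive buffer for the UDP listener
[[inputs.socket_listener]]
  service_address = "udp://:8089"
  data_format = "influx"
  read_buffer_size = "8MiB"

# Telegraf's own internal metrics expose dropped/gathered counters to watch
[[inputs.internal]]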


UDP Packet Size: Why Your Metrics Might Not Arrive

Problem: a single UDP datagram is limited to roughly 64KB, and with too many (or too long) tags you can exceed it.

Solution: Limit unique tags and use short names:

// Bad: long tags with high cardinality
$this->metrics->collect('api_request', ['time' => 100], [
    'user_email' => $user->getEmail(), // high cardinality
    'request_id' => uniqid(),          // unique every time
    'full_endpoint_path_with_parameters' => $request->getUri()
]);

// Good: short tags with low cardinality
$this->metrics->collect('api_request', ['time' => 100], [
    'endpoint' => '/api/users',
    'method' => 'GET',
    'status' => '200'
]);

Fewer unique tags = smaller packet size = more reliable delivery.


Alternative Scenarios


VictoriaMetrics Instead of Prometheus

For high-load systems, Prometheus can become a bottleneck: high memory consumption, slow queries over large data volumes, and no clustering mode “out of the box.”


VictoriaMetrics is fully compatible with the Prometheus protocol but:

  • stores data more efficiently,
  • executes heavy queries faster,
  • supports horizontal scaling.


That makes it a more reliable choice for systems with hundreds of thousands of metrics per second.
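
Since single-node VictoriaMetrics ingests InfluxDB line protocol directly, Telegraf can write to it with the stock influxdb output. A hedged sketch (the host name is illustrative; 8428 is the default VictoriaMetrics HTTP port):

[[outputs.influxdb]]
  urls = ["http://victoriametrics:8428"]
  skip_database_creation = true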


Sending Metrics to Multiple Systems Simultaneously

[[outputs.prometheus_client]]
  listen = ":9273"

[[outputs.influxdb_v2]]
  urls = ["http://influxdb:8086"]

[[outputs.graphite]]
  servers = ["graphite:2003"]


Roadmap and Current Limitations


Already works:


  • Production-ready
  • Symfony 6.4+ and 7.0+
  • Prometheus / VictoriaMetrics supported
  • Zero-overhead delivery


Note: there is no test suite yet, but the bundle has been running stably in multiple highload projects for over a year.


Final Thoughts


Switching to the push model with UDP + Telegraf gave us three key wins:


Performance as a competitive advantage

Latency reduced 60× (from 3ms to 0.05ms). At 200k RPM, that saves 10 minutes of CPU time per hour, allowing 15% more requests on the same hardware.


Scaling without headaches

Linear scaling — adding new servers now takes 30 seconds. Just deploy with the same UDP endpoint. No Prometheus changes, no service discovery.


System antifragility

Complete isolation of failures — the metrics system can collapse entirely, and the app continues running. Over the years, this saved us multiple times during monitoring infrastructure outages.


Metrics in PHP are not a luxury but a necessity to understand what’s happening in production. The Telegraf UDP approach allowed us to forget about scaling problems and focus on what really matters — business logic and user experience.


Yes, we sacrificed guaranteed delivery of every packet. But in return, we got a system that withstands any load and never becomes a single point of failure — especially at critical peak moments.


Bundle available on GitHub and Packagist.


P.S. If this saved you time reinventing the wheel — star the repo. Found a bug? Open an issue, and we’ll fix it.


Written by yakovlef | Team Lead | Software Engineer
Published by HackerNoon on 2025/09/02