“Redis dies at 200k RPM, Prometheus can’t scrape 50 servers in time, and the business demands real-time dashboards. Sound familiar?”
Friday, 6:00 PM. Grafana shows timeouts while scraping metrics. Redis, used by prometheus_client_php, eats 8 GB of RAM and 100% CPU. Prometheus fails to scrape all 50+ servers within the 15-second window. And Black Friday launches on Monday…
This article is about how we switched from a pull to a push model for PHP monitoring in a highload project, why we chose UDP + Telegraf over the classical approach, and how we now collect metrics from 50+ servers without a single timeout.
Architecture: Pull vs Push for PHP Metrics
Why Prometheus PHP Client Doesn’t Always Work for Highload
A typical scenario: you run a PHP Symfony application and need metrics. The first idea is prometheus_client_php. It’s a great library, but it comes with caveats:
// Classic prometheus_client_php usage
use Prometheus\CollectorRegistry;
use Prometheus\Storage\Redis;

$registry = new CollectorRegistry(new Redis());
$counter = $registry->getOrRegisterCounter('app', 'requests_total', 'Total requests', ['method', 'endpoint']);
$counter->inc(['GET', '/api/users']); // label values in registration order
What happens under the hood:
- Each metric is stored in Redis/APC/in-memory storage
- Prometheus periodically scrapes the /metrics endpoint
- On scrape, all metrics are read from storage
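For reference, a scrape of /metrics renders the stored values in the standard Prometheus exposition format; for the counter above it looks roughly like this:

```
# HELP app_requests_total Total requests
# TYPE app_requests_total counter
app_requests_total{method="GET",endpoint="/api/users"} 1
```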
Where problems begin:
- Scaling: With 50+ servers, Prometheus must scrape each. This becomes a bottleneck.
- Storage: Redis adds latency; APC works only within a single server; in-memory storage is lost on every PHP-FPM restart.
- Configuration: You must set up service discovery for all servers.
- Performance: At 200k RPM, every counter increment is an extra network round trip to Redis.
The Solution: Push Model with UDP for PHP Highload Monitoring
Instead, we send metrics via UDP to Telegraf, which then forwards them to Prometheus, InfluxDB, or others.
Why UDP?
- Fire & forget: No waiting for responses, no timeouts.
- Minimal overhead: A send costs microseconds.
- Fault tolerance: If Telegraf crashes, the app keeps running.
- Simplicity: No connection pools, retries, or circuit breakers.
Important: UDP may lose packets, but losing 0.01% of metrics won’t distort your dashboards.
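The fire-and-forget idea needs nothing more than a socket. Here is a minimal standalone sketch (hypothetical helper functions, not the bundle’s API) that formats a metric as InfluxDB line protocol and fires it at Telegraf:

```php
<?php
// Hypothetical helpers: format a metric as InfluxDB line protocol
// and send it over UDP without waiting for any response.

function formatLine(string $measurement, array $tags, array $fields): string
{
    $tagStr = '';
    foreach ($tags as $k => $v) {
        $tagStr .= ",{$k}={$v}";
    }

    $fieldPairs = [];
    foreach ($fields as $k => $v) {
        // Integers get the "i" suffix required by line protocol
        $fieldPairs[] = is_int($v) ? "{$k}={$v}i" : "{$k}={$v}";
    }

    return $measurement . $tagStr . ' ' . implode(',', $fieldPairs);
}

function sendMetric(string $line, string $host = '127.0.0.1', int $port = 8089): void
{
    // fwrite() on a UDP stream queues the datagram and returns immediately;
    // on any failure we simply drop the metric - the app never blocks.
    $socket = @stream_socket_client("udp://{$host}:{$port}");
    if ($socket !== false) {
        @fwrite($socket, $line);
        fclose($socket);
    }
}

$line = formatLine('my_app_api_request', ['method' => 'GET'], ['response_time' => 12.5, 'count' => 1]);
sendMetric($line);
echo $line; // my_app_api_request,method=GET response_time=12.5,count=1i
```

Even if nothing is listening on the port, the call returns instantly: that is the whole point of the push model.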
TelegrafMetricsBundle: Implementation
All of this is packaged in a Symfony bundle — TelegrafMetricsBundle — for sending metrics over UDP.
Installation
composer require yakovlef/telegraf-metrics-bundle
Config (config/packages/telegraf_metrics.yaml):
telegraf_metrics:
    namespace: 'my_app'
    client:
        url: 'http://localhost:8086'
        udpPort: 8089
Bundle Architecture
Three core components:
// MetricsCollectorInterface - the DI contract
interface MetricsCollectorInterface
{
    public function collect(string $name, array $fields, array $tags = []): void;
}

// Implementation backed by the InfluxDB UDP writer
class MetricsCollector implements MetricsCollectorInterface
{
    private UdpWriter $writer;
    private string $namespace;

    public function __construct(Client $client, string $namespace)
    {
        $this->writer = $client->createUdpWriter();
        $this->namespace = $namespace;
    }

    public function collect(string $name, array $fields, array $tags = []): void
    {
        // Send the metric to Telegraf as an InfluxDB point over UDP
        $this->writer->write(
            new Point("{$this->namespace}_$name", $tags, $fields)
        );
    }
}
DI integration:
services:
    Yakovlef\TelegrafMetricsBundle\Collector\MetricsCollectorInterface: '@telegraf_metrics.collector'
Practical Use Cases
1. API Endpoint Monitoring
class ApiController
{
    public function __construct(
        private UserRepository $userRepository,
        private MetricsCollectorInterface $metrics
    ) {}

    public function getUsers(): JsonResponse
    {
        $startTime = microtime(true);

        try {
            $users = $this->userRepository->findAll();
            $responseTime = (microtime(true) - $startTime) * 1000;

            $this->metrics->collect('api_request', [
                'response_time' => $responseTime,
                'count' => 1
            ], [
                'endpoint' => '/api/users',
                'method' => 'GET',
                'status' => '200'
            ]);

            return new JsonResponse($users);
        } catch (\Exception $e) {
            $this->metrics->collect('api_error', ['count' => 1], [
                'endpoint' => '/api/users',
                'error_type' => get_class($e),
                'status' => '500'
            ]);

            throw $e;
        }
    }
}
2. Business Metrics in E-commerce
class OrderService
{
    public function __construct(
        private EntityManagerInterface $em,
        private PaymentGatewayInterface $paymentGateway,
        private MetricsCollectorInterface $metrics
    ) {}

    public function createOrder(OrderDto $dto): Order
    {
        $order = new Order($dto);
        $this->em->persist($order);
        $this->em->flush();

        $this->metrics->collect('order_created', [
            'amount' => $order->getTotalAmount(),
            'items_count' => $order->getItemsCount(),
            'count' => 1
        ], [
            'payment_method' => $order->getPaymentMethod(),
            'currency' => $order->getCurrency(),
            'user_type' => $order->getUser()->getType()
        ]);

        return $order;
    }

    public function processPayment(Order $order): void
    {
        $startTime = microtime(true);

        try {
            $result = $this->paymentGateway->charge($order);

            $this->metrics->collect('payment_processed', [
                'amount' => $order->getTotalAmount(),
                'processing_time' => (microtime(true) - $startTime) * 1000,
                'count' => 1
            ], [
                'gateway' => $this->paymentGateway->getName(),
                'status' => 'success'
            ]);
        } catch (PaymentException $e) {
            $this->metrics->collect('payment_failed', [
                'amount' => $order->getTotalAmount(),
                'count' => 1
            ], [
                'gateway' => $this->paymentGateway->getName(),
                'error_code' => $e->getCode()
            ]);

            throw $e;
        }
    }
}
3. Background Job Monitoring
#[AsMessageHandler]
class EmailConsumer
{
    public function __construct(
        private MailerInterface $mailer,
        private MetricsCollectorInterface $metrics
    ) {}

    public function __invoke(SendEmailMessage $message): void
    {
        $startTime = microtime(true);

        try {
            $this->mailer->send($message->getEmail());

            $this->metrics->collect('consumer_processed', [
                'processing_time' => (microtime(true) - $startTime) * 1000,
                'count' => 1
            ], [
                'consumer' => 'email',
                'status' => 'success',
                'priority' => $message->getPriority()
            ]);
        } catch (\Exception $e) {
            $this->metrics->collect('consumer_failed', ['count' => 1], [
                'consumer' => 'email',
                'error' => get_class($e)
            ]);

            throw $e;
        }
    }
}
4. Circuit Breaker Pattern
class ExternalApiClient
{
    private int $failures = 0;
    private bool $isOpen = false;

    public function __construct(
        private HttpClientInterface $httpClient,
        private MetricsCollectorInterface $metrics
    ) {}

    public function call(string $endpoint): array
    {
        if ($this->isOpen) {
            $this->metrics->collect('circuit_breaker', ['count' => 1], [
                'service' => 'external_api',
                'state' => 'open',
                'action' => 'rejected'
            ]);

            throw new CircuitBreakerOpenException();
        }

        try {
            $response = $this->httpClient->request('GET', $endpoint);
            $this->failures = 0;

            $this->metrics->collect('circuit_breaker', ['count' => 1], [
                'service' => 'external_api',
                'state' => 'closed',
                'action' => 'success'
            ]);

            return $response->toArray();
        } catch (\Exception $e) {
            $this->failures++;

            if ($this->failures >= 5) {
                $this->isOpen = true;

                $this->metrics->collect('circuit_breaker', ['count' => 1], [
                    'service' => 'external_api',
                    'state' => 'open',
                    'action' => 'opened'
                ]);
            }

            throw $e;
        }
    }
}
Aggregation in Telegraf
Telegraf’s killer feature is built-in aggregation (the basicstats plugin). Instead of raw data flooding Prometheus, aggregation happens directly in Telegraf.

| Metric | Description | Use case |
|--------|-------------|----------|
| count | Number of values per period | Requests, errors, registrations |
| sum | Sum of values | Total revenue, total processing time |
| mean | Arithmetic mean | Avg response time, avg basket size |
| min | Minimum | Min response time, smallest order |
| max | Maximum | Peak load, max response time |
| stdev | Standard deviation | Response time variability |
| s2 | Variance | More sensitive variability metric |
Example telegraf.conf
[[inputs.socket_listener]]
  service_address = "udp://:8089"
  data_format = "influx"

[[aggregators.basicstats]]
  period = "10s"
  drop_original = false
  stats = ["count", "mean", "sum", "min", "max", "stdev"]
  namepass = ["my_app_api_*"]

[[outputs.prometheus_client]]
  listen = ":9273"
  metric_version = 2
  path = "/metrics"
  metric_batch_size = 1000
  metric_buffer_limit = 10000
Pitfalls and How to Avoid Them
UDP Packet Loss — and Why It’s Fine
Problem: Under high load, the kernel may drop UDP packets before Telegraf reads them.
Solution: Monitor Telegraf’s own metrics. If losses become significant, increase the OS UDP receive buffers or batch metrics in the application.
Remember: losing 0.01% of metrics is better than an application outage caused by Redis.
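Telegraf can report on itself. A config sketch using the stock inputs.internal plugin and the socket_listener buffer option (the 8 MiB value is an assumption to tune for your traffic, and on Linux the kernel’s net.core.rmem_max must allow it):

```toml
# Expose Telegraf's own counters (dropped metrics, buffer fullness, etc.)
[[inputs.internal]]
  collect_memstats = false

# Give the UDP listener a larger OS receive buffer
[[inputs.socket_listener]]
  service_address = "udp://:8089"
  data_format = "influx"
  read_buffer_size = "8MiB"
```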
UDP Packet Size: Why Your Metrics Might Not Arrive
Problem: A UDP datagram payload is capped at roughly 64 KB (65,507 bytes), and datagrams larger than the network MTU (typically ~1,500 bytes) get fragmented, which raises the chance of loss. With too many tags, you can hit both limits.
Solution: Limit unique tags and use short names:
// Bad: long tags with high cardinality
$this->metrics->collect('api_request', ['time' => 100], [
'user_email' => $user->getEmail(), // high cardinality
'request_id' => uniqid(), // unique every time
'full_endpoint_path_with_parameters' => $request->getUri()
]);
// Good: short tags with low cardinality
$this->metrics->collect('api_request', ['time' => 100], [
'endpoint' => '/api/users',
'method' => 'GET',
'status' => '200'
]);
Fewer unique tags = smaller packet size = more reliable delivery.
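A cheap way to enforce this is a tag whitelist: anything outside a known set of keys is dropped before the packet is built. A hypothetical helper, not part of the bundle:

```php
<?php
// Hypothetical guard: keep only whitelisted tag keys so a stray
// high-cardinality value never inflates the UDP packet.

function filterTags(array $tags, array $allowedKeys): array
{
    return array_intersect_key($tags, array_flip($allowedKeys));
}

$tags = [
    'endpoint'   => '/api/users',
    'method'     => 'GET',
    'request_id' => uniqid(), // would explode cardinality - dropped below
];

$safe = filterTags($tags, ['endpoint', 'method', 'status']);
// $safe === ['endpoint' => '/api/users', 'method' => 'GET']
```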
Alternative Scenarios
VictoriaMetrics Instead of Prometheus
For high-load systems, Prometheus can become a bottleneck: high memory consumption, long queries with large data volumes, and no clustering mode “out of the box.”
VictoriaMetrics is fully compatible with the Prometheus protocol but:
- is more efficient in storage,
- handles long queries faster,
- supports horizontal scaling.
That makes it a more reliable choice for systems with hundreds of thousands of metrics per second.
Sending Metrics to Multiple Systems Simultaneously
[[outputs.prometheus_client]]
  listen = ":9273"

[[outputs.influxdb_v2]]
  urls = ["http://influxdb:8086"]

[[outputs.graphite]]
  servers = ["graphite:2003"]
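Outputs can also be filtered so each backend receives only what it needs, using Telegraf’s standard namepass metric filtering (the metric name patterns below are examples):

```toml
# Everything goes to Prometheus...
[[outputs.prometheus_client]]
  listen = ":9273"

# ...but InfluxDB receives only business metrics
[[outputs.influxdb_v2]]
  urls = ["http://influxdb:8086"]
  namepass = ["my_app_order_*", "my_app_payment_*"]
```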
Roadmap and Current Limitations
Already works:
- Production-ready
- Symfony 6.4+ and 7.0+
- Prometheus / VictoriaMetrics supported
- Zero-overhead delivery
Note: there is no test suite yet, but the bundle has been running stably in multiple highload projects for over a year.
Final Thoughts
Switching to the push model with UDP + Telegraf gave us three key wins:
Performance as a competitive advantage
Latency reduced 60× (from 3ms to 0.05ms). At 200k RPM, that saves 10 minutes of CPU time per hour, allowing 15% more requests on the same hardware.
Scaling without headaches
Linear scaling — adding new servers now takes 30 seconds. Just deploy with the same UDP endpoint. No Prometheus changes, no service discovery.
System antifragility
Complete isolation of failures — the metrics system can collapse entirely, and the app continues running. Over the years, this saved us multiple times during monitoring infrastructure outages.
Metrics in PHP are not a luxury but a necessity to understand what’s happening in production. The Telegraf UDP approach allowed us to forget about scaling problems and focus on what really matters — business logic and user experience.
Yes, we sacrificed guaranteed delivery of every packet. But in return, we got a system that withstands any load and never becomes a single point of failure — especially at critical peak moments.
Bundle available on GitHub and Packagist.
P.S. If this saved you time reinventing the wheel — star the repo. Found a bug? Open an issue, and we’ll fix it.