Let’s take for granted that you’ve done the right thing: you’ve generously instrumented your system, and are actually paying attention to the metrics that you’ve generated (•). The question on the table is: “Do you actually trust the metrics that you are generating?” (Hint: You shouldn’t.)

Let’s look at something fairly straightforward, the request/response path as shown below.

[Figure: the request/response path and its processing stages]

You would think that the ResponseTime would be the sum of each of the processing stages, right? i.e.,

ResponseTime = 10 + 1 + 20 + 1 + 5 = 37ms?

But, since you shouldn’t trust your metrics, you

- also measured ResponseTime directly,
- compared it against what it should be, and
- charted/alerted on deviations,

and you found that the actual ResponseTime was, say, 52ms. That’s quite a difference, no? That is 15ms that the per-stage numbers don’t account for.

As to why it was 52ms, let’s look at a bunch of possible issues:

- Measuring the Wrong Thing: You actually instrumented something completely different. I know, that sounds goofy, but it happens all the time, e.g. you’re measuring the validate_user interval instead of validate_users (spelling issues with APIs. yay.)
- Incomplete Instrumentation: Simply put, you missed something, e.g. there’s a queue in front of the Analyze component that you haven’t instrumented, and you’re not measuring the latency there.
- System Issues: Oops. A garbage collection pause. Or a failover. Or a restart. Or whatever.
- Unexpected Code Paths: Your code has a bunch of paths in it to deal with edge cases (e.g. “strip semi-colons from the input”), and some of these trigger additional steps that you had forgotten about.
- Time Issues: You just plain screwed up by making one of the infinitely many assumptions about time, such as that it increases monotonically, or everything is GMT, or whatever.

And this is just when it comes to measuring time. The point here being that you should be validating your metrics through multiple means, for all your metrics.
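To make the "measure it directly and compare" step concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the `RequestTimer` class, the `check_deviation` helper, the 10% tolerance, and the stage names are hypothetical, not taken from any particular metrics library. It times each stage, times the whole request separately, and reports how much of the total the stages fail to explain. It reads from `time.monotonic` so that wall-clock adjustments can't make a duration go negative.

```python
import time
from contextlib import contextmanager


class RequestTimer:
    """Hypothetical helper: per-stage timings plus a direct end-to-end timing."""

    def __init__(self, clock=time.monotonic):
        # time.monotonic never goes backwards, unlike wall-clock time.time().
        self._clock = clock
        self.stages_ms = {}
        self.total_ms = None

    @contextmanager
    def stage(self, name):
        """Time one processing stage and add it to that stage's running total."""
        start = self._clock()
        try:
            yield
        finally:
            elapsed_ms = (self._clock() - start) * 1000.0
            self.stages_ms[name] = self.stages_ms.get(name, 0.0) + elapsed_ms

    @contextmanager
    def request(self):
        """Time the whole request directly, independent of the stage timers."""
        start = self._clock()
        try:
            yield self
        finally:
            self.total_ms = (self._clock() - start) * 1000.0

    def unaccounted_ms(self):
        """Directly measured total minus the sum of the instrumented stages.

        Only meaningful after the request() block has finished.
        """
        return self.total_ms - sum(self.stages_ms.values())


def check_deviation(expected_ms, measured_ms, tolerance=0.10):
    """Return (gap_ms, exceeds), where exceeds is True when the gap is
    larger than `tolerance` (a fraction) of the expected value."""
    gap_ms = measured_ms - expected_ms
    return gap_ms, abs(gap_ms) > tolerance * expected_ms


if __name__ == "__main__":
    # The numbers from the example above: 37ms expected, 52ms measured.
    gap, alert = check_deviation(10 + 1 + 20 + 1 + 5, 52)
    print(f"gap={gap}ms alert={alert}")  # gap=15ms alert=True
```

In a real service you would wrap the handler in `with timer.request():` and each step in `with timer.stage("..."):`, then export `unaccounted_ms()` as its own metric so it can be charted and alerted on like any other. A queue you forgot to instrument, or a GC pause between stages, shows up there even though no individual stage timer sees it.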
In fact, if you are already doing this, and all the numbers line up, you should be very, very worried: you’re probably missing something!

So yeah, trust your metrics, after you’ve verified them…

(•) You’d be surprised how often I see this missed.

- “Did you instrument your code?” “D-uh, of course!”
- “Grafana?” “Dude, come on, what d’you think I am?”
- “When was the last time you looked at it?” “Uhhhhh”

(This article also appears on my blog)