Percentiles that hide the problem
At some point, I started noticing that percentiles made performance discussions feel too calm.
p95 looks stable.
p99 moves a bit, but stays within the expected range.
Dashboards look clean, alerts stay quiet, and it feels like the system is under control.
And yet, the system itself already feels different.
Less predictable.
More fragile.
Slower to recover from things that used to be harmless.
This gap is what made me start questioning how much I trust percentiles.
How percentiles became the center of the conversation
For a long time, percentiles felt like the right abstraction.
There is always too much data, and reducing it to a few numbers feels reasonable. You cannot look at every request, so you look at p95 or p99 and assume they describe reality well enough. In stable systems, that assumption often holds.
The shift happens when percentiles quietly stop being a summary and become the main question. Is p95 good or bad? Are we still below the line?
Once that happens, the system itself slowly disappears behind the number. The conversation moves from behavior to thresholds, from understanding to compliance. As long as the line is not crossed, everything feels acceptable.
That is usually the moment when problems start growing unnoticed.
Where problems actually begin
What I kept seeing is that performance problems rarely start everywhere at once.
They begin in a small part of the traffic. A few requests take much longer than usual. Some operations start waiting more than before. Small delays appear, but only for specific paths or conditions.
Most requests are still fast. Percentiles stay calm.
From the outside, nothing looks broken yet. Inside the system, behavior has already started to shift. Things become uneven. Recovery takes longer. Variance increases, even though the main numbers barely move.
This is the stage percentiles are worst at showing.
They are built to describe the majority. When most requests behave normally, percentiles preserve that picture, even if a small but important part of the system has already changed.
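A small simulation makes this concrete. Everything here is invented, including the synthetic latencies and the hypothetical 0.5% slow path, but it shows how a real degradation can sit entirely below the percentile line:

```python
import random
import statistics

random.seed(0)

# Baseline: 10,000 synthetic request latencies around 50 ms.
baseline = [random.gauss(50, 5) for _ in range(10_000)]

# Degraded: the same traffic, except a hypothetical 0.5% slice
# (one specific path, say) now takes ~500 ms instead of ~50 ms.
degraded = [
    random.gauss(500, 50) if random.random() < 0.005 else random.gauss(50, 5)
    for _ in range(10_000)
]

def percentile(values, p):
    """Nearest-rank percentile."""
    ordered = sorted(values)
    rank = max(0, min(len(ordered) - 1, round(p / 100 * len(ordered)) - 1))
    return ordered[rank]

for label, data in (("baseline", baseline), ("degraded", degraded)):
    print(
        "%-8s p95=%5.1f  p99=%5.1f  stdev=%5.1f"
        % (label, percentile(data, 95), percentile(data, 99), statistics.pstdev(data))
    )
```

With these numbers, p95 and p99 barely move between the two runs, while the standard deviation jumps several-fold. The variance sees the slow path long before the headline percentiles do, which is exactly the uneven behavior described above.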
Why the numbers stay clean
Another thing that kept bothering me is how clean percentiles tend to look.
Different request types are mixed together. Short spikes are smoothed over time windows. Rare but heavy slowdowns disappear inside one value. Nothing is technically wrong here; this is exactly how aggregation works.
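The first mechanism, mixing request types, is easy to reproduce. Here is a hypothetical sketch with made-up numbers: a low-volume checkout path that has become ten times slower, reported under the same latency metric as high-volume browse traffic:

```python
import random

random.seed(1)

# Two hypothetical request types sharing one metric:
# high-volume "browse" traffic, and a low-volume "checkout"
# path that has quietly become ten times slower.
browse = [random.gauss(40, 4) for _ in range(9_500)]
checkout = [random.gauss(400, 40) for _ in range(500)]

def p95(values):
    """Nearest-rank 95th percentile."""
    ordered = sorted(values)
    return ordered[int(0.95 * len(ordered)) - 1]

print("browse p95:   %6.1f ms" % p95(browse))
print("checkout p95: %6.1f ms" % p95(checkout))
print("combined p95: %6.1f ms" % p95(browse + checkout))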
But psychologically, this cleanliness is dangerous. The smoother the graph looks, the easier it is to believe that the system is calm and under control, even when it is not.
The metric does not lie. It just removes the parts that would make the picture uncomfortable.
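The same goes for smoothing over time windows. Another synthetic sketch, again with invented numbers: a sharp ten-second incident inside a five-minute reporting window:

```python
import random

random.seed(2)

# Hypothetical traffic: 300 seconds at ~40 ms, 50 requests/s,
# with a 10-second incident where latency jumps to ~800 ms.
latencies = []
for second in range(300):
    base = 800 if 120 <= second < 130 else 40
    latencies += [random.gauss(base, base * 0.1) for _ in range(50)]

def p95(values):
    """Nearest-rank 95th percentile."""
    ordered = sorted(values)
    return ordered[int(0.95 * len(ordered)) - 1]

# During the incident itself, p95 is clearly terrible...
spike = latencies[120 * 50 : 130 * 50]
print("p95 during the spike:     %6.1f ms" % p95(spike))

# ...but across the whole window the spike is only 10/300 of
# the samples, under the 5% tail that p95 ignores by design.
print("p95 over the full window: %6.1f ms" % p95(latencies))
```

The incident is about 3.3% of the window, so the windowed p95 sits only a few milliseconds above its quiet value. Ten seconds of users seeing 800 ms simply never reaches the graph.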
Why this pattern felt familiar
At some point, I realized that this was not just about percentiles.
I had seen the same pattern with autoscaling, where the system looks adaptive while quietly multiplying work. I had seen it with CPU utilization, which stays low or stable while progress slows down somewhere else. I had seen it with BFFs (backends for frontends), which look fine in isolation while coordination cost builds up inside them.
Percentiles fit into the same category. They describe activity, not pressure. They describe results, not proximity to limits.
They make systems look healthier than they really are, right up until they stop being useful.
Where I am now
These days, I still look at percentiles, but they no longer make me feel safe.
If percentiles look fine while the system feels unstable, I assume the problem is already there, just not visible in the main number yet. Usually it shows up first in uneven behavior, small delays, or parts of the system that start waiting instead of moving forward.
That is why percentiles hide the problem for me. Not because they lie, but because they make it too easy to stop asking harder questions too early.