What Is Performance Testing?

29.09.2025

A simple take on performance testing: why we need it, how it helps spot bottlenecks, figure out capacity, and make sure prod can handle the load.

theory

🐏

Introduction

If you ask a developer or tester, “What’s performance testing?”, chances are they’ll say something like: “We throw a bunch of users at the system and see what breaks.”

There’s a bit of truth in that, but it’s really just the tip of the iceberg. Performance testing isn’t about smashing the system for fun; it’s about digging deeper to understand:

  • where the bottlenecks are,
  • how the system behaves under load,
  • and at what point the system hits its limits.

Because let’s be real: charts full of numbers don’t mean much on their own. They don’t tell you why response time is climbing, what that means for the business, or how to fix it.
At its core, performance testing is about one thing: having confidence that production can handle the real world.

Why This Matters

So, why do we even bother with performance tests? Because any system in production has to deal with three things at once:

  • Users.
    They don’t show up in neat lines. In the morning there might be 10 people online, by evening it’s 500, and on Black Friday it’s a full-blown tsunami. And not all users behave the same: one person just clicks once and leaves, another hammers the system with dozens of requests in a few minutes.

  • Limits.
    Servers, databases, external APIs — everything has a breaking point. Capacity isn’t endless, even if you’re in the cloud. At some point, your system will hit a wall.

  • Time.
    The longer a system runs, the more “junk” piles up: memory leaks, overstuffed caches, stuck sessions. At first it’s invisible, but after a few hours or days it starts choking performance.

📌 Performance testing basically asks: can the system handle all of this at once?
Think of it like a cafe: with just a couple of guests everything looks fine. But only when the place is packed at lunchtime do you find out if the kitchen and staff can really keep up.


Response Time and System Capacity

When you run your first tests, the big question is: how do you actually read the results?
Most folks focus only on response time. If it jumps from 1 to 2 seconds, the gut reaction is: “the system just got twice as bad.” But that’s only a symptom, not the full story.

  • Response time shows what a single user (or a small group) feels at a given moment.
  • Capacity is about how many requests/users the system as a whole can handle per second.

These two are connected, but they’re not the same thing.

Take this example:

  • The request queue starts to grow and the DB connection pool gets close to its limit.
    → Responses get slower, but throughput can still climb or stay flat.

    Example 1

  • Now the pool limit is completely maxed out.
    → Throughput hits a hard ceiling, while response time just keeps climbing thanks to the queue.

    Example 2

That’s what we call a bottleneck. And in this case it’s about configuration (pool size), not the system being “out of gas” altogether.

⚡ The takeaway: don’t just stare at response time in isolation. Always check the other key metrics too:

  • throughput
  • CPU
  • memory
  • GC
  • queue length
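
To make that pattern easier to spot in your own results, here is a minimal Python sketch that walks through per-interval stats and flags the moment throughput stops growing while response time keeps climbing. The numbers and thresholds below are invented for illustration; in a real run they would come from your load tool’s report.

```python
# A rough sketch: flag intervals where throughput has flattened out
# while response time keeps climbing -- the classic "pool is maxed out" pattern.
# All numbers are made up for illustration.

intervals = [
    # (offered load in RPS, achieved throughput in RPS, p95 response time in ms)
    (100, 100, 120),
    (200, 200, 140),
    (300, 295, 210),
    (400, 300, 650),   # throughput stops growing, latency climbs -> bottleneck
    (500, 302, 1400),
]

prev_tps, prev_p95 = None, None
for offered, tps, p95 in intervals:
    if prev_tps is not None:
        tps_growth = (tps - prev_tps) / prev_tps
        p95_growth = (p95 - prev_p95) / prev_p95
        # thresholds are arbitrary: "throughput barely moved, latency jumped"
        if tps_growth < 0.05 and p95_growth > 0.5:
            print(f"Saturation suspected at ~{offered} RPS offered: "
                  f"throughput flat at ~{tps} RPS, p95 jumped to {p95} ms")
    prev_tps, prev_p95 = tps, p95
```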

The Sheep Story

This metaphor was once shared by Andrei Aparovich, and it nails what a bottleneck really is.

Picture this: a road and a bridge.

  • One sheep trots across and back in an hour. The farmer smiles — no problems.
  • But send a thousand sheep down that same road and things get messy. There’s a narrow bridge where only one sheep fits at a time. The whole flock jams up, and crossing time shoots through the roof. The issue isn’t the sheep. It’s the bridge’s limited throughput.

Now crank up the difficulty:

  • Rain starts pouring, the bridge gets slippery, and each sheep takes three times longer to cross. For one sheep, who cares. For a thousand, it’s chaos. The line grows, delays pile up, and suddenly the “sheep + bridge system” can’t cope.
  • Or imagine a tractor blocking the road, or some sheep being driven back the other way. Even a tiny hiccup now turns into a traffic jam stretching for kilometers.

🐑 The farmer eventually gets it: bottlenecks and outside factors multiply the impact of load like crazy. Systems work the same way. A tiny delay at low traffic is invisible, but under heavy load it snowballs into a serious outage.


Test Types

Before jumping into the heavy-duty stuff, you always start with the basics:

1. Smoke test 🚬

Think of it as a quick “is it alive?” check.

  • Do the scripts even run?
  • Does the service respond?
  • Is the infra set up right?

If yes, great. You’ve got enough confidence to move forward.
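
For illustration, a smoke test really can be this small. The sketch below assumes Locust as the load tool, a placeholder host, and a hypothetical /health endpoint; the only question it answers is “does the service respond at all?”

```python
# smoke.py -- a minimal "is it alive?" check, assuming Locust as the load tool.
# Run with something like: locust -f smoke.py --headless -u 5 -r 1 --run-time 1m
from locust import HttpUser, task, between

class SmokeUser(HttpUser):
    host = "https://example.com"   # placeholder system under test
    wait_time = between(1, 2)      # pause 1-2 s between requests

    @task
    def health_check(self):
        # hypothetical endpoint; we only care that the service responds
        with self.client.get("/health", catch_response=True) as resp:
            if resp.status_code != 200:
                resp.failure(f"unexpected status {resp.status_code}")
```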

2. Baseline 📊

Baseline helps in two main cases:

  1. In CI. Run a small test with ~10 users to see if response time got worse after a commit.
  2. For comparison. Take today’s numbers, make a fix, and check if things improved.
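
For the comparison case, a tiny script is often enough: read the p95 from the previous run and from today’s run, and complain if things got worse. The file names, JSON shape, and the 10% budget below are assumptions to adapt to your own tooling.

```python
# compare_baseline.py -- a rough baseline gate for CI.
# Assumes each run exports a JSON file like {"p95_ms": 230}; adjust to your tool's output.
import json
import sys

ALLOWED_REGRESSION = 0.10   # fail the check if p95 got more than 10% worse

def p95(path: str) -> float:
    with open(path) as f:
        return json.load(f)["p95_ms"]

baseline, current = p95("baseline.json"), p95("current.json")
change = (current - baseline) / baseline

print(f"p95: {baseline:.0f} ms -> {current:.0f} ms ({change:+.1%})")
if change > ALLOWED_REGRESSION:
    sys.exit("Response time regressed beyond the allowed budget")
```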

Baseline is all about tracking trends.
But if you want to find the true limits of the system, baseline won’t cut it. That’s where load and stress tests come in.

3. Load test ⚡

Run the system at a fixed RPS and check it against the SLA.
This shows how it behaves under “normal” pressure.
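
As a rough sketch of “hold a fixed RPS and check it against the SLA”, assuming a recent Locust: with 100 users each pacing themselves to about one request per second, the script below targets roughly 100 RPS. The host, endpoint, and numbers are placeholders, and the SLA check itself happens when you read the report.

```python
# load.py -- hold a roughly constant request rate, assuming a recent Locust.
# With 100 users and constant_throughput(1), each user aims for ~1 request/s,
# so the whole test targets ~100 RPS.
# Run with something like: locust -f load.py --headless -u 100 -r 10 --run-time 30m
from locust import HttpUser, task, constant_throughput

class CheckoutUser(HttpUser):
    host = "https://example.com"          # placeholder system under test
    wait_time = constant_throughput(1)    # ~1 request per second per user

    @task
    def place_order(self):
        # hypothetical endpoint representing the business-critical flow
        self.client.post("/api/orders", json={"sku": "demo", "qty": 1})
```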

4. Capacity / Stress test 🚦

Turn the dial up step by step until something breaks.
That’s how you find the real ceiling.
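
With Locust, the same scenario can be reused for a capacity test by adding a stepped load profile. The sketch below is one way to do it; the step size, step length, and step count are arbitrary, and you watch for the point where throughput flattens and errors start to appear.

```python
# stress_shape.py -- step the load up until the system breaks, assuming Locust.
# Combine with a user class (e.g. the CheckoutUser above) in the same file.
from locust import LoadTestShape

class SteppedLoad(LoadTestShape):
    step_users = 50        # add 50 users per step (arbitrary)
    step_seconds = 120     # hold each step for 2 minutes (arbitrary)
    max_steps = 10

    def tick(self):
        run_time = self.get_run_time()
        step = int(run_time // self.step_seconds) + 1
        if step > self.max_steps:
            return None                                       # stop the test
        return (step * self.step_users, self.step_users)      # (user count, spawn rate)
```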

5. Longevity (Soak) test ⏳

Let the system run for hours or even days.
This is how you catch:

  • memory leaks,
  • bloated caches,
  • slow creep degradation that short runs will never show.
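
Alongside the long run itself, it helps to watch the service’s memory from the outside: a slow, steady climb over hours that never flattens out is the classic signature of a leak. A rough sketch, assuming psutil, a placeholder PID, and a one-minute sampling interval:

```python
# soak_memory_watch.py -- sample a service's memory during a soak run, assuming psutil.
import time
import psutil

SERVICE_PID = 12345            # PID of the process under test (placeholder)
SAMPLE_EVERY_S = 60            # one sample per minute
DURATION_SAMPLES = 8 * 60      # roughly 8 hours' worth of samples

proc = psutil.Process(SERVICE_PID)
for _ in range(DURATION_SAMPLES):
    rss_mb = proc.memory_info().rss / (1024 * 1024)
    print(f"{time.strftime('%H:%M:%S')}  RSS = {rss_mb:.1f} MB")
    # a climb that keeps going across hours of samples is leak territory
    time.sleep(SAMPLE_EVERY_S)
```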

6. Scalability test 📈

Add more resources and see if it actually helps.
If performance doesn’t scale up, congrats, you’ve hit a limit in some shared resource.

Together these tests are like a ladder. One builds on top of the other, and only by using them all do you get the full picture of how the system behaves.


Colleagues’ Opinions

  • Ivan Zarubin pointed out that teams really need to agree on the words they use. For some folks, load and stress are the same thing. For others, they’re totally different. If you don’t align on terminology, reports get misread and confusion spreads fast.

  • Nadzeya Tamela stressed that requirements shouldn’t stop at functional scenarios. They also need to capture non-functional expectations: which test types are planned, which metrics actually matter, and how they’ll be checked. As she put it, the requirements should answer upfront whether a longevity test is needed, whether scalability must be validated, or if a basic load check will do.

  • Sergei Pashkevich added another angle — client-side metrics. Users don’t care if your API is fast if the UI still takes forever to load. In fact, the client side is often the first place you see performance degrade. Without it, you only have half the story.

Put all of this together and you get a simple takeaway: test types aren’t a strict checklist. They’re a toolbox you adapt for each project.

The essentials are:

  • the team speaks the same language,
  • requirements set the guardrails,
  • results cover both server-side and client-side views.

Bottleneck

At the end of the day, all these tests aim at one thing — finding the bottleneck.

And it can hide anywhere:

  • in the database,
  • in the thread pool,
  • in the GC,
  • in an external API.

Sometimes it’s something small, like a disk that just can’t keep up with log writes.
Other times it’s a big architectural issue, like a monolithic database that simply won’t scale no matter how many servers you throw at it.

The tricky part? The bottleneck isn’t always where you expect it. Sometimes the slowdown is in the frontend, not the DB. In other cases, the real pain point is a message queue.
That’s why you’ve got to look at the whole system instead of staring at one single service.


Case Studies

Let’s see how this plays out in the real world.

🏦 Banking

  • Symptom: in tests, login took ~2 s; in production ~12 s.
  • Cause: the test env didn’t generate enough concurrency, so the weak spot never showed. In prod, the crypto library started blocking under hundreds of simultaneous connections.
  • Fix: updated the crypto library.
  • Result: login time went back to ~2 s.

🛒 E-commerce

  • Symptom: a 24-hour longevity run looked fine at first, but the system crashed after ~12 hours.
  • Cause: unclosed DB connections. Short runs never caught it.

🏬 Retail

  • Symptom: the service kept failing during big sales. Traffic spiked to tens of thousands of requests at once and the system stopped responding.
  • Team’s assumption: not enough servers, let’s just add more.
  • Reality: perf tests showed an SQL query doing a full scan on a large table.
  • Fix: proper indexes + a rewritten query.
  • Result: capacity roughly tripled, and peak hours stopped being a drama.

🚀 Startup

  • Symptom: rock-solid at 10 RPS, noticeable lag at 50 RPS.
  • Cause: the bottleneck was a third-party API.
  • Fix: added caching and queues.
  • Result: handled 5× more load without falling over.

Theory Over Tools

Bottom line: tools alone aren’t enough.
JMeter, Gatling, k6 — they just fire load and collect numbers. The real craft is seeing the story behind the charts:

  • tell whether rising response time means true capacity exhaustion or a side effect (locks, config limits, etc.);
  • read throughput together with utilization, never in isolation;
  • remember: without context and know-how, charts turn into noise and create the illusion of analysis.

So don’t stop at running tests and drawing pretty dashboards, dig into the results:

  • connect metrics (response time ↔ throughput ↔ CPU/Memory/GC/queues);
  • spot patterns and validate hypotheses step by step;
  • separate symptoms from root causes;
  • and give the team clear guidance on what to fix and why.

Recommendations

To wrap things up, here are some field-tested tips:

  • Start small.
    Kick off with dozens of users, not thousands.
    Quick runs give fast feedback. You’ll know right away if scripts and the system are even alive.

  • Don’t ignore the client side.
    A lightning-fast API means nothing if the frontend takes 5 seconds to paint the page.
    Users judge the whole experience, not just the backend.

  • Get your terminology straight.
    For some folks, load and stress are interchangeable. For others, they’re worlds apart.
    If the team doesn’t align, reports will be read differently and chaos follows.

  • Lock in non-functional expectations early.
    If requirements explicitly call for scalability checks or a 24-hour longevity run,
    the team can’t hand-wave it away.
    Plus, the customer knows the results match what was promised.

  • Set up monitoring.
    Without it, you’re flying blind and only seeing surface symptoms.
    Reports work best as stories: what happened, where the bottleneck is, and what to do next.

  • Always repeat tests.
    A single run is just noise.
    A series of runs — that’s actual data you can trust.


Note: this article is based on discussions of performance testing with experts — Andrei Aparovich, Ivan Zarubin, Nadzeya Tamela, and Sergei Pashkevich — as part of the podcast “Raising Performance”.