What is Performance Testing
A simple take on performance testing: why we need it, how it helps spot bottlenecks, figure out capacity, and make sure prod can handle the load.
Introduction
If you ask a developer or tester, "What's performance testing?", chances are they'll say something like: "We throw a bunch of users at the system and see what breaks."
There's a bit of truth in that, but it's really just the tip of the iceberg. Performance testing isn't about smashing the system for fun; it's about digging deeper to understand:
- where the bottlenecks are,
- how the system behaves under load,
- and at what point the system hits its limits.
Because let's be real: charts full of numbers don't mean much on their own. They don't tell you why response time is climbing, what that means for the business, or how to fix it.
At its core, performance testing is about one thing: having confidence that production can handle the real world.
Contents
- Why This Matters
- Response Time and System Capacity
- The Sheep Story
- Test Types
- Colleagues' Opinions
- Bottleneck
- Case Studies
- Theory Over Tools
- Recommendations
Why This Matters
So, why do we even bother with performance tests? Because any system in production has to deal with three things at once:
- Users. They don't show up in neat lines. In the morning there might be 10 people online, by evening it's 500, and on Black Friday it's a full-blown tsunami. And not all users behave the same: one person just clicks once and leaves, another hammers the system with dozens of requests in a few minutes.
- Limits. Servers, databases, external APIs: everything has a breaking point. Capacity isn't endless, even if you're in the cloud. At some point, your system will hit a wall.
- Time. The longer a system runs, the more "junk" piles up: memory leaks, overstuffed caches, stuck sessions. At first it's invisible, but after a few hours or days it starts choking performance.
Performance testing basically asks: can the system handle all of this at once?
Think of it like a cafe: with just a couple of guests everything looks fine. But only when the place is packed at lunchtime do you find out if the kitchen and staff can really keep up.
Response Time and System Capacity
When you run your first tests, the big question is: how do you actually read the results?
Most folks focus only on response time. If it jumps from 1 to 2 seconds, the gut reaction is: "the system just got twice as bad." But that's only a symptom, not the full story.
- Response time shows what a single user (or a small group) feels at a given moment.
- Capacity is about how many requests/users the system as a whole can handle per second.
These two are connected, but they're not the same thing.
Take this example:
- The request queue starts to grow, the DB connection pool is close to its limit. → Responses get slower, but throughput can still climb or stay flat.
- Now the pool limit is completely maxed out. → Throughput hits a hard ceiling, while response time just keeps climbing thanks to the queue.
That's what we call a bottleneck. And in this case it's about configuration (pool size), not the system being "out of gas" altogether.
The takeaway: don't just stare at response time in isolation. Always check the other key metrics too:
- throughput
- CPU
- memory
- GC
- queue length
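To make the connection concrete, here is a small, self-contained Python sketch (not from the article; the pool of 10 connections and the 50 ms average query time are made-up numbers) that simulates a FIFO queue in front of a fixed-size connection pool and prints throughput next to average response time for different offered loads:

```python
import random

def simulate(offered_rps, pool_size=10, mean_service_s=0.05,
             n_requests=20_000, seed=1):
    """FIFO queue in front of a fixed-size pool (think DB connections).
    Poisson arrivals, exponentially distributed query times."""
    rng = random.Random(seed)
    free_at = [0.0] * pool_size               # when each connection frees up
    t = total_resp = last_finish = 0.0
    for _ in range(n_requests):
        t += rng.expovariate(offered_rps)     # next request arrives
        conn = min(range(pool_size), key=free_at.__getitem__)
        start = max(t, free_at[conn])         # wait if every connection is busy
        finish = start + rng.expovariate(1 / mean_service_s)
        free_at[conn] = finish
        total_resp += finish - t              # queueing time + query time
        last_finish = max(last_finish, finish)
    return n_requests / last_finish, total_resp / n_requests

# Pool of 10 connections, ~50 ms per query -> hard ceiling near 200 req/s.
print(f"{'offered rps':>12} {'throughput':>11} {'avg resp ms':>12}")
for offered in (50, 100, 150, 190, 250, 400):
    throughput, resp = simulate(offered)
    print(f"{offered:>12} {throughput:>11.0f} {resp * 1000:>12.0f}")
```

Below the pool's ceiling of roughly 200 requests per second, throughput tracks the offered load while latency only creeps up; past it, throughput flattens and the average response time keeps growing with the queue, which is exactly the pattern described above.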
The Sheep Story
This metaphor was once shared by Andrei Aparovich, and it nails what a bottleneck really is.
Picture this: a road and a bridge.
- One sheep trots across and back in an hour. The farmer smiles: no problems.
- But send a thousand sheep down that same road and things get messy. There's a narrow bridge where only one sheep fits at a time. The whole flock jams up, and crossing time shoots through the roof. The issue isn't the sheep. It's the bridge's limited throughput.
Now crank up the difficulty:
- Rain starts pouring, the bridge gets slippery, and each sheep takes three times longer to cross. For one sheep, who cares. For a thousand, it's chaos. The line grows, delays pile up, and suddenly the "sheep + bridge system" can't cope.
- Or imagine a tractor blocking the road, or some sheep being driven back the other way. Even a tiny hiccup now turns into a traffic jam stretching for kilometers.
The farmer eventually gets it: bottlenecks and outside factors multiply the impact of load like crazy. Systems work the same way. A tiny delay at low traffic is invisible, but under heavy load it snowballs into a serious outage.
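The snowball effect has a textbook formula behind it: for a single-server queue with Poisson arrivals (M/M/1), the mean time in the system is 1 / (service rate - arrival rate). The tiny Python sketch below plugs in invented numbers that mirror the rainy-bridge story; it is an illustration of the math, not anything from the article:

```python
def time_in_system(arrival_rate, service_rate):
    """Mean time spent queueing plus crossing for an M/M/1 queue."""
    if arrival_rate >= service_rate:
        return float("inf")          # the queue grows without bound
    return 1.0 / (service_rate - arrival_rate)

dry = 10.0        # dry bridge: 10 sheep per minute
wet = dry / 3     # rain: every crossing takes three times longer

for arrivals in (1, 3, 9):           # sheep arriving per minute
    t_dry = time_in_system(arrivals, dry)
    t_wet = time_in_system(arrivals, wet)
    wet_label = "queue never drains" if t_wet == float("inf") else f"{t_wet:.2f} min"
    print(f"{arrivals} sheep/min -> dry: {t_dry:.2f} min, wet: {wet_label}")
```

At one sheep per minute the rain roughly triples the total time, which nobody notices; at three per minute it is already a twenty-fold jump; at nine per minute the wet bridge never catches up and the queue grows forever.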
Test Types
Before jumping into the heavy-duty stuff, you always start with the basics:
1. Smoke test
Think of it as a quick "is it alive?" check.
- Do the scripts even run?
- Does the service respond?
- Is the infra set up right?
If yes, great. You've got enough confidence to move forward.
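If you want something concrete to start from, a smoke check can be a few lines in any load tool. Here is a minimal sketch using Locust; the host, the /health endpoint, and the user counts are placeholders, not details from the article:

```python
# locustfile.py -- a tiny "is it alive?" check.
# Run headless, e.g.: locust --headless -u 3 -r 1 -t 30s --host https://staging.example.com
from locust import HttpUser, task, between

class SmokeUser(HttpUser):
    wait_time = between(1, 2)        # a human-ish pause between requests

    @task
    def health(self):
        # Fails loudly if the service or the test infra isn't wired up right.
        with self.client.get("/health", catch_response=True) as resp:
            if resp.status_code == 200:
                resp.success()
            else:
                resp.failure(f"unexpected status {resp.status_code}")
```

Thirty seconds of this tells you whether the scripts run, the service answers, and the test infrastructure is set up correctly.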
2. Baseline
Baseline helps in two main cases:
- In CI. Run a small test with ~10 users to see if response time got worse after a commit.
- For comparison. Take today's numbers, make a fix, and check if things improved.
Baseline is all about tracking trends.
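For the CI flavor of a baseline, most tools can fail the build when the numbers drift. As one possible sketch with Locust (the 1% error budget and the 800 ms p95 threshold are invented values, pick whatever matches your SLA), an event hook can turn a degraded run into a non-zero exit code:

```python
# Added to the same locustfile: fails the CI job when the baseline degrades.
from locust import events

@events.quitting.add_listener
def enforce_baseline(environment, **kwargs):
    stats = environment.stats.total
    if stats.fail_ratio > 0.01:
        print("Baseline check failed: error ratio above 1%")
        environment.process_exit_code = 1
    elif stats.get_response_time_percentile(0.95) > 800:
        print("Baseline check failed: p95 response time above 800 ms")
        environment.process_exit_code = 1
```

Run it headless from the pipeline (for example `locust --headless -u 10 -r 2 -t 2m`) and the job goes red as soon as a commit makes the baseline worse.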
But if you want to find the true limits of the system, baseline won't cut it. That's where load and stress tests come in.
3. Load test
Run the system at a fixed RPS and check it against the SLA.
This shows how it behaves under "normal" pressure.
4. Capacity / Stress test
Turn the dial up step by step until something breaks.
That's how you find the real ceiling.
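One way to "turn the dial" without babysitting the run is a stepped load profile. The sketch below uses Locust's LoadTestShape, dropped into the same locustfile as the user class; the step size, step length, and cap are placeholders, and in practice you stop the ramp as soon as throughput flattens or the SLA breaks:

```python
from locust import LoadTestShape

class StepRamp(LoadTestShape):
    step_users = 50          # add 50 virtual users per step
    step_seconds = 120       # hold each step for two minutes
    max_users = 1000         # safety cap so the run eventually ends

    def tick(self):
        run_time = self.get_run_time()
        users = self.step_users * (int(run_time // self.step_seconds) + 1)
        if users > self.max_users:
            return None                      # returning None stops the test
        return users, self.step_users        # (target user count, spawn rate)
```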
5. Longevity (Soak) test
Let the system run for hours or even days.
This is how you catch:
- memory leaks,
- bloated caches,
- slow creep degradation that short runs will never show.
6. Scalability test
Add more resources and see if it actually helps.
If performance doesn't scale up, congrats, you've hit a shared limit.
Together these tests are like a ladder. One builds on top of the other, and only by using them all do you get the full picture of how the system behaves.
Colleagues' Opinions
- Ivan Zarubin pointed out that teams really need to agree on the words they use. For some folks, load and stress are the same thing. For others, they're totally different. If you don't align on terminology, reports get misread and confusion spreads fast.
- Nadzeya Tamela stressed that requirements shouldn't stop at functional scenarios. They also need to capture non-functional expectations: which test types are planned, which metrics actually matter, and how they'll be checked. As she put it, the requirements should answer upfront whether a longevity test is needed, whether scalability must be validated, or if a basic load check will do.
- Sergei Pashkevich added another angle: client-side metrics. Users don't care if your API is fast if the UI still takes forever to load. In fact, the client side is often the first place you see performance degrade. Without it, you only have half the story.
Put all of this together and you get a simple takeaway: test types aren't a strict checklist. They're a toolbox you adapt for each project.
The essentials are:
- the team speaks the same language,
- requirements set the guardrails,
- results cover both server-side and client-side views (a quick sketch of capturing client-side timings follows below).
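Capturing the client-side view does not require a separate heavyweight setup. As a minimal sketch (assuming Playwright for Python is installed and https://example.com stands in for the app under test), you can pull the browser's own Navigation Timing data after a page load:

```python
# client_timing.py -- reading the browser's Navigation Timing after a page load.
import json
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch()
    page = browser.new_page()
    page.goto("https://example.com", wait_until="load")

    # Ask the browser itself how long the page took, not just the API behind it.
    nav = page.evaluate(
        "() => JSON.stringify(performance.getEntriesByType('navigation')[0])"
    )
    browser.close()

timing = json.loads(nav)
print(f"DOMContentLoaded: {timing['domContentLoadedEventEnd']:.0f} ms")
print(f"Full page load:   {timing['loadEventEnd']:.0f} ms")
```

Numbers like these sit next to the server-side metrics in a report and show what the user actually waited for, not just how fast the API answered.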
Bottleneck
At the end of the day, all these tests aim at one thing: finding the bottleneck.
And it can hide anywhere:
- in the database,
- in the thread pool,
- in the GC,
- in an external API.
Sometimes itâs something small, like a disk that just canât keep up with log writes.
Other times itâs a big architectural issue, like a monolithic database that simply wonât scale no matter how many servers you throw at it.
The tricky part? The bottleneck isn't always where you expect it. Sometimes the slowdown is in the frontend, not the DB. In other cases, the real pain point is a message queue.
That's why you've got to look at the whole system instead of staring at one single service.
Case Studies
Letâs see how this plays out in the real world.
Banking
- Symptom: in tests, login took ~2 s; in production ~12 s.
- Cause: the test env didn't generate enough concurrency, so the weak spot never showed. In prod, the crypto library started blocking under hundreds of simultaneous connections.
- Fix: updated/fixed the library.
- Result: login time went back to ~2 s.
E-commerce
- Symptom: a 24-hour longevity run looked fine at first, but the system crashed after ~12 hours.
- Cause: unclosed DB connections. Short runs never caught it.
Retail
- Symptom: the service kept failing during big sales. Traffic spiked tens of thousands of requests at once and the system stopped responding.
- Team's assumption: not enough servers, let's just add more.
- Reality: perf tests showed an SQL query doing a full scan on a large table.
- Fix: proper indexes + a rewritten query.
- Result: capacity roughly tripled, and peak hours stopped being a drama.
Startup
- Symptom: rock-solid at 10 RPS, noticeable lag at 50 RPS.
- Cause: the bottleneck was a third-party API.
- Fix: added caching and queues.
- Result: handled 5× more load without falling over.
Theory Over Tools
Bottom line: tools alone arenât enough.
JMeter, Gatling, k6: they just fire load and collect numbers. The real craft is seeing the story behind the charts:
- tell whether rising response time means true capacity exhaustion or a side effect (locks, config limits, etc.);
- read throughput together with utilization, never in isolation;
- remember: without context and know-how, charts turn into noise and create the illusion of analysis.
So don't stop at running tests and drawing pretty dashboards; dig into the results:
- connect metrics (response time → throughput → CPU/Memory/GC/queues);
- spot patterns and validate hypotheses step by step;
- separate symptoms from root causes;
- and give the team clear guidance on what to fix and why.
Recommendations
To wrap things up, here are some field-tested tips:
- Start small. Kick off with dozens of users, not thousands. Quick runs give fast feedback. You'll know right away if scripts and the system are even alive.
- Don't ignore the client side. A lightning-fast API means nothing if the frontend takes 5 seconds to paint the page. Users judge the whole experience, not just the backend.
- Get your terminology straight. For some folks, load and stress are interchangeable. For others, they're worlds apart. If the team doesn't align, reports will be read differently and chaos follows.
- Lock in non-functional expectations early. If requirements explicitly call for scalability checks or a 24-hour longevity run, the team can't hand-wave it away. Plus, the customer knows the results match what was promised.
- Set up monitoring. Without it, you're flying blind and only seeing surface symptoms. Reports work best as stories: what happened, where the bottleneck is, and what to do next.
- Always repeat tests. A single run is just noise. A series of runs is actual data you can trust.
Note: this article is based on discussions of performance testing with experts (Andrei Aparovich, Ivan Zarubin, Nadzeya Tamela, and Sergei Pashkevich) as part of the podcast "Raising Performance".