This is a story about a Java thread leak.
But more importantly, it is a story about how performance problems rarely start where we first notice them.
The project was a large enterprise system running in Kubernetes. The goal of the test was simple and reasonable: make sure the service could handle sustained, heavy load without degrading over time.
During one of the runs, a service started crashing with a fatal Java Runtime Environment error. Kubernetes restarted the pod, the system recovered, and after some time the same thing happened again.
At first glance, it looked like a classic memory leak. But the heap was clean. That detail already told me this was not going to be a quick fix.
When the obvious explanation is wrong
CPU usage was stable. Heap usage looked fine. And yet memory kept growing until the container hit its limit.
To remove infrastructure from the picture, I ran the same service locally in Docker and repeated the test. The behavior was identical. Memory grew steadily. CPU stayed calm. At this point, the conclusion was simple: the problem was not in the platform. It was inside the application.
The JVM logs confirmed it. Native memory allocation failed. The operating system could not allocate stack space for new threads. In plain terms, the system was creating more and more threads and never letting them go.
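One simple way to make this kind of growth visible during a test run is to sample the JVM's own thread statistics. The sketch below is illustrative rather than taken from the project; it uses the standard ThreadMXBean, and the sampling interval and log format are arbitrary choices.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

// Minimal sketch: periodically log live and peak thread counts so a steady
// upward trend shows up long before the JVM fails to create new threads.
public class ThreadGrowthMonitor {
    public static void start(long intervalMillis) {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        Thread monitor = new Thread(() -> {
            while (!Thread.currentThread().isInterrupted()) {
                System.out.printf("live=%d peak=%d totalStarted=%d%n",
                        threads.getThreadCount(),
                        threads.getPeakThreadCount(),
                        threads.getTotalStartedThreadCount());
                try {
                    Thread.sleep(intervalMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        }, "thread-growth-monitor");
        monitor.setDaemon(true);
        monitor.start();
    }
}
```

A live thread count that climbs in step with request volume and never comes back down is exactly the signature of this failure.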
The real cause
The root cause was easy to spot once I knew where to look.
Inside a frequently called method, a new ExecutorService was created on every request. The code looked clean and modern. CompletableFuture. Async calls. A familiar pattern. But there was one critical detail: the thread pool was never closed.
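The leaking pattern looked roughly like this. This is a reconstruction, not the project's code, and the class and method names are invented for illustration.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Illustrative reconstruction of the leaking pattern (names are hypothetical).
// Every call builds a fresh pool, and nothing ever shuts it down, so its
// threads stay alive long after the future has completed.
public class OrderService {
    public CompletableFuture<String> loadOrder(String id) {
        ExecutorService executor = Executors.newFixedThreadPool(4); // leaked on every request
        return CompletableFuture.supplyAsync(() -> fetchFromBackend(id), executor);
    }

    private String fetchFromBackend(String id) {
        return "order-" + id; // placeholder for the real remote call
    }
}
```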
Each request created a new set of threads. Those threads stayed alive. Over time, native memory filled up until the JVM could not continue. This was not a rare edge case. It was a design decision that quietly turned into a failure under sustained load.
A fix that looks correct, and the one that actually works
The first reaction was predictable. Add a try/finally block. Shut down the executor after use. The memory leak disappears.
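Sketched with the same hypothetical names, that first fix looks like this; the try/finally guarantees the pool is shut down, so its threads are eventually released.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// The "obvious" fix, sketched with illustrative names: the pool is still
// created per request, but now it is always shut down afterwards.
public class OrderService {
    public CompletableFuture<String> loadOrder(String id) {
        ExecutorService executor = Executors.newFixedThreadPool(4);
        try {
            return CompletableFuture.supplyAsync(() -> fetchFromBackend(id), executor);
        } finally {
            // shutdown() lets the already-submitted task finish, then the pool's
            // threads terminate, so memory no longer grows without bound.
            executor.shutdown();
        }
    }

    private String fetchFromBackend(String id) {
        return "order-" + id;
    }
}
```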
And yet this fix was still wrong.
Creating and destroying a thread pool for every request defeats the purpose of using a thread pool at all. Thread creation is expensive. Under load, this approach increases overhead, reduces throughput, and introduces new risks. The leak was gone, but the design problem remained.
The correct solution was not about closing resources more carefully. It was about changing how the system was designed. The thread pool had to be created once, at application startup, and reused across requests. Its size and lifecycle had to be controlled explicitly.
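A minimal sketch of that design, again with illustrative names and an assumed pool size: one executor owned by the application, reused by every request, and shut down only when the application stops.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Sketch of the refactored design (names and sizes are illustrative):
// one pool with an explicit size, owned by the application lifecycle.
public class OrderService {
    // Created once; every request reuses the same threads.
    private final ExecutorService executor = Executors.newFixedThreadPool(16);

    public CompletableFuture<String> loadOrder(String id) {
        return CompletableFuture.supplyAsync(() -> fetchFromBackend(id), executor);
    }

    // Called once on application shutdown, for example from a lifecycle hook.
    public void close() throws InterruptedException {
        executor.shutdown();
        if (!executor.awaitTermination(10, TimeUnit.SECONDS)) {
            executor.shutdownNow();
        }
    }

    private String fetchFromBackend(String id) {
        return "order-" + id;
    }
}
```

In a framework-managed service, the same idea typically maps to a singleton component whose close() is wired into the container's shutdown sequence, so the pool's lifecycle is owned in exactly one place.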
After refactoring, the service stopped creating threads endlessly. Memory stabilized. The system became predictable under long-running load. Nothing magical happened. We simply aligned the implementation with how thread pools are meant to be used.
What this bug really shows
This was not just a bug caused by a missing shutdown call.
It was a reminder that many performance problems are architectural. They hide behind clean code and good intentions, and they only appear when a system is observed under realistic and sustained pressure.
A short functional test would never catch this. A quick benchmark would miss it. Only a long-running test made the problem visible. This is why I treat performance findings as signals about system design, not just issues to patch.
A thread leak is rarely just a leak. It is often a sign that resource lifecycles are not clearly owned, that execution decisions were made without thinking about long-term behavior, and that the system was never validated under the conditions it will face in production.
Fixing the symptom is easy. Understanding why the system failed is the real work. And that understanding matters far more than the fix itself.
This is why I no longer treat thread leaks as just bugs to fix. I treat them as signals that something in the system design was never fully thought through.