23.05.2026

JVM OOM in Kubernetes: Heap Is Not the Limit


SRE Weekly recently highlighted a familiar production trap: a Java pod dies with `OOMKilled`, but dashboards show the heap stayed below its configured maximum. The explanation is simple and easy to miss. Kubernetes enforces the container memory limit, not the JVM heap limit.

That means `-Xmx` is only one part of the budget. If you run Java in containers, your runbook needs to account for everything else the process allocates.

What Uses Memory Outside the Heap?

The Oracle Native Memory Tracking docs describe how HotSpot can report JVM native memory by subsystem. Common non-heap consumers include:

  • metaspace for class metadata
  • thread stacks for application, GC, and framework threads
  • JIT code cache
  • direct and mapped byte buffers
  • GC bookkeeping structures
  • native libraries and allocator overhead

In Kubernetes, the container memory limit is enforced by the kernel through cgroups. When the container's total memory usage exceeds that limit, the kernel OOM killer terminates a process in the container and Kubernetes reports it as `OOMKilled`, even if heap graphs still look safe.

A Safer Sizing Rule

Avoid setting `-Xmx` equal to the pod limit. Leave room for native memory and operational variance.

```bash
# Example for a 2 GiB pod limit
JAVA_TOOL_OPTIONS="-Xmx1400m -XX:MaxMetaspaceSize=256m"
```
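To make the budget explicit, the arithmetic behind that example can be sketched in shell. The 2 GiB limit, 1400 MiB heap, and 256 MiB metaspace cap come from the example; the helper name `native_headroom_mib` is made up for illustration.

```bash
# Hypothetical helper: MiB left in the pod limit for everything that is
# neither heap nor metaspace (thread stacks, code cache, direct buffers, ...).
native_headroom_mib() {
  local limit_mib=$1 heap_mib=$2 metaspace_mib=$3
  echo $(( limit_mib - heap_mib - metaspace_mib ))
}

# 2 GiB limit = 2048 MiB, with the heap and metaspace caps from the example
native_headroom_mib 2048 1400 256   # prints 392
```

Whether roughly 390 MiB is enough depends on the workload, which is why the number should be checked against measurements rather than assumed.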

For modern JVMs, percentage-based sizing can be easier to standardize:

```bash
JAVA_TOOL_OPTIONS="-XX:MaxRAMPercentage=65 -XX:InitialRAMPercentage=50"
```

The exact number depends on thread count, framework behavior, traffic shape, and whether the app uses direct buffers heavily. Start conservative, then adjust with measurements.
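As a rough illustration of what a percentage means in absolute terms, the sketch below converts a limit and a percentage into MiB. It assumes the JVM sees the container limit as its total RAM; the helper name is hypothetical, and the heap size the JVM actually picks may differ slightly because of internal alignment.

```bash
# Hypothetical helper: approximate max heap implied by -XX:MaxRAMPercentage,
# assuming the JVM treats the container limit as total available RAM.
heap_from_percentage_mib() {
  local limit_mib=$1 percent=$2
  echo $(( limit_mib * percent / 100 ))   # integer division, rounds down
}

heap_from_percentage_mib 2048 65   # prints 1331
```

To see what the JVM itself computes, running `java -XX:MaxRAMPercentage=65 -XshowSettings:vm -version` inside the container prints an estimated maximum heap size.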

Debugging an OOMKilled Java Pod

First confirm Kubernetes saw a memory kill:

```bash
kubectl describe pod | grep -A5 -i "last state"
kubectl top pod
```

Then compare heap metrics with process RSS. If RSS grows while heap stays flat, investigate native memory.
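One way to get process RSS without extra tooling is to read it from `/proc`. The sketch below is a minimal example, not a monitoring solution: the helper names are hypothetical, and the heap figure is assumed to come from your existing JVM metrics.

```bash
# Hypothetical helper: read resident set size (the VmRSS field, reported
# in kB) from a /proc/<pid>/status-style file and print it in MiB.
rss_mib() {
  awk '/^VmRSS:/ { print int($2 / 1024) }' "$1"
}

# Hypothetical helper: rough non-heap footprint, taken as RSS minus the
# committed heap (both in MiB). It is an approximation, not an exact split.
non_heap_mib() {
  local rss=$1 heap_committed=$2
  echo $(( rss - heap_committed ))
}

# Usage inside the container, assuming the JVM runs as PID 1:
#   rss_mib /proc/1/status
#   non_heap_mib "$(rss_mib /proc/1/status)" 1400
```

If the second number keeps climbing across samples while heap metrics stay flat, that points at native memory rather than a heap leak.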

Enable Native Memory Tracking on a test workload or canary:

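```bash
# Set on the workload so the JVM starts with tracking enabled
JAVA_TOOL_OPTIONS="-XX:NativeMemoryTracking=summary"

# Then query the running JVM; <pid> is the Java process ID
jcmd <pid> VM.native_memory summary scale=MB
```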

NMT has overhead, so do not enable detailed tracking everywhere by default. Use it when you need evidence for a sizing fix or a suspected native leak.

Operational Tips

Alert on container memory, not only JVM heap. Heap usage is useful, but it will not catch direct buffers, metaspace growth, or thread stack explosion.
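As a sketch of what such a check looks at, the helper below computes usage as a percentage of the limit from two files holding byte counts. On cgroup v2 these are typically `/sys/fs/cgroup/memory.current` and `/sys/fs/cgroup/memory.max`; the helper name is hypothetical, and in practice the alert would normally be built on the metrics your monitoring stack already collects.

```bash
# Hypothetical helper: percentage of the container memory limit in use.
# $1 and $2 are paths to files containing usage and limit in bytes.
container_mem_pct() {
  local used limit
  used=$(cat "$1")
  limit=$(cat "$2")
  # On cgroup v2, memory.max holds the literal string "max" when no limit is set
  if [ "$limit" = "max" ]; then
    echo "unlimited"
    return 0
  fi
  echo $(( used * 100 / limit ))
}

# Usage inside a container on a cgroup v2 host:
#   container_mem_pct /sys/fs/cgroup/memory.current /sys/fs/cgroup/memory.max
```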

Track thread count alongside RSS. A sudden increase in threads can quietly consume hundreds of megabytes through stacks before application metrics look suspicious.
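To put a number on that, the sketch below multiplies a thread count by a per-thread stack size. The 1024 KiB default is the usual HotSpot stack size on 64-bit Linux and is an assumption here; check your `-Xss` setting. Stacks are reserved up front but only become resident as they are touched, so the result is an upper bound. The helper names are hypothetical.

```bash
# Hypothetical helper: number of threads, read from the Threads field
# of a /proc/<pid>/status-style file.
thread_count() {
  awk '/^Threads:/ { print $2 }' "$1"
}

# Hypothetical helper: upper-bound MiB reserved for thread stacks.
# $1 = thread count, $2 = stack size in KiB (assumed 1024 if omitted).
thread_stack_mib() {
  local threads=$1 stack_kib=${2:-1024}
  echo $(( threads * stack_kib / 1024 ))
}

thread_stack_mib 400        # prints 400
thread_stack_mib 400 512    # prints 200
```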

Separate request and limit decisions. A memory request is scheduling intent; a memory limit is an enforced failure boundary. If the limit is too close to steady-state RSS, normal load variance becomes an outage trigger.

Conclusion

For Java on Kubernetes, heap is not the limit. The pod limit is the limit, and the JVM shares it with a long list of native allocations. Give the heap a budget, reserve space for everything around it, and make native memory part of your incident checklist.

If you are building reliable, AI-assisted operations, Akmatori helps teams automate infrastructure workflows and incident response. Backed by Gcore, we are building tools for modern SRE and platform teams.
