This page documents a few aspects of memory management for Java containers on K8s clusters.
For Java containers, memory management on K8s involves several factors:
- Xmx and Xms heap limits managed by the JVM
- Request/limit values for the container
- HPA policies used for scaling the number of pods
Misconfiguring or misunderstanding any of these parameters leads to OOM kills of Java containers on K8s clusters.
Memory management in Java containers:
- `-XX:+UseContainerSupport` is enabled by default from Java 10+.
- `-XX:MaxRAMPercentage` is the JVM parameter that specifies the percentage of the `limits` memory defined on the container that can be used by heap space. The default value is 25%.
- Example: if `-XX:MaxRAMPercentage=75` and the container memory limit is 3GB, then `-Xmx` = 75% of 3GB = 2.25GB.
- Important point to note: `MaxRAMPercentage` is calculated on `limits` and not `requests`.
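The heap-sizing math above can be sketched as a small Java program. The class and method names are illustrative only, not a real JVM API; the JVM performs this calculation internally from the cgroup limit.

```java
// Sketch of how the JVM derives -Xmx from the container memory limit
// when -XX:MaxRAMPercentage is set. Names and values are illustrative.
public class MaxRamPercentageDemo {
    // Effective max heap size in bytes: percentage of the container limit.
    static long effectiveXmx(long containerLimitBytes, double maxRamPercentage) {
        return (long) (containerLimitBytes * maxRamPercentage / 100.0);
    }

    public static void main(String[] args) {
        long limit = 3L * 1024 * 1024 * 1024; // container limits.memory = 3GB
        // Default MaxRAMPercentage is 25%: only a quarter of the limit goes to heap.
        System.out.println("default 25%: " + effectiveXmx(limit, 25.0) / (1024 * 1024) + " MiB");
        // With -XX:MaxRAMPercentage=75: -Xmx = 75% of 3GB = 2.25GB.
        System.out.println("tuned 75%:   " + effectiveXmx(limit, 75.0) / (1024 * 1024) + " MiB");
    }
}
```

Note that the calculation uses the container limit only; `requests` never enters into it.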
K8s: requests/limits:
- As shown above, the memory assignment for the container is based on the values set in the `limits` configuration.
- However, for HPA to kick in, Kubernetes uses `requests`.
- Example: if you configure HPA for memory utilization at 70%, it calculates usage as: Memory Usage % = (Current Usage / Requests) * 100. With 1.8GB in use and 2GB requested: (1.8GB / 2GB) * 100 = 90%, which results in HPA kicking in.
- Scaling therefore happens based on usage relative to the `requests` configuration.
- If `requests.memory` is set low (2GB) and `limits.memory` is high (3GB), HPA may scale aggressively, because it calculates usage based on requests, not limits.
- The only advantage of setting limit > request is that if non-heap memory usage grows, it will not crash the JVM. That scenario is less probable than a heap-space crash.
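The HPA utilization formula above can be sketched in Java to show why the requests/limits mismatch misleads scaling. This is a hypothetical illustration of the arithmetic, not Kubernetes code:

```java
// Sketch of the HPA memory-utilization formula described above:
// usage % is computed against requests, not limits. Names are illustrative.
public class HpaMemoryMathDemo {
    static double usagePercent(double currentUsageGb, double requestsGb) {
        return currentUsageGb / requestsGb * 100.0;
    }

    public static void main(String[] args) {
        // 1.8GB in use against requests.memory = 2GB -> 90%,
        // above a 70% target, so HPA scales out.
        System.out.println("vs requests: " + usagePercent(1.8, 2.0) + "%");
        // The same 1.8GB measured against the 3GB limit would only be 60%,
        // which is why confusing requests with limits skews capacity planning.
        System.out.println("vs limits:   " + usagePercent(1.8, 3.0) + "%");
    }
}
```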

Ideally, to make things simpler, set “requests=limits” for memory on Java containers, based on the historic usage of the application. This simplifies the Xmx, requests, limits, and HPA math.
For scaling apps, there is always HPA, which can increase the number of instances based on usage.
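With requests=limits, one memory figure drives everything. A minimal sketch of the combined math, using assumed numbers (3GB memory, MaxRAMPercentage=75, HPA target 70%; class and method names are hypothetical):

```java
// Worked example of how requests=limits simplifies the sizing math:
// a single memory figure determines both the heap ceiling and the
// HPA scale-out point. All values are assumptions for illustration.
public class RequestsEqualLimitsDemo {
    // Heap ceiling (-Xmx) as a fraction of the single memory figure.
    static double xmxGb(double memoryGb, double maxRamPercentage) {
        return memoryGb * maxRamPercentage / 100.0;
    }

    // Per-pod usage at which HPA scales out, against the same figure.
    static double scaleOutGb(double memoryGb, double hpaTargetPercent) {
        return memoryGb * hpaTargetPercent / 100.0;
    }

    public static void main(String[] args) {
        double memoryGb = 3.0; // requests.memory = limits.memory = 3GB
        // Heap ceiling: 75% of 3GB = 2.25GB.
        System.out.println("Xmx          = " + xmxGb(memoryGb, 75.0) + " GB");
        // HPA scales out at 70% of 3GB = 2.1GB, before the limit is hit.
        System.out.println("scale-out at = " + scaleOutGb(memoryGb, 70.0) + " GB");
    }
}
```

Because both thresholds reference the same 3GB figure, scale-out (2.1GB) kicks in before the heap ceiling (2.25GB) or the container limit is reached.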
Conclusion
- For Java containers on K8s, know the memory needs of your app and set “requests=limits”.
- Use HPA for scaling; do not depend on “limits>requests” for memory headroom.
- Containers: run them small and run them many (via scaling based on rules).