This page documents a few aspects of memory management for Java containers on K8s clusters.
For Java containers, memory management on K8s involves several factors:
- Xmx and Xms heap limits managed by the JVM
- Request/limit values for the container
- HPA policies used for scaling the number of pods
Misconfiguring or misunderstanding any of these parameters leads to OOM kills of Java containers on K8s clusters.
Memory management in Java containers:
- `-XX:+UseContainerSupport` is enabled by default from Java 10+.
- `-XX:MaxRAMPercentage` is the JVM parameter that specifies the percentage of the `limits` memory defined on the container that can be used by heap space. The default value is 25%.
- Example: if `-XX:MaxRAMPercentage=75` and the container memory limit is 3GB, then `-Xmx` = 75% of 3GB = 2.25GB.
- Important point to note: `MaxRAMPercentage` is calculated on `limits` and not `requests`.
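The heap-sizing math above can be sketched as a small Java program. The class and method names are illustrative only, not a real JVM API; the JVM performs this calculation internally from the cgroup limit.

```java
// Sketch of how the JVM derives -Xmx from the container memory limit
// when -XX:MaxRAMPercentage is set. Names and values are illustrative.
public class MaxRamPercentageDemo {
    // Effective max heap size in bytes: percentage of the container limit.
    static long effectiveXmx(long containerLimitBytes, double maxRamPercentage) {
        return (long) (containerLimitBytes * maxRamPercentage / 100.0);
    }

    public static void main(String[] args) {
        long limit = 3L * 1024 * 1024 * 1024; // container limits.memory = 3GB
        // Default MaxRAMPercentage is 25%: only a quarter of the limit goes to heap.
        System.out.println("default 25%: " + effectiveXmx(limit, 25.0) / (1024 * 1024) + " MiB");
        // With -XX:MaxRAMPercentage=75: -Xmx = 75% of 3GB = 2.25GB.
        System.out.println("tuned 75%:   " + effectiveXmx(limit, 75.0) / (1024 * 1024) + " MiB");
    }
}
```

Note that the calculation uses the container limit only; `requests` never enters into it.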
K8s: requests/limits:
- As shown above, the memory assignment for the container is based on the values set in the `limits` configuration.
- However, for HPA to kick in, Kubernetes uses `requests`.
- Example: if you configure HPA for memory utilization at 70%, it calculates usage as: Memory Usage % = (Current Usage / Requests) * 100. With 1.8GB in use and 2GB requested: (1.8GB / 2GB) * 100 = 90%, which results in HPA kicking in.
- Scaling therefore happens based on usage relative to the `requests` configuration.
- If `requests.memory` is set low (2GB) and `limits.memory` is high (3GB), HPA may scale aggressively, because it calculates usage based on requests, not limits.
- The only advantage of setting limit > request is that if non-heap memory usage grows, it will not crash the JVM. That scenario is less probable than a heap-space crash.
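The HPA utilization formula above can be sketched in Java to show why the requests/limits mismatch misleads scaling. This is a hypothetical illustration of the arithmetic, not Kubernetes code:

```java
// Sketch of the HPA memory-utilization formula described above:
// usage % is computed against requests, not limits. Names are illustrative.
public class HpaMemoryMathDemo {
    static double usagePercent(double currentUsageGb, double requestsGb) {
        return currentUsageGb / requestsGb * 100.0;
    }

    public static void main(String[] args) {
        // 1.8GB in use against requests.memory = 2GB -> 90%,
        // above a 70% target, so HPA scales out.
        System.out.println("vs requests: " + usagePercent(1.8, 2.0) + "%");
        // The same 1.8GB measured against the 3GB limit would only be 60%,
        // which is why confusing requests with limits skews capacity planning.
        System.out.println("vs limits:   " + usagePercent(1.8, 3.0) + "%");
    }
}
```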

Ideally, to make things simpler, set “requests=limits” for memory on Java containers, based on the historic usage of the application. This simplifies the Xmx, requests, limits, and HPA math.
For scaling apps, there is always HPA, which can increase the number of instances based on usage.
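With requests=limits, one memory figure drives everything. A minimal sketch of the combined math, using assumed numbers (3GB memory, MaxRAMPercentage=75, HPA target 70%; class and method names are hypothetical):

```java
// Worked example of how requests=limits simplifies the sizing math:
// a single memory figure determines both the heap ceiling and the
// HPA scale-out point. All values are assumptions for illustration.
public class RequestsEqualLimitsDemo {
    // Heap ceiling (-Xmx) as a fraction of the single memory figure.
    static double xmxGb(double memoryGb, double maxRamPercentage) {
        return memoryGb * maxRamPercentage / 100.0;
    }

    // Per-pod usage at which HPA scales out, against the same figure.
    static double scaleOutGb(double memoryGb, double hpaTargetPercent) {
        return memoryGb * hpaTargetPercent / 100.0;
    }

    public static void main(String[] args) {
        double memoryGb = 3.0; // requests.memory = limits.memory = 3GB
        // Heap ceiling: 75% of 3GB = 2.25GB.
        System.out.println("Xmx          = " + xmxGb(memoryGb, 75.0) + " GB");
        // HPA scales out at 70% of 3GB = 2.1GB, before the limit is hit.
        System.out.println("scale-out at = " + scaleOutGb(memoryGb, 70.0) + " GB");
    }
}
```

Because both thresholds reference the same 3GB figure, scale-out (2.1GB) kicks in before the heap ceiling (2.25GB) or the container limit is reached.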
Conclusion
- For Java containers on K8s, know the memory needs of your app and set “requests=limits”.
- Use HPA for scaling; do not depend on “limits>requests” for memory headroom.
- Containers: run them small and run them many (via scaling based on rules).