Prometheus: Static/Dynamic scraping on EKS

This note is a mental model for how Prometheus discovers and scrapes metrics in Kubernetes.

The lens I want to keep throughout is:

  1. Where will the scrape config file sit?
    (Prometheus repo vs application repo)
  2. In which namespace will the serviceMonitor sit?
    (and how Prometheus finds it)

At a high level there are two ways to tell Prometheus about a /metrics endpoint:

  1. Static, via scrape_configs in the Prometheus config file.
  2. Dynamic, via a ServiceMonitor (a CRD from the Prometheus Operator) with label-based discovery.

Contents:

  1. Approach 1: Static scrape_config
    1. How question
    2. Where question
  2. Approach 2: Dynamic scraping with ServiceMonitor
    1. How question
    2. Where question
      1. (1) Watch ServiceMonitors in all namespaces
      2. (2) Watch only specific namespaces
      3. (3) Watch only the Prometheus namespace
    3. Workflow of dynamic, label-based discovery of a scrape endpoint

Approach 1: Static scrape_config:

This is the traditional way: you manually configure a scrape job in prometheus.yaml.
The natural questions here are:

  • How do I configure the scrape_config job?
  • Where will that config YAML file sit? (Prometheus repo or application repo?)

How question:

You define a scrape_configs job in the Prometheus config:

scrape_configs:
  - job_name: 'otel-collector'
    scrape_interval: 120s
    static_configs:
      - targets: ['otel-collector:9090']
    metrics_path: '/metrics'

Prometheus will then call GET http://otel-collector:9090/metrics at the configured interval.
An example in otel-contrib repo – here

Where question:

In this model the scrape config is centralized in the Prometheus deployment repo.

prometheus-deployment-repo/            # Platform team owns
├── prometheus.yaml                    # Main config with scrape_configs
├── prometheus-deployment.yaml
└── prometheus-configmap.yaml

otel-collector-repo/                   # App team owns
├── deployment.yaml
├── service.yaml
└── otel-collector.yaml
# NO prometheus.yaml here
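
In this centralized model, the scrape job typically ends up inside the ConfigMap that the Prometheus deployment mounts (prometheus-configmap.yaml above). A minimal sketch; the ConfigMap name and namespace below are assumptions, not taken from a specific setup:

apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-config            # assumed name for the ConfigMap
  namespace: prometheus-namespace    # assumed namespace of the Prometheus deployment
data:
  prometheus.yaml: |
    scrape_configs:
      - job_name: 'otel-collector'
        scrape_interval: 120s
        metrics_path: '/metrics'
        static_configs:
          - targets: ['otel-collector:9090']   # Service DNS name + port of the app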

This is a very manual contract between the application and Prometheus, and it creates problems for Day-1 and Day-2 operations as the application evolves.
Every time the application name or port changes, the same change has to be made in the Prometheus repo as well; missing it leads to scrape failures.

Approach 2: Dynamic scraping with ServiceMonitor

Rather than configuring scrape targets manually, there is also the option of dynamic discovery on EKS, where Prometheus figures out on its own which new endpoints have to be scraped.
This is done via a ServiceMonitor, which is a CRD from the Prometheus Operator. Details here

Once the ServiceMonitor CRD is present in the cluster, the Prometheus Operator then:

  • Watches ServiceMonitor resources in the EKS cluster.
  • Translates them into Prometheus scrape configs.
  • Updates the Prometheus server configuration automatically.
    Note that once the ServiceMonitor CRD is installed, other tools can simply set the serviceMonitor flag to true in their Helm charts and start getting scraped by Prometheus, as in the sketch below.
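
As an illustration, many community Helm charts expose a toggle along these lines; the exact keys vary per chart, so serviceMonitor.enabled and the label below are assumptions to verify against the chart's values.yaml:

# values.yaml (illustrative key names; check the chart you are using)
serviceMonitor:
  enabled: true                # chart renders a ServiceMonitor for its Service
  interval: 120s
  additionalLabels:
    prometheus: fabric         # must match the Prometheus serviceMonitorSelector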

Looking at ServiceMonitor through the same lens:

  • How do I configure a ServiceMonitor for my app?
  • Where (which namespace) should I deploy my ServiceMonitor?

How question:

Configure a kind: ServiceMonitor Kubernetes resource in the application repo. An example is shown below.

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: otel-collector
  namespace: otel-demo
  labels:
    prometheus: fabric        # Must match Prometheus selector
spec:
  selector:
    matchLabels:
      app: otel-collector     # Selects Services with this label
  namespaceSelector:
    matchNames:
      - otel-demo
  endpoints:
    - port: metrics           # Port NAME (not number!)
      interval: 120s

So the How for dynamic scraping is: “Define a ServiceMonitor next to your Deployment and Service in your app repo.”
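
For completeness, below is a sketch of the Service this ServiceMonitor would select (the port number 9090 is an assumption; the real contract is the label app: otel-collector and the port name metrics):

apiVersion: v1
kind: Service
metadata:
  name: otel-collector
  namespace: otel-demo
  labels:
    app: otel-collector       # matched by spec.selector.matchLabels in the ServiceMonitor
spec:
  selector:
    app: otel-collector       # selects the collector Pods
  ports:
    - name: metrics           # the ServiceMonitor endpoint references this NAME
      port: 9090
      targetPort: 9090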

Where question:

With static scrape_config, the “where” question was: which repo owns the config file? With ServiceMonitor, the YAML lives in the app repo, so the “where” question shifts to:

In which Kubernetes namespace should I deploy the ServiceMonitor so that Prometheus discovers it?

This is controlled by the serviceMonitorNamespaceSelector field in the Prometheus CRD.

apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: fabric-prometheus
  namespace: prometheus-fabric
spec:
  serviceMonitorSelector:
    matchLabels:
      prometheus: fabric         # only watch ServiceMonitors with this label
  serviceMonitorNamespaceSelector: {}

Prometheus can be set up in three different ways to do this:

(1) Watch ServiceMonitors in all namespaces.

In the example below, the Prometheus Operator watches every namespace for ServiceMonitors with the label prometheus: fabric.

apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: fabric-prometheus
  namespace: prometheus-fabric
spec:
  serviceMonitorSelector:
    matchLabels:
      prometheus: fabric
  serviceMonitorNamespaceSelector: {}   # Empty = all namespaces

(2) Watch only specific namespaces

In the example below, only ServiceMonitors in namespaces labeled monitoring: enabled are picked up.

apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: fabric-prometheus
  namespace: prometheus-fabric
spec:
  serviceMonitorSelector:
    matchLabels:
      prometheus: fabric
  serviceMonitorNamespaceSelector:
    matchLabels:
      monitoring: enabled  # Only namespaces with this label
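
In that setup, the application team also has to label its namespace. A sketch, reusing the otel-demo namespace from the earlier examples:

apiVersion: v1
kind: Namespace
metadata:
  name: otel-demo
  labels:
    monitoring: enabled       # makes this namespace visible to Prometheus discovery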

(3) Watch only the Prometheus namespace

In the example below, ServiceMonitors must live in the prometheus-fabric namespace (a centralized model). Note that serviceMonitorNamespaceSelector is a label selector and has no matchNames; leaving it unset (null) restricts discovery to the namespace the Prometheus object itself runs in.

apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: fabric-prometheus
  namespace: prometheus-fabric
spec:
  serviceMonitorSelector:
    matchLabels:
      prometheus: fabric
  # serviceMonitorNamespaceSelector left unset (null) = only the prometheus-fabric namespace

Below is the workflow of dynamic, label-based discovery of a scrape endpoint:

In the workflow example below, Prometheus is deployed in the prometheus-namespace namespace and the application is deployed in otel-demo.
The workflow shows how the pieces tie together:
Prometheus –> ServiceMonitor –> Service –> Pods

┌─────────────────────────────────────────────────────────────┐
│ 1. Prometheus (namespace: prometheus-namespace)               │
│                                                             │
│    serviceMonitorSelector:                                  │
│      matchLabels:                                           │
│        prometheus: fabric  ◄────────────────────┐          │
│                                                  │          │
│    serviceMonitorNamespaceSelector: {}          │          │
│    (empty = watch all namespaces)               │          │
└──────────────────────────────────────────────────┼──────────┘
                                                   │
                    ┌──────────────────────────────┘
                    │ Operator watches all namespaces
                    ▼
┌────────────────────────────────────────────────────────────┐
│ 2. ServiceMonitor (namespace: otel-demo)                   │
│                                                            │
│    metadata:                                               │
│      labels:                                               │
│        prometheus: fabric  ◄─────────────────┐             │
│    spec:                                     │             │
│      selector:                               │             │
│        matchLabels:                          │             │
│          app: otel-collector  ◄──────────┐   │             │
└───────────────────────────────────────────┼───┼────────────┘
                                            │   │
                    ┌───────────────────────┘   │
                    │ Selects Services          │
                    ▼                           │
┌────────────────────────────────────────────────────────────┐
│ 3. Service (namespace: otel-demo)                          │
│                                                            │
│    metadata:                                               │
│      labels:                                               │
│        app: otel-collector  ◄────────────────┐             │
│    spec:                                     │             │
│      selector:                               │             │
│        app: otel-collector  ◄────────────┐   │             │
│      ports:                              │   │             │
│        - name: metrics  ◄────────────┐   │   │             │
│          port: 9090                  │   │   │             │
└──────────────────────────────────────┼───┼───┼─────────────┘
                                       │   │   │
                    ┌──────────────────┘   │   │
                    │ Port name match      │   │
                    │                      │   │
                    │  ┌───────────────────┘   │
                    │  │ Selects Pods          │
                    ▼  ▼                       │
┌────────────────────────────────────────────────────────────┐
│ 4. Pod (namespace: otel-demo)                              │
│                                                            │
│    metadata:                                               │
│      labels:                                               │
│        app: otel-collector  ◄────────────────┐             │
│    spec:                                     │             │
│      containers:                             │             │
│        - ports:                              │             │
│            - name: metrics                   │             │
│              containerPort: 9090  ◄──────────┼─────────────┤
└──────────────────────────────────────────────┼─────────────┘
                                               │
                    ┌──────────────────────────┘
                    │ Prometheus scrapes
                    ▼
              GET http://10.0.1.50:9090/metrics

Memory management: Java containers on K8s

This page documents a few aspects of memory management on Java containers on K8s clusters.

For Java containers, memory management on K8s depends on several factors:

  • Xmx and Xms limits managed by the JVM
  • Request/limit values for the container
  • HPA policies used for scaling the number of pods

Misconfiguration or misunderstanding of any of these parameters leads to OOM kills of Java containers on K8s clusters.

Memory management on Java containers:

  • -XX:+UseContainerSupport is enabled by default from Java 10+
  • -XX:MaxRAMPercentage is the JVM parameter that specifies what percentage of the container's memory limit can be used for heap space. The default value is 25%.
  • Example: if -XX:MaxRAMPercentage=75 and the container memory limit is 3GB, then:
  • -Xmx = 75% of 3GB = 2.25GB
  • Important point to note: MaxRAMPercentage is calculated on limits and not requests (see the sketch below)
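
A minimal sketch of how this is commonly wired up, assuming the flag is passed via JAVA_TOOL_OPTIONS and reusing the 2GB/3GB values from the HPA discussion below (the Deployment name and image are hypothetical):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: java-app                                     # hypothetical name
spec:
  replicas: 2
  selector:
    matchLabels:
      app: java-app
  template:
    metadata:
      labels:
        app: java-app
    spec:
      containers:
        - name: java-app
          image: registry.example.com/java-app:1.0   # hypothetical image
          env:
            - name: JAVA_TOOL_OPTIONS
              value: "-XX:MaxRAMPercentage=75.0"     # heap is sized from the LIMIT
          resources:
            requests:
              memory: "2Gi"                          # used by the HPA math
            limits:
              memory: "3Gi"                          # -Xmx ≈ 75% of 3Gi = 2.25Gi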

K8s: requests/limits:

  • As shown above, the memory available to the JVM heap is derived from the limits configuration
  • However, for HPA to kick in, Kubernetes uses requests
  • Example:
    • If you configure HPA for memory utilization at 70%, it calculates usage as:
    • Memory Usage % = (Current Usage / Requests) * 100
    • (1.8GB / 2GB) * 100 = 90% – which triggers the HPA
    • i.e. scaling happens based on usage relative to the requests configuration
  • If requests.memory is set low (2GB) and limits.memory is high (3GB), HPA may scale aggressively because it calculates usage based on requests, not limits
  • The only advantage of setting limits > requests is that if non-heap memory grows, it will not crash the JVM; that case is less likely than a heap-space crash. A sketch of such a memory-based HPA follows this list.
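
A sketch of the memory-based HPA described above (autoscaling/v2; the Deployment name and replica bounds are assumptions):

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: java-app-hpa              # hypothetical name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: java-app                # hypothetical Deployment
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 70  # computed against requests, not limits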

Ideally, to keep things simple, use the application's historic usage to set requests = limits for memory on Java containers. This simplifies the Xmx, requests, limits, and HPA math.

For scaling apps, there is always HPA, which can increase the number of instances based on usage.


Conclusion

  • For Java containers on K8s, know the memory needs of your app and set requests = limits
  • Use HPA for scaling; do not depend on limits > requests for memory headroom
  • Containers: run them small and run them many (via rule-based scaling)