This page puts together the Observability Signal Guidelines, which aim to provide the required visibility into systems without hurting the cost of the solution.
The three basic observability signals that any application emits are:
- Metrics
- Traces
- Logs
The general question is: when to emit which signal?
The answer lies in the intent behind the signal being emitted. What do you intend to measure with the observability signal you are emitting?
The rule of thumb below can help answer this.
Rule of thumb:
Metrics:
If you want to measure anything as a count, metrics are the best way to do it. Any question that starts with "How many ..." is a good candidate for a metric.
- Some example measurements:
  - number of documents processed
  - throughput of an application
  - number of errors
  - Kafka lag for a topic
Note: Be careful not to include high-cardinality tags on metrics.
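As a concrete sketch (assuming the OpenTelemetry Python SDK; the meter, metric, and tag names below are illustrative, not prescriptive), a count-style measurement such as "number of documents processed" can be emitted as a counter carrying only low-cardinality tags:

```python
from opentelemetry import metrics

# Assumes a MeterProvider has already been configured at application startup.
meter = metrics.get_meter("document-pipeline")  # illustrative instrumentation name

# A counter answers the "How many ...?" questions.
documents_processed = meter.create_counter(
    "documents.processed",  # illustrative metric name
    unit="1",
    description="Number of documents processed",
)

def process_document(doc_type: str) -> None:
    # ... actual processing would happen here ...
    # Only low-cardinality tags: doc_type has a handful of possible values.
    # Do NOT tag with documentId or gcid here; that would explode metric cardinality.
    documents_processed.add(1, {"doc_type": doc_type, "status": "success"})
```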
Traces:
If you want to measure anything as an element of time (a duration), it should be a trace signal.
- Some examples:
  - end-to-end time of a document through an app (trace)
  - time taken by a part of a transaction (span)
  - anything that needs high-cardinality tags
Note: Traces are sampled, but sampling is not a bad thing. With time as the unit of measure in traces/spans, a trace will show when something is slow, but might miss the peak (max) values by a small margin.
The graph below shows that sampling does not miss indicating the slowness seen in latencies.

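As a sketch of the same idea for traces (again assuming the OpenTelemetry Python SDK; span and attribute names are illustrative), the end-to-end time of a document through an app becomes a span, a part of the transaction becomes a child span, and high-cardinality attributes such as the document id go on the span rather than on a metric:

```python
from opentelemetry import trace

# Assumes a TracerProvider (with a sampler) has been configured at startup.
tracer = trace.get_tracer("document-pipeline")  # illustrative tracer name

def handle_document(document_id: str) -> None:
    # The outer span measures end-to-end time for this document.
    with tracer.start_as_current_span("handle_document") as span:
        # High-cardinality attributes are acceptable on spans (traces are sampled).
        span.set_attribute("document.id", document_id)

        # A child span measures the time taken by one part of the transaction.
        with tracer.start_as_current_span("parse_attachments"):
            pass  # ... attachment parsing logic ...
```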
Logs:
If you want to emit signals of high cardinality and don't want them sampled, logs are your friends. High cardinality here means attributes such as documentId, gcid, etc., where we are measuring things at the smallest entity.
- Some examples:
  - time taken for processing per request-id
  - tracking the flow path of a request with attributes like request-id, attachment types, etc.
Logs have a few advantages as observability signals:
- with a custom SDK (or the OTel SDK), you can emit logs with the least boilerplate code.
- with logs structured via an SDK, there is scope for building post-processors on top of them.
- AI capabilities are planned on top of logs, provided they are emitted via an SDK.
Note: Emitting logs in debug mode for a long duration is not the definition of high cardinality and should be avoided.
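The exact API of the custom SDK is not shown on this page; as a rough stand-in, the sketch below uses Python's standard logging module with structured attributes (field names like request_id are illustrative). The point is that high-cardinality identifiers live on the unsampled log record, which a structured (JSON) handler configured by the SDK would serialize:

```python
import logging
import time

# In practice the custom SDK / OTel SDK would attach a structured (JSON) handler here.
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("document-pipeline")  # illustrative logger name

def handle_request(request_id: str, attachment_type: str) -> None:
    start = time.monotonic()
    # ... request processing logic ...
    elapsed_ms = (time.monotonic() - start) * 1000
    # High-cardinality identifiers (request-id) go on the log record, unsampled.
    logger.info(
        "request processed",
        extra={
            "request_id": request_id,          # illustrative field names
            "attachment_type": attachment_type,
            "elapsed_ms": elapsed_ms,
        },
    )
```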
Below is a summary table on when to emit what Observability signal:
| Signal | When to use? | Retention |
|---|---|---|
| Metric | Measuring count signals | Long (a few months) |
| Trace | Measuring time/duration signals | Short (a few weeks) |
| Log | Measuring high-cardinality, non-sampled signals | Super short (a few days) |
If you look closely, as the attributes on an O11y signal increase (the tags/metadata associated with it), the signal becomes more useful for understanding the state of the system, but it also becomes more expensive.
So, it is a natural effect that the retention of an O11y signal decreases as the cardinality of its metadata increases.
This works out well: it doesn't compromise the context of an O11y signal (attributes, tags, etc.), while at the same time taking care of the cost aspect.

