Design Philosophy: Observability

I enjoy philosophy. Stoic philosophy in particular.
Philosophy, I think, helps us revalidate our purpose. It acts as a yardstick and makes sure that we are not moving away from our first principles.

Applying the same to Software Engineering, in my opinion, every team should have a “Design Philosophy”: one yardstick that the team can use to make better decisions.
In fact, it already exists in some form in a few places. Some call it Guiding Principles. Some call it MVPs. I call it “Design Philosophy”.

The core idea is: whenever a decision has to be made, passing it through this “Design Philosophy” should produce the same result, irrespective of who is making the decision.

As engineers, we like equations, formulas and unambiguous ways of thinking. A written form of the Design Philosophy does a team a lot of good in making the right decisions at a great pace. It would be unfair to expect all engineering teams to use the same yardstick, so each team should write its own.

Below is my (opinionated) version for an Observability Engineering team.

  1. Low latency is an important feature for Observability signals. Ingested observability data should be available to users as early as possible.
  2. The Observability tooling is the torch in the dark. High reliability is a must. It cannot fail when the Platform / Application fails.
    • which means the Observability stack CANNOT fail when applications fail
    • which means, ideally, the Observability stack shouldn’t sit entirely on the same platform as the Applications.
    • which means Observability vendors (buy decisions) are not a bad choice. The choice of vendor should be cost-effective for the Org.
    • for the O11y solutions that we decide to build in-house, isolation is key.
  3. Our O11y stack should support availability, reliability and performance “cost-effectively” at scale.
  4. All the tools that we build and maintain should be vendor-agnostic (SDKs, collectors, refinery etc.).
  5. O11y data decays fast. People care more about the last 1 hour / 1 day of O11y data than about the last 1 month.
  6. Optimising for cost in observability results in having more than one tool. While we can have different tools, we shouldn’t have several tools that do the same thing. Example:
    1. Metrics → Prometheus, Traces → Jaeger (Fine)
    2. Logs → ELK, Logs → Splunk (NOT-Fine)
  7. Tools change. The tools that we have today for a specific function might be replaced by something else in a year or two. The Observability team should strive to make that change as undisruptive as possible.
  8. There is a clear viewpoint on “what kind of observability signal has to go where”. (Details on this here) Example:
    1. count → metrics
    2. time → trace
    3. high cardinality → log
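
The routing rule in point 8 is small enough to write down as code. Below is a minimal sketch in Python, not a definitive implementation. The names `Signal`, `Measurement` and `route` are hypothetical, and the order in which the rules are checked (high cardinality first, then time, then count) is my assumption, since the list above does not say which rule wins when more than one applies.

```python
from dataclasses import dataclass
from enum import Enum


class Signal(Enum):
    """The three signal types named in the list above."""
    METRIC = "metric"
    TRACE = "trace"
    LOG = "log"


@dataclass(frozen=True)
class Measurement:
    """Hypothetical description of something a team wants to observe."""
    name: str
    is_count: bool = False          # e.g. number of requests served
    is_duration: bool = False       # e.g. time spent inside a call
    high_cardinality: bool = False  # e.g. per-user or per-request values


def route(m: Measurement) -> Signal:
    """Apply the rules of thumb from point 8.

    Assumption: high cardinality is checked first, then time, then count.
    """
    if m.high_cardinality:
        return Signal.LOG
    if m.is_duration:
        return Signal.TRACE
    if m.is_count:
        return Signal.METRIC
    raise ValueError(f"no routing rule for {m.name!r}")


if __name__ == "__main__":
    for m in (
        Measurement("requests_served", is_count=True),
        Measurement("checkout_call_time", is_duration=True),
        Measurement("per_user_activity", high_cardinality=True),
    ):
        print(f"{m.name} -> {route(m).value}")
```

Because the function is deterministic, two engineers classifying the same measurement get the same answer, which is the whole point of writing the philosophy down.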

These are the elements that I use when making an Observability decision. They might vary for a different team in a different situation. But the point I am really trying to make is: have a design philosophy that makes decision making easier.

Weekly Bullet #1 – Summary for the week

Hi All!

This is an idea that I have been planning to try for quite some time now: a summary of what happened over the week. A weekly bullet will come out every Saturday and will cover:

  • Interesting things that happened in the Tech or Non-Tech world over the week (not NEWS)
  • Extracts from the books that I am reading (some things that have hit me hard)
  • Resources, Tech or Non-Tech, that I have come across.

So here is the First of something New.


Technical :

Non-Technical :

  • A podcast that I truly enjoyed. This is a short 32-minute extract from a full episode. If I had only 30 minutes this entire weekend, I would just listen to this one podcast. Enough said! “Tools of Titans: Derek Sivers Distilled (#202)”
  • Another one from Derek Sivers. He has a site for all the books that he has recommended with notes/summary/extracts from the books. If you want to pick your next book, dive in here. “Derek Sivers – Books I have read”
  • Quote I’m pondering :


“Most of the 30-year-olds are trying to pursue many different directions at once, but not making progress in any. They get frustrated that the world wants them to pick one thing, because they want to do them all. The solution is to think long-term. To realize that you can do one of these things for a few years, and then do another one for a few years, and then another.”

Have a great week!