SRE Focus Area

Observability

Observability unveils the heartbeat of software in production, and Site Reliability Engineers (SREs) help orchestrate its vital rhythm.

Explore how to practice this data-rich discipline well.


  • What are observability and OpenTelemetry?
  • The cardinality conundrum in observability
  • How to handle alert fatigue and more

Explore observability topics

New to observability or need a refresher?

Introductory topics

Learn the essential concepts of this evolution from monitoring (first part coming in early February 2024)


Let’s unpack the hottest project in observability (in final editing)


We go deeper into the three golden pillars of observability (in editing)


Learn about this critical part of monitoring your systems (draft done)



Listen to our conversation with Adriana Villela, OpenTelemetry Working Group Co-Lead:

Exploring the risks within your observability work

Problem Explorer

A deep dive into the mechanics of high cardinality and how “bad” it is


How to Cut Alert Fatigue Risk

(in editing)


How to solve 3 observability data flow issues


How to solve poor data quality in observability


Troubleshoot Common Log, Metrics, and Tracing Issues


Where does observability add value?

Capability Reviews

(coming soon)


Rundown of AWS internal tools for observability

(coming soon)