From Dashboard Soup to Observability Lasagna: Building Better Layers

"I'm Martha. I'm a product engineer at a company called incident.io. We build a product that handles your end-to-end incident management. That means your alerts firing, paging you, all the way through to writing a postmortem. I work across the stack, but I focus a lot on the reliability of our product and the observability that enables that. Today you're going to leave with a process to unsoup your dashboards, which I promise is a very technical term."

"First, we're going to start with a story. Our story starts in early 2024. We'd just finished building an on-call product, so something that handles your alerts and pages you, wakes you up in the middle of the night when your software goes wrong. I'm going to use on-call as an example throughout this talk because I'm sure most of you know what it means to be paged."

The team turned a chaotic collection of dashboards into a layered observability stack so that incident response is reliable and actionable. The incident.io product covers end-to-end incident management, from alerting and paging through to postmortems, with reliability and observability as priorities. Facing a tight release timeline for an on-call product, the team emphasized trust in paging and needed structured monitoring so that failures would surface reliably. The approach centers on unsouping dashboards into clear layers, guiding engineers through a reproducible debugging process, and keeping critical on-call flows highly available.

#observability #incident-management #on-call-reliability #dashboards

Read at InfoQ


