by Datadog
Observability Theater
Presented by Gremlin: Measure What Matters: Proving Reliability Without Incidents
Date & Location
August 02 | 5:30 PM PDT | Observability Theater
How do you measure the incident that never happened?
Most SRE teams would agree that proactively improving reliability is the goal, but they run into two problems: how do you prove your work is having a positive impact, and how do you make improvements proactively without relying on lagging indicators like RTO and RPO? Proactive practices like Chaos Engineering are hard to measure, and their impact is hard to prove to the organization, so this important work is often deprioritized. What if there were a way to measure the work SREs do, prove it is making an impact, and avoid the most significant customer-facing incidents altogether?
In this talk, I’ll explain how to build a standardized view of reliability for any organization, one that doesn’t rely on incidents to measure progress. You’ll learn how reliability scoring helps SRE teams in organizations of all sizes address the common causes of reliability risks proactively, objectively, and at scale.