
Trevor Bramwell
Senior Systems Engineer, The Linux Foundation

Alan Sherman
Senior Cloud Operations Engineer, The Linux Foundation
Date and Time: -
Location: TBD
What happens when your most critical service starts failing sporadically, taking down your entire application stack with it? You investigate, troubleshoot, and deploy a solution, only to realize you fixed the wrong problem! This session walks through The Linux Foundation’s journey of fixing a critical issue in its infrastructure: from misidentifying the root cause and replacing an entire API gateway, to still experiencing the same failure. Trevor Bramwell and Alan Sherman will discuss how a lack of deep observability led them into months of unnecessary work, and how proper instrumentation and tracing finally uncovered the real culprit. The session highlights key takeaways on observability, monitoring, and confirming that you are fixing the right issue before investing in a major overhaul. You’ll leave with lessons from The Linux Foundation’s experience that you can apply to making data-driven decisions in your own infrastructure.