Modern engineering teams face increasing complexity in managing errors across distributed systems, particularly in monorepo environments with multiple microservices. This session introduces two complementary approaches that significantly reduce operational overhead and improve incident response times.
First, I'll demonstrate how we built an automated code generation pipeline that centralizes error management in structured definition files. By defining error codes, detailed alerts, and runbook procedures in a single source file, we generate both the application-level error code implementations and the Terraform-managed Datadog monitoring alerts from the same input. This approach eliminates the traditional disconnect between code, alerts, and documentation, so the three stay consistent by construction.
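To make the single-source idea concrete, here is a minimal sketch in Python, not the pipeline from the talk. It assumes a hypothetical definition list (standing in for the structured definition file), a hypothetical `@error_code` log attribute, and illustrative names such as `payment-service`. From that one input it renders a small module of error constants and a Terraform JSON document that declares one `datadog_monitor` resource per error.

```python
import json

# Hypothetical single source of truth. In practice this would likely be
# loaded from a structured definition file rather than written inline.
ERROR_DEFINITIONS = [
    {
        "code": "PAYMENT_TIMEOUT",
        "service": "payment-service",
        "message": "Payment provider did not respond in time",
        "runbook": "Check the provider status page, then the retry queue depth.",
        "threshold": 5,
    },
    {
        "code": "SEARCH_INDEX_STALE",
        "service": "search-service",
        "message": "Search index has not been refreshed",
        "runbook": "Restart the indexer job and verify the last refresh time.",
        "threshold": 1,
    },
]


def render_error_module(definitions):
    """Return Python source text declaring one constant per error code."""
    lines = ["# Generated file, do not edit by hand.", ""]
    for d in definitions:
        lines.append(f'{d["code"]} = {d["code"]!r}  # {d["message"]}')
    return "\n".join(lines) + "\n"


def render_terraform(definitions):
    """Return Terraform JSON syntax declaring one log monitor per error code."""
    monitors = {}
    for d in definitions:
        # Assumes the application logs the code under an `error_code` attribute.
        query = (
            f'logs("service:{d["service"]} @error_code:{d["code"]}")'
            f'.index("*").rollup("count").last("5m") > {d["threshold"]}'
        )
        monitors[d["code"].lower()] = {
            "name": f'[{d["service"]}] {d["code"]}',
            "type": "log alert",
            "query": query,
            # The runbook travels with the alert, so responders see it in place.
            "message": f'{d["message"]}\n\nRunbook: {d["runbook"]}',
        }
    return json.dumps({"resource": {"datadog_monitor": monitors}}, indent=2)


if __name__ == "__main__":
    print(render_error_module(ERROR_DEFINITIONS))
    print(render_terraform(ERROR_DEFINITIONS))
```

Because both outputs come from the same list, adding an error code in one place updates the constants, the monitor, and the runbook text together. The generated JSON could be written to a `.tf.json` file for Terraform to pick up.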
Second, I'll share how we leverage Datadog's Reference Tables to intelligently route alerts in our monorepo environment. With dozens of microservices maintained by different teams, we needed a solution that would maintain central error visibility while ensuring proper ownership and accountability. Our approach uses Reference Tables to map service names to Slack user groups, enabling automatic routing of alerts to the right teams without creating siloed monitoring channels.
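The routing idea can be illustrated with a small local simulation in Python. This sketch does not call Datadog. It assumes a hypothetical two-column table (`service`, `slack_group`) of the kind that might be uploaded as a Reference Table, plus a hypothetical fallback group for services that are missing from the table.

```python
import csv
import io

# Hypothetical table contents: one row per service, keyed by service name.
REFERENCE_TABLE_CSV = """service,slack_group
payment-service,@payments-oncall
search-service,@search-oncall
"""

# Assumed default owner for services that have no row in the table.
FALLBACK_GROUP = "@platform-oncall"


def load_routing_table(csv_text):
    """Parse the CSV text into a dict mapping service name to Slack group."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["service"]: row["slack_group"] for row in reader}


def route_alert(service, error_code, table):
    """Build a notification line that mentions the owning team's Slack group."""
    group = table.get(service, FALLBACK_GROUP)
    return f"{group} [{service}] {error_code}"


if __name__ == "__main__":
    table = load_routing_table(REFERENCE_TABLE_CSV)
    print(route_alert("payment-service", "PAYMENT_TIMEOUT", table))
    print(route_alert("unknown-service", "UNMAPPED_ERROR", table))
```

Keeping ownership in a lookup table rather than in each alert means that a team change is a one-row edit. Every alert can still land in a shared channel while mentioning only the responsible group.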
Throughout the presentation, I'll provide practical examples of how these techniques have reduced our mean time to resolution, decreased alert fatigue, and improved developer experience. Attendees will learn how to implement these patterns in their own environments, with specific code examples and configuration strategies that work at scale.