Managing Monitors at Scale: Strategies to Optimize Alert Systems for Multi-team Environments

Running a platform effectively requires coordination among multiple teams with distinct but interconnected responsibilities, from those handling core infrastructure and individual service deployments to groups responsible for shared components such as database management and authentication systems. Navigating these interdependencies is challenging, particularly when trying to make alerts meaningful and actionable for each team, assign incident ownership accurately, standardize practices across teams, and scale alerting systems. This Theater Session explores strategic approaches to organizing monitor collections so that diverse teams can maintain system integrity and quickly alert the appropriate responders when issues arise. It covers Monitor & Notification Grouping, Tag Policy, Monitor Quality, and Access Controls.

