by Datadog
Scaling Up, One Network Bottleneck at a Time
Datadog processes trillions of events per day. Processing data at scale involves moving packets through a network. But sometimes the network isn’t cooperative. In this presentation, we’ll discuss the network challenges faced by one of Datadog’s larger data-processing apps, which ingests all of the metrics traffic. Its throughput was lower than expected, leading to increased cloud provider costs and spurious pager notifications.
By looking at system-level metrics and inspecting each component in the network path, we found misconfigurations in our network proxy, in AWS networking, in Redis, and even in the Linux kernel. We will explore how we investigated and resolved these issues, and how you can apply the same methods to your own production workloads.