2024 DASH Breakout 2
Breakout 2 Agenda
1:00 PM - 1:50 PM EDT
Navigating Cost Pressures at Scale with Rivian’s Connected Vehicle Platform
Beau Christensen
Sr Engineering Manager, Data Platform & SRE | Rivian
Electric vehicles are increasingly becoming data-centric platforms, and Rivian's integration of advanced digital capabilities with robust physical systems has established its fleet as a set of sophisticated mobile data centers. With automation as a core principle of modern Site Reliability Engineering, cost-cutting is crucial as companies expand their cloud footprint while remaining fiscally responsible.
In this session, led by Rivian SRE Beau Christensen, we’ll explore the unintended outcomes of these proactive automation and cost-cutting strategies within connected vehicle ecosystems, shedding light on how these approaches, while initially attractive, may inadvertently add complexities that challenge system resilience.
We will explore the related scaling challenges in cloud ingestion, the complexities of cloud-to-vehicle connectivity, and the dynamics of managing customer interactions through mobile data access, all within the framework of maintaining cost-efficiency.
Attendees will leave with a deeper understanding of how to balance cost management with the need for robust, long-lasting technology solutions in the automotive industry, gaining strategic approaches to avoid potential pitfalls in the pursuit of innovation and cost reduction.
2:00 PM - 2:40 PM EDT
How Snowflake Optimized Its New UI
James Lai
Senior Software Engineer | Snowflake
Snowflake built a completely new UI, with amazing new capabilities, but early feedback revealed a concern that it felt slow to customers. To address this, we had to dig into what customers specifically meant and identify which moments felt slow, so that our efforts would be spent in the right places.
What we learned surprised us. We’ll discuss the various meanings and moments customers had in mind when they said “slow.” We’ll also share some tricks we learned along the way for collecting feedback from customers. Finally, we’ll share how we addressed each of the different moments customers were describing, often with wildly different solutions.
2:50 PM - 3:30 PM EDT
Scaling ML Serving to 1000s of Models
Gerard Casas Saez
Senior ML Engineer | Cash App
Join the Cash App engineering team as we discuss effective strategies for scaling ML serving solutions to manage thousands of models efficiently. In this talk, Gerard Casas Saez (Senior Machine Learning Engineer) shares how Cash App optimized its platform, focusing on ONNX model performance, hot container replacements, and automatic, streamlined model deployments. Learn about the enhancements made to AWS SageMaker Multi-Model Endpoints, including zero-downtime upgrades and process improvements that accelerate productionization through a custom Python client and robust approval workflows.
Gerard will also discuss Cash App’s approach to managing AWS SageMaker endpoints as a unified team, highlighting techniques to minimize on-call disruptions and manage services without becoming a bottleneck. Additionally, hear about the future of the platform, including plans for hosting large language models and ongoing optimization efforts.
Attendees will leave with a clear understanding of best practices for ONNX serving, strategies for reducing deployment times, and techniques to enhance monitoring and stability. This session is essential for professionals looking to scale their ML operations effectively in a cost-sensitive and high-demand environment.
3:40 PM - 4:30 PM EDT
Making Observability Ownership Practical for Every Developer
Gabby Luna
Software Engineer | Greenlight
Connor Teague
Staff Site Reliability Engineer | Greenlight
Greenlight is a fintech company that empowers parents to raise financially smart kids by teaching them how to responsibly earn, spend, save, invest, and build credit with their money. Recently, observability practices diverged between our SREs and engineers, and they began operating without a common monitoring strategy, leading to a slowdown in feature development. To address this, we built a tool called Crosswalk that makes defining resources like SLOs, monitors, and dashboards as simple as configuring a TypeScript class that links these services to each feature. This in-house library empowers our teams to create, manage, and own their monitors in the same repositories where they build new services and components.
In this talk, we’ll demonstrate how you can effectively construct your own centralized, cross-team observability solution—and walk you through how we leveraged some industry best practices to craft a tool that all of our developers could own (and that currently supports 130+ microservices in more than 12 operational environments).