Know before you go – AWS re:Invent 2024 cloud resilience

TutoSartup excerpt from this article:
In this workshop, use AWS managed generative AI services in real-world scenarios to learn how to assess readiness, proactively improve your architecture, react quickly to events, troubleshoot issues, and implement effective observability practices… Service event replay: Stress-testing your archi…

With AWS re:Invent 2024 just weeks away, the excitement is building and we’re looking forward to seeing you all soon. If you’re attending re:Invent with the goal of improving your organization’s cloud resilience operations, we will be offering valuable insights, best practices, and fun activities to improve your cloud resilience expertise.

This year, we’re offering more than 100 resilience breakout sessions, workshops, chalk talks, builders’ sessions, and code talks. Find the complete list in the re:Invent 2024 session catalog and filter by “Resilience” in the area of interest field.

In this post, we highlight must-see sessions for those building resilient applications and architectures on AWS. Reserved seating is now open, so act quickly to claim your seat. Be sure to also check out the vertical-specific re:Invent guides.

Our recommendations are divided into three topics to help you choose the sessions most relevant to your business: resilience fundamentals, advanced resilience patterns, and resilience for customers operating in regulated industries.

What is cloud resilience all about?

Cloud resilience refers to the ability for an application to resist or recover from disruptions, including those related to infrastructure, dependent services, misconfigurations, transient network issues, and load spikes. Cloud resilience also plays a critical role in an organization’s broader business resilience strategy, including the ability to meet digital sovereignty requirements. Resilient applications are those built with high availability—the percentage of time the application is available for use—and also those with a disaster recovery or continuity of operations plan in place.

Resilience fundamentals

Join us as we explore the strategies, tools, and mindsets that enable organizations to thrive in the face of uncertainty. These sessions cover conceptual overviews and demos of AWS cloud resilience services.

Breakout sessions

Failing without flailing: Lessons we learned at AWS the hard way (ARC333)

At AWS, we’ve learned that building resilient services requires more than just designing for high availability. In this session, AWS operational leaders are back for more insights on how to mitigate impact when, not if, the unexpected happens. Hear a few short stories collected from 18 years of operational excellence, with practical advice on preparing for and mitigating failure.

Think big, build small: When to scale and when to simplify (ARC331)

Join this session to learn how to navigate the complexities of cloud architecture. Hear insights and guidance developed from working with successful AWS customers, including how to optimize for business value and agility. Discover the AWS approach to architectural tiers, engineering simplicity and reliability, and treating infrastructure as an investment.

Mastering resilience at every layer of the cake (ARC327)

Join this session to learn resilience at various levels, from platform to applications, using AWS services like AWS Resilience Hub, AWS Fault Injection Service, ARC, Amazon Elastic Disaster Recovery, and AWS Backup. You’ll leave with a mental model for resilience across these layers, and ready-to-use reference architectures and guidance. The session includes demos for a fun, lively experience.

Building resilient applications on AWS with Capital One (ARC334)

In this session, discover the patterns and principles of AWS resilience best practices. Then, hear Capital One showcase its next-generation design and deployment patterns that push the boundaries of resilient architectures and support its most critical business processes. Learn about the AWS services it uses, the trade-offs it must consider, and the decision matrix that guides developers to the right pattern for the right use case.

Data protection and resilience with AWS storage (STG301)

Join this session to dive deep on how AWS storage offers organizations defense-in-depth data protection and resilience for application data across recovery point and time objectives, helping mitigate risks with immutable solutions, restore testing, policy-based access controls, encryption, and auditing and reporting.

Workshops

Building, operating, and testing resilient Multi-AZ applications (ARC303)

Join this workshop to get hands-on experience building, operating, and testing a resilient Multi-AZ application.

Building resilient architectures with observability (COP308)

Explore how to use AWS services, including AWS Resilience Hub, Amazon CloudWatch, and AWS Fault Injection Service, to build resilient and reliable cloud-based applications.

Advanced resilience patterns

Building resilient and reliable applications in the cloud is critical for organizations running mission-critical workloads. Unexpected outages, latency spikes, or performance issues can have severe business impact. The sessions and workshops in this track explore advanced techniques and tools to help you proactively identify and address resilience weaknesses in your systems. Learn how to use chaos engineering, multi-Region architectures, and the latest AWS services and best practices to enhance the resilience and operational excellence of your cloud applications.

Breakout sessions

Chaos engineering: A proactive approach to system resilience (ARC326)

This session demonstrates the benefits of chaos engineering in action. Gain insights from BMW Group’s transformative journey, learning key lessons on scaling chaos engineering across the organization, and how BMW Group conducts large-scale chaos experiments in production, uncovering issues and fostering a culture of greater resilience and continuous improvement.

Try again: The tools and techniques behind resilient systems (ARC403)

Grand architectural theories are nice, but what makes systems resilient is in the details. Marc Brooker, VP and distinguished engineer, looks at some of the resiliency tools and techniques AWS uses in its systems. Marc rethinks, retries, breaks open circuit breakers, decodes erasure coding, and tackles the tail. Learn about formal methods and simulation, and how these tools help build faster code, faster.

Multi-Region or single Region? Considerations and architectures (ARC309)

Watch experts walk through and whiteboard architectures that take advantage of AWS services that support multi-Region capabilities, and discuss what a failover scenario would look like in real life. Leave with an understanding of what it takes to run a multi-Region architecture on AWS.

Best practices for creating multi-Region architectures on AWS (ARC323)

In this session, learn the two critical areas you’ll need to consider. First, explore different failover strategies and the trade-offs between them. Then, learn how to make the decision to initiate a cross-Region failover as well as what goes into the process. Lastly, hear from Samsung Account about their multi-Region application and how they think about these two critical areas.

Workshops

Chaos engineering workshop (ARC322)

This workshop introduces AWS Fault Injection Service for running controlled resilience experiments to improve application performance, observability, and resilience. You must bring your laptop to participate.

Gen AI resilience: Chaos engineering with AWS Fault Injection Service (ARC305)

Learn how to construct a useful hypothesis backlog for generative AI applications and how to use AWS Fault Injection Service to run those experiments. You must bring your laptop to participate.

Building operational resilience in workloads using generative AI (SUP401)

Building operational resilience requires proactive identification and mitigation of risks. In this workshop, use AWS managed generative AI services in real-world scenarios to learn how to assess readiness, proactively improve your architecture, react quickly to events, troubleshoot issues, and implement effective observability practices. Also use AWS Countdown and the AWS Well-Architected Framework as the entry point reference frameworks to use generative AI services for operation. Through hands-on activities, learn strategies for debugging issues, detecting anomalies and incidents, and optimizing architectures to improve the resilience of your workloads. You must bring your laptop to participate.

Resilience for customers operating in regulated industries

In regulated industries like finance, healthcare, and telecom, resilient architecture is critical for compliance, security, and operational continuity. These sectors face strict regulations that demand robust data protection, disaster recovery, and uptime guarantees. A resilient architecture helps organizations maintain service availability, minimize downtime, and recover quickly from disruptions, safeguarding sensitive data and avoiding regulatory breaches. It also enables businesses to adapt to evolving regulations while delivering secure, uninterrupted services.

Breakout sessions

Fidelity Investments: Building for mission-critical resilience (FSI318)

This session explores the transformation of Fidelity Investments’s trade processing platform on AWS and the critical role resiliency plays in preserving operational integrity.

Service event replay: Stress-testing your architecture’s resilience (FSI314)

Learn how to assess the resiliency of your own architectures and develop strategies to strengthen your response and recovery capabilities.

Workshops

Scaling multi-tenant SaaS with a cell-based architecture (ARC402)

In this workshop, see how cell-based architectures provide you with new ways to group, deploy, scale, and operate your multi-tenant workloads. Also see how this approach influences the tiering, scaling, and resilience profile of your SaaS architecture. You must bring your laptop to participate.

Advanced cross-Region DR patterns on AWS (ARC401)

Join this hands-on workshop to explore a resilient, cloud-centered architecture that surpasses the stringent availability and recovery regulations for financial markets utility providers. You must bring your laptop to participate.

Meet experts at the AWS Cloud Resilience kiosk

Throughout the re:Invent week, if you have any questions or suggestions for the AWS Cloud Resilience team, drop by the Cloud Resilience kiosk at the AWS Village in the 2024 re:Invent Expo (the Venetian).

This post was copyedited for grammar, spelling, capitalization, punctuation, terminology, and legal issues. Other important issues are noted in comments, and you should consider revising the content accordingly before publication.

Know before you go – AWS re:Invent 2024 cloud resilience
Author: Shllomi Ezra