A secure approach to generative AI with AWS

Generative artificial intelligence (AI) is transforming the customer experience in industries across the globe. Customers are building generative AI applications using large language models (LLMs) and other foundation models (FMs), which enhance customer experiences, transform operations, improve employee productivity, and create new revenue channels.

FMs and the applications built around them represent extremely valuable investments for our customers. They’re often used with highly sensitive business data, such as personal data, compliance data, operational data, and financial information, to optimize the model’s output. The biggest concern we hear from customers as they explore the advantages of generative AI is how to protect their highly sensitive data and investments. Because their data and model weights are incredibly valuable, customers require them to stay protected, secure, and private, whether from their own administrators’ accounts, their customers, vulnerabilities in software running in their own environments, or even their cloud service provider.

At AWS, our top priority is safeguarding the security and confidentiality of our customers’ workloads. We think about security across the three layers of our generative AI stack:

  • Bottom layer – Provides the tools for building and training LLMs and other FMs
  • Middle layer – Provides access to all the models along with tools you need to build and scale generative AI applications
  • Top layer – Includes applications that use LLMs and other FMs to make work stress-free by writing and debugging code, generating content, deriving insights, and taking action

Each layer is important to making generative AI pervasive and transformative.

With the AWS Nitro System, we delivered a first-of-its-kind innovation on behalf of our customers. The Nitro System is an unparalleled computing backbone for AWS, with security and performance at its core. Its specialized hardware and associated firmware are designed to enforce restrictions so that nobody, including anyone in AWS, can access your workloads or data running on your Amazon Elastic Compute Cloud (Amazon EC2) instances. Customers have benefited from this confidentiality and isolation from AWS operators on all Nitro-based EC2 instances since 2017.

By design, there is no mechanism for any Amazon employee to access a Nitro EC2 instance that customers use to run their workloads, or to access data that customers send to a machine learning (ML) accelerator or GPU. This protection applies to all Nitro-based instances, including instances with ML accelerators like AWS Inferentia and AWS Trainium, and GPU instance families like P4, P5, G5, and G6.

The Nitro System enables Elastic Fabric Adapter (EFA), which uses the AWS-built Scalable Reliable Datagram (SRD) communication protocol for cloud-scale elastic and large-scale distributed training, and provides the only always-encrypted Remote Direct Memory Access (RDMA) capable network. All communication through EFA is encrypted with VPC encryption without incurring any performance penalty.

The design of the Nitro System has been validated by the NCC Group, an independent cybersecurity firm. AWS delivers a high level of protection for customer workloads, and we believe this is the level of security and confidentiality that customers should expect from their cloud provider. This level of protection is so critical that we’ve added it to our AWS Service Terms to provide additional assurance to all of our customers.

Innovating secure generative AI workloads using AWS industry-leading security capabilities

From day one, AWS AI infrastructure and services have had built-in security and privacy features to give you control over your data. As you move quickly to implement generative AI in your organization, you need to know that your data is being handled securely across the AI lifecycle, including data preparation, training, and inferencing. The security of model weights (the parameters that a model learns during training and that are critical to its ability to make predictions) is paramount to protecting your data and maintaining model integrity.

This is why it is critical for AWS to continue innovating on behalf of our customers to raise the bar on security. We believe that you must have security and confidentiality built in across each layer of the generative AI stack. You need to be able to secure the infrastructure to train LLMs and other FMs, build securely with tools to run LLMs and other FMs, and run applications that use FMs with built-in security and privacy that you can trust.

At AWS, securing AI infrastructure means that no unauthorized person, whether at the infrastructure operator or at the customer, has any access to sensitive AI data, such as AI model weights and the data processed with those models. It rests on three key principles:

  1. Complete isolation of the AI data from the infrastructure operator – The infrastructure operator must have no ability to access customer content and AI data, such as AI model weights and data processed with models.
  2. Ability for customers to isolate AI data from themselves – The infrastructure must provide a mechanism to allow model weights and data to be loaded into hardware, while remaining isolated from and inaccessible to customers’ own users and software.
  3. Protected infrastructure communications – The communication between devices in the ML accelerator infrastructure must be protected. All externally accessible links between the devices must be encrypted.

The Nitro System fulfills the first principle of Secure AI Infrastructure by isolating your AI data from AWS operators. The second principle provides you with a way to remove administrative access of your own users and software to your AI data. AWS not only offers you a way to achieve that, but we also made it straightforward and practical by building an integrated solution between AWS Nitro Enclaves and AWS Key Management Service (AWS KMS). With Nitro Enclaves and AWS KMS, you can encrypt your sensitive AI data using keys that you own and control, store that data in a location of your choice, and securely transfer the encrypted data to an isolated compute environment for inferencing. Throughout this entire process, the sensitive AI data is encrypted and isolated from your own users and software on your EC2 instance, and AWS operators cannot access this data. Use cases that have benefited from this flow include running LLM inferencing in an enclave. Until now, Nitro Enclaves have operated only on the CPU, which limits the potential for larger generative AI models and more complex processing.
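To make the Nitro Enclaves and AWS KMS integration more concrete, here is a minimal sketch of a key policy statement that allows decryption only from an enclave whose image matches an expected measurement. It relies on the `kms:RecipientAttestation:ImageSha384` condition key that AWS KMS documents for Nitro Enclaves. The function name, the role ARN, and the digest are illustrative placeholders rather than values from this article, and a complete key policy would also need statements for key administration.

```python
import json
import string


def enclave_decrypt_statement(role_arn: str, image_sha384: str) -> dict:
    """Build one KMS key policy statement that permits kms:Decrypt only
    when the request carries an attestation for the given enclave image.

    role_arn and image_sha384 are caller-supplied placeholders (assumptions).
    """
    # A SHA-384 digest is 48 bytes, which is 96 hexadecimal characters.
    if len(image_sha384) != 96 or any(c not in string.hexdigits for c in image_sha384):
        raise ValueError("image_sha384 must be a 96-character hex digest")
    return {
        "Sid": "DecryptOnlyFromAttestedEnclave",
        "Effect": "Allow",
        "Principal": {"AWS": role_arn},
        "Action": "kms:Decrypt",
        "Resource": "*",
        "Condition": {
            # Condition key evaluated against the enclave's attestation document.
            "StringEqualsIgnoreCase": {
                "kms:RecipientAttestation:ImageSha384": image_sha384
            }
        },
    }


if __name__ == "__main__":
    # Hypothetical account, role, and digest, used only to show the output shape.
    example = enclave_decrypt_statement(
        "arn:aws:iam::111122223333:role/example-enclave-host",
        "0" * 96,
    )
    print(json.dumps(example, indent=2))
```

With a statement like this in place, a request made with the same role from the parent instance, outside the enclave, carries no matching attestation and is not covered by this allow statement. Whether the request is actually denied also depends on the rest of the key policy and on any IAM policies that grant the same permission.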

We announced our plans to extend this Nitro end-to-end encrypted flow to include first-class integration with ML accelerators and GPUs, fulfilling the third principle. You will be able to decrypt and load sensitive AI data into an ML accelerator for processing while providing isolation from your own operators and verified authenticity of the application used for processing the AI data. Through the Nitro System, you can cryptographically validate your applications to AWS KMS and decrypt data only when the necessary checks pass. This enhancement allows AWS to offer end-to-end encryption for your data as it flows through generative AI workloads.
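The idea of decrypting data only when the necessary checks pass can be illustrated with a toy model. The sketch below is not the Nitro attestation protocol and does not call AWS KMS: `ToyKeyService` and `measure` are hypothetical names, and the measurement is self-reported, whereas a real system would rely on an attestation document signed by the Nitro hypervisor. It only shows the gating logic, in which a data key is released to an application only if its measurement matches the expected one.

```python
import hashlib
import hmac
import secrets


def measure(image: bytes) -> str:
    """Return the SHA-384 hex digest of an application image (toy measurement)."""
    return hashlib.sha384(image).hexdigest()


class ToyKeyService:
    """Hypothetical stand-in for a key service; this is not the AWS KMS API."""

    def __init__(self, expected_measurement: str) -> None:
        self._expected = expected_measurement
        # Random 256-bit data key that protects the sensitive AI data.
        self._data_key = secrets.token_bytes(32)

    def release_key(self, presented_measurement: str) -> bytes:
        """Release the data key only if the presented measurement matches.

        Assumption: the measurement is trusted as presented. A real service
        would first verify a signed attestation document.
        """
        # Constant-time comparison of the two hex digests.
        if not hmac.compare_digest(
            self._expected.encode("ascii"),
            presented_measurement.encode("ascii"),
        ):
            raise PermissionError("measurement mismatch: key not released")
        return self._data_key


if __name__ == "__main__":
    trusted_image = b"inference-app-v1"  # placeholder application bytes
    service = ToyKeyService(measure(trusted_image))

    key = service.release_key(measure(trusted_image))
    print("key released to trusted application:", len(key), "bytes")

    try:
        service.release_key(measure(b"modified-app"))
    except PermissionError as err:
        print("modified application rejected:", err)
```

The design point is that the key release decision depends on what code is running, not on who the operator is, which is what lets the flow isolate data from your own administrators as well.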

We plan to offer this end-to-end encrypted flow in the upcoming AWS-designed Trainium2 as well as GPU instances based on NVIDIA’s upcoming Blackwell architecture, both of which offer secure communications between devices, the third principle of Secure AI Infrastructure. AWS and NVIDIA are collaborating closely to bring a joint solution to market, including the new NVIDIA Blackwell GPU platform, which couples NVIDIA’s GB200 NVL72 solution with the Nitro System and EFA technologies to provide an industry-leading solution for securely building and deploying next-generation generative AI applications.

Advancing the future of generative AI security

Today, tens of thousands of customers are using AWS to experiment and move transformative generative AI applications into production. Generative AI workloads contain highly valuable and sensitive data that needs this level of protection from both your own operators and the cloud service provider. Customers using AWS Nitro-based EC2 instances have received this level of protection and isolation from AWS operators since 2017, when we launched our innovative Nitro System.

At AWS, we’re continuing that innovation as we invest in building performant and accessible capabilities to make it practical for our customers to secure their generative AI workloads across the three layers of the generative AI stack, so that you can focus on what you do best: building and extending the uses of generative AI to more areas.


About the authors

Anthony Liguori is an AWS VP and Distinguished Engineer for EC2

Colm MacCárthaigh is an AWS VP and Distinguished Engineer for EC2
