Using Amazon GuardDuty Malware Protection to scan uploads to Amazon S3

TutoSartup excerpt from this article:
Amazon Simple Storage Service (Amazon S3) is a widely used object storage service known for its scalability, availability, durability, security, and performance… This new feature provides malicious object scanning for objects uploaded to S3 buckets, using multiple AWS-developed and industry-leadin…

Amazon Simple Storage Service (Amazon S3) is a widely used object storage service known for its scalability, availability, durability, security, and performance. When sharing data between organizations, customers need to treat incoming data as untrusted and assess it for malicious files before ingesting it into their downstream processes. This traditionally requires setting up secure staging buckets, deploying third-party anti-virus and anti-malware scanning software, and managing a complex data pipeline and processing architecture.

To address the need for malware protection in Amazon S3, Amazon Web Services (AWS) has launched Amazon GuardDuty Malware Protection for Amazon S3. This new feature provides malicious object scanning for objects uploaded to S3 buckets, using multiple AWS-developed and industry-leading third-party malware scanning engines. It eliminates the need for customers to manage their own isolated data pipelines, compute infrastructure, and anti-virus software across accounts and AWS Regions, providing malware detection without compromising the scale, latency, and resiliency of S3 usage.

In this blog post, we share a solution that uses Amazon EventBridge, AWS Lambda, and Amazon S3 to copy scanned S3 objects to a destination S3 bucket. EventBridge is a serverless event bus that you can use to build event-driven architectures and automate your business workflows. In this solution, we allow events to be invoked from an object that is being placed in an S3 bucket. The events can be processed by a serverless function in Lambda to invoke a malware scan. We then show you how to extend this solution for other use cases specific to your organization.

Feature overview

GuardDuty Malware Protection for Amazon S3 provides a malware and anti-virus detection service for new objects uploaded to an S3 bucket. Malware Protection for S3 is enabled from within the AWS Management Console for GuardDuty and GuardDuty threat detection is not required to be enabled to use this feature. If GuardDuty threat detection is enabled, security findings for detected malware are also sent to GuardDuty. This allows customer development or application teams and security teams to work together and oversee malware protection for S3 buckets throughout the organization.

When your AWS account has GuardDuty enabled in an AWS Region, your account is associated to a unique regional entity called a detector ID. All findings that GuardDuty generates and API operations that are performed are associated with this detector ID. If you don’t want to use GuardDuty with your AWS account, Malware Protection for S3 is available as an independent feature. Used independently, Malware Protection for S3 will not create an associated detector ID.

When a malware scan identifies a potentially malicious object and you don’t have a detector ID, no GuardDuty finding will be generated in your AWS account. GuardDuty will publish the malware scan results to your default EventBridge event bus and metrics to an Amazon CloudWatch namespace for you to use for automating additional tasks.

GuardDuty manages error handling and reprocessing of event creation and publication as needed to make sure that each object is properly evaluated before being accessed by downstream resources. GuardDuty supports configuring Amazon S3 object tagging actions to be performed throughout the process.

Figure 1 shows the high-level overview of the S3 object scanning process.

Figure 1: S3 object scanning process

Figure 1: S3 object scanning process

The object scanning process is the following:

  1. An object is uploaded to an S3 bucket that has been configured for malware detection. If the object is uploaded as a multi-part upload, then a new object notification will be generated on completion of the upload.
  2. The malware scan service receives a notification that a new object has been detected in the bucket.
  3. The malware scan service downloads the object by using AWS PrivateLink. This will be automatically created when malware detection is enabled on an S3 bucket. No additional configuration is required.
  4. The malware detection service then reads, decrypts, and scans this object in an isolated VPC with no internet access within the GuardDuty service account. Encryption at rest is used for customer data that is scanned during this process. After the malware detection scan is complete, the object is deleted from the malware scanning environment.
  5. The malware scan result event is sent to the EventBridge default event bus in your AWS account and Region where malware detection has been enabled. When malware is detected, an EventBridge notification is generated that includes details of which S3 object was flagged as malicious and supporting information such as the malware variant and known use cases for the malicious software.
  6. Scan metrics such as number of objects scanned and bytes scanned are sent to Amazon CloudWatch.
  7. If malware is detected, the service sends a finding to the GuardDuty detector ID in the current Region.
  8. If you have configured object tagging, GuardDuty adds a predefined tag with key GuardDutyMalwareScanStatus and a potential scan result value of your scanned S3 object.

IAM permissions

Enabling and using GuardDuty Malware Protection for S3 requires you to add AWS Identity and Access Manager (IAM) role permissions and a specific trust policy for GuardDuty to perform the malware scan on your behalf. GuardDuty provides you flexibility to enable this feature for your entire bucket, or limit the scope of the malware scan to specific object prefixes where GuardDuty scans each uploaded object that starts with up to five selected prefixes.

To allow GuardDuty Malware Protection for S3 to scan and add tags to your S3 objects, you need an IAM role that includes permissions to perform the following tasks:

  1. A trust policy to allow Malware Protection to assume the IAM role.
  2. Allow EventBridge actions to create and manage the EventBridge managed rule to allow Malware Protection for S3 to listen to your S3 object notifications.
  3. Allow Amazon S3 and EventBridge actions to send notification to EventBridge for events in the S3 bucket.
  4. Allow Amazon S3 actions to access the uploaded S3 object and add a predefined tag GuardDutyMalwareScanStatus to the scanned S3 object.
  5. If you’re encrypting S3 buckets with AWS Key Management System (AWS KMS) keys, you must allow AWS KMS key actions to access the object before scanning and putting a test object in S3 buckets with the supported encryption.

This IAM policy is required each time you enable Malware Protection for S3 for a new bucket in your account. Alternatively, update an existing IAM PassRole policy to include the details of another S3 bucket resource each time you enable Malware Protection. See the AWS documentation for example policies and permissions required.

S3 object tagging and access control

When you enable S3 object tagging, GuardDuty adds a predefined tag with key GuardDutyMalwareScanStatus and a potential scan result value of your scanned S3 object. These tags enable the implementation of a tag-based access control (TBAC) policy for the objects, halting access to an S3 object until a malware scan has been completed.

The example S3 bucket policy in the AWS GuardDuty user guide stops anyone other than the GuardDuty Malware scan service principal from reading objects from the specific S3 bucket that aren’t tagged GuardDutyMalwareScanStatus with a value NO_THREATS_FOUND. The policy also helps prevent other roles or users other than GuardDuty from adding the GuardDutyMalwareScanStatus tag.

Configure optional access for other IAM roles that are allowed to override the GuardDutyMalwareScanStatus tag after an object is tagged. Achieve this by replacing <IAM-role-name> in the following example S3 bucket policy.

{
            "Sid": "OnlyGuardDutyCanTag",
            "Effect": "Deny",
            "NotPrincipal": {
                "AWS": [
                    "arn:aws:iam::555555555555:root",
                    "arn:aws:iam::555555555555:role/<IAM-role-name>",
                    "arn:aws:iam::555555555555:assumed-role/<IAM-role-name>/GuardDutyMalwareProtection"
                ]
            },

Change the policy if you are required to allow certain principals or roles to read failed or skipped objects. You can permit a special role to read the malicious object if needed as part of your existing incident response process. Do this by adding an additional statement into the S3 bucket policy and replacing the <IAM-role-name>value in the following example.

{
            "Sid": "AllowSecurityTeamReadMalicious",
            "Effect": "Deny",
            "NotPrincipal": {
                "AWS": [
                    "arn:aws:iam::555555555555:role/<IAM-role-name>"
                ]
            },
            "Action": [
                "s3:GetObject",
                "s3:GetObjectVersion"
            ],
            "Resource": [
                "arn:aws:s3:::DOC-EXAMPLE-BUCKET",
                "arn:aws:s3:::DOC-EXAMPLE-BUCKET/*"
            ],
            "Condition": {
                "StringNotEquals": {
                    "s3:ExistingObjectTag/GuardDutyMalwareScanStatus": "THREATS_FOUND"
                }
            }
        },

Solution overview

This solution is designed to streamline the deployment of GuardDuty Malware Protection for S3, helping you to maintain a secure and reliable S3 storage environment while minimizing the risk of malware infections and their potential consequences. The solution provides several configuration options, allowing you to create a new S3 bucket or use an existing one, enable encryption with a new or existing AWS KMS key, and optionally set up a function to copy objects with a defined tag to a destination S3 bucket. The copy function feature offers an additional layer of protection by separating potentially malicious files from clean ones, allowing you to maintain a separate repository of safe data for continued business operations or further analysis.

Figure 2 shows the solution architecture.

Figure 2: Amazon GuardDuty copy S3 object solution overview

Figure 2: Amazon GuardDuty copy S3 object solution overview

The high-level workflow of the solution is as follows:

  1. An object is uploaded to an S3 bucket that has been configured for malware detection.
  2. The malware scan service receives a notification that a new object has been detected in the bucket and then GuardDuty reads, decrypts, and scans the object in an isolated environment.
  3. An EventBridge rule is configured to listen for events that match the pattern of completed scans for the monitored bucket that have a scan result of NO_THREATS_FOUND.
  4. When the matched event pattern occurs, the copy object Lambda function is invoked.
  5. The Lambda copy object function copies the object from the monitored S3 bucket to the target bucket.

In this solution, you will use the follow AWS services and features:

  • Event tracking: This solution uses an EventBridge rule to listen for completed malware scan result events for a specific S3 bucket, which has been enabled for malware scanning. When the EventBridge rule finds a matched event, the rule passes the required parameters and invokes the Lambda function required to copy the S3 object from the source malware protected bucket to a destination clean bucket. The event pattern used in this solution uses the following format:
    {
      "source": ["aws.guardduty"],
      "detail-type": ["GuardDuty Malware Protection Object Scan Result"],
      "detail": {
        "scanStatus": ["COMPLETED"],
        "resourceType": ["S3_OBJECT"],
        "s3ObjectDetails": {
          "bucketName": ["<DOC-EXAMPLE-BUCKET-111122223333>"]
        },
        "scanResultDetails": {
          "scanResultStatus": ["NO_THREATS_FOUND"]
        }
      }
    }

    Note: Replace the value of the bucketName attribute with the bucket in your account.

  • Task orchestration: A Lambda function handles the logic for copying the S3 object from the source bucket to the destination bucket which has just been scanned by GuardDuty. If the object was created within a new S3 prefix, the prefix and the object will be copied. If the object was tagged by GuardDuty, then the object tag will be copied.

Deploy the solution

The solution CloudFormation template provides you with multiple deployment scenarios so you can choose which best applies to your use case.

Deploy the CloudFormation template

For this next step, make sure that you deploy the CloudFormation template provided in the AWS account and Region where you want to test this solution.

To deploy the CloudFormation template

  1. Choose the Launch Stack button to launch a CloudFormation stack in your account. Note that the stack will launch in the N. Virginia (us-east-1) Region. To deploy this solution in other Regions, download the solution’s CloudFormation template, modify it, and deploy it to the selected Regions.

    Launch Stack

  2. Choose the appropriate scenario and complete the parameters information questions as shown in Figure 3.

    Figure 3: CloudFormation template parameters

    Figure 3: CloudFormation template parameters

    Each of the following scenarios and their parameter information (from Figure 3) can be evaluated to make sure that the CloudFormation template deploys successfully:

    Deployment scenario

    • Create a new bucket or use an existing bucket?
    • If ”new”, should a KMS key be created for the new bucket?
    • Would you like to create the copy function to a destination bucket? Create the Lambda copy function from the protected bucket to the clean bucket.

    Post scan file copy function

    • This will be used as the basis for the copy function and EventBridge rule to invoke the function: Copy files to the clean bucket with either the THREATS or NO_THREATS_FOUND tagged value.

    Existing S3 bucket configuration – not used for new S3 buckets

    • Enter the bucket name that you would like to be your scanned bucket: Enter the existing S3 bucket name that will be enabled for GuardDuty Malware Protection for S3.
    • Enter the bucket name that you would like to be your scanned bucket: Enter the S3 bucket name to be used as the copy destination for S3 objects.
    • Is the existing bucket using a KMS key? Is the existing S3 bucket encrypted with an existing KMS key?
    • ARN of the existing KMS key to be used: Provide the existing KMS key Amazon Resource Name (ARN) to be used for KMS encryption. IAM policies will be configured for this KMS key name.
    • Lambda Copy Function clean bucket: Create a new S3 bucket with the Lambda copy function from the protected bucket to the clean bucket.
  3. Review the stack name and the parameters for the template.
  4. On the Quick create stack screen, scroll to the bottom and select I acknowledge that AWS CloudFormation will create IAM resources.
  5. Choose Create stack. The deployment of the CloudFormation stack will take 3–4 minutes.

After the CloudFormation stack has deployed successfully, the solution will be available for use in the same Region where you deployed the CloudFormation stack. The solution deploys a specific Lambda function and EventBridge rule to match the name of the source S3 bucket.

Deploy the AWS CDK template

Alternatively if you prefer to use AWS CDK, download the CDK code from the GitHub repository.

Follow the readme contained within the repository to deploy the solution or individual components depending your requirements.

Extend the solution

In this section, you’ll find options for extending the solution.

Copy alternative status results

The solution can be extended to copy S3 objects with a scan result status that you define. To change the scan result used to invoke the copy function, update the scanresultstatus in the event pattern defined in EventBridge rule created as part of the solution named S3Malware-CopyS3Object-<DOC-EXAMPLE-BUCKET-111122223333>.

"scanResultDetails": {
      "scanResultStatus": ["<scan result status>"]

Delete source S3 objects

To delete the object from the source after the copy was successful, you will need to update the Lambda function code and the IAM role used by the Lambda function.

The IAM role used by the Lambda function requires a new statement added to the existing role. The JSON formatted statement is provided in the following example.

{
			"Action": [
				"s3:DeleteObject"
					],
			"Resource": [
				"arn:aws:s3:::<DOC-EXAMPLE-BUCKET-111122223333>/*"
			],
			"Effect": "Allow",
			"Sid": "AllowDeleteObjectSourceBucket"
		},

The copy Lambda function requires the following lines to be added at the end of the function code to delete the object:

s3.delete_object(Bucket=SOURCE_BUCKET,Key=SOURCE_KEY)

Scan existing S3 objects

When GuardDuty Malware Protection for S3 is enabled, it scans only new objects put into the bucket. To scan existing objects in a S3 bucket for malware, set up bucket replication to replicate all objects from a source bucket to a destination bucket with Malware Protection enabled.

Automate tagged object deletion

To remove malicious objects from the S3 bucket to help prevent accidental download or access, implement a tag-based lifecycle rule to delete the object after a specific number of days. To achieve this follow the steps in Setting a lifecycle configuration on a bucket to configure a lifecycle rule and make sure the tag key is GuardDutyMalwareScanStatus and value is THREATS_FOUND.

Figure 4: Tag based S3 lifecycle rule

Figure 4: Tag based S3 lifecycle rule

Align the lifecycle policy with your organization’s current S3 object malware investigation procedures. Deleting objects prematurely might hinder security teams’ ability to analyze potentially malicious content. When using bucket versioning instead of permanently deleting the object, Amazon S3 inserts a delete marker that becomes the current version of the object.

AWS Transfer Family integration

If you’re using the AWS Transfer Family service with Secure File Transfer Protocol (SFTP) connector for S3, it’s recommended to scan external uploads for malware before using the received files. This helps ensure the security and integrity of data transferred into your S3 buckets using SFTP.

Figure 5: AWS Transfer Family S3 workflow

Figure 5: AWS Transfer Family S3 workflow

To implement malware scanning, configure a file processing workflow configuration to copy the uploaded objects into an S3 bucket that has GuardDuty Malware Protection for S3 enabled.

Figure 6: Transfer Family configuration workflow

Figure 6: Transfer Family configuration workflow

Summary

Amazon GuardDuty Malware Protection for S3 is now available to assess untrusted objects for malicious files before being ingested by downstream processes within your organization. Customers can automatically scan their S3 objects for malware and take appropriate actions, such as quarantining or remediating infected files. This proactive approach helps mitigate the risks associated with malware infections, data breaches, and potential financial losses. The solution provided offers an additional layer of protection by separating potentially malicious files from clean ones, allowing customers to maintain a separate repository of safe data for continued business operations or further analysis. Visit the 2024 re:Inforce session or the what’s new blog post to understand additional service details.

 
If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, contact AWS Support.

Luke Notley

Luke Notley
Luke is a Senior Solutions Architect with Amazon Web Services and is based in Western Australia. Luke has a passion for helping customers connect business outcomes with technology and assisting customers throughout their cloud journey, helping them design scalable, flexible, and resilient architectures. In his spare time, he enjoys traveling, coaching basketball teams, and DJing.

Arran Peterson

Arran Peterson
Arran, a Solutions Architect based in Adelaide, South Australia, collaborates closely with customers to deeply understand their distinct business needs and goals. His role extends to assisting customers in recognizing both the opportunities and risks linked to their decisions related to cloud solutions.

Using Amazon GuardDuty Malware Protection to scan uploads to Amazon S3
Author: Luke Notley