Deploy Unreal Engine's Pixel Streaming at scale on AWS

Unreal Engine, created by Epic Games, is one of the most advanced tools for creating and rendering photo-realistic visuals and immersive experiences, the kind needed to power the latest games and the Metaverse. Traditionally, such experiences required a thick client, for example a desktop computer with a discrete GPU. Unreal Engine's Pixel Streaming allows an Unreal Engine application to run on a server in the cloud while streaming its rendered frames and audio to browsers and mobile devices using WebRTC.

Using AWS and Unreal Engine's Pixel Streaming, developers can create content with Unreal Engine and deploy it on Amazon Elastic Compute Cloud (Amazon EC2) on either Windows or Linux. This article focuses on how to deploy your Unreal Engine application at scale, meaning the ability to spin up and spin down EC2 instances based on user streaming requests. The article also discusses managing user streaming sessions across these EC2 instances.

Pixel Streaming component overview

From a hosting and deployment point of view, there are four key components to the Pixel Streaming solution:

  1. An Unreal Engine application package to run the game logic.
  2. A signaling and web server along with a CoTURN implementation (for the TURN server) to serve traffic to the client via the WebRTC protocol.
  3. A matchmaker server to distribute load across multiple gaming instances and to handle connectivity between the client and gaming application.
  4. A frontend component to run the package session on the client (browser).

The signaling server, CoTURN server, and Unreal Engine application can be hosted on the same Amazon EC2 instance. The instance requires GPU capabilities, offered by instance families such as G4dn. The instance also requires a public IP address so that CoTURN can maintain a WebRTC session with the client, and can therefore be deployed in a public subnet. In the remainder of the article, we will refer to this Amazon EC2 instance as the signaling server.

The matchmaker server can be hosted on a separate general purpose Amazon EC2 instance, deployed in a private subnet. In the remainder of the article, we will refer to this Amazon EC2 instance as the matchmaker server.

The frontend server can be hosted on a separate general purpose Amazon EC2 instance, deployed in a private subnet. We will refer to this Amazon EC2 instance as the frontend server.

Once these components are deployed, a request for a streaming session from a user’s device (for example, a web browser) would start the following sequence of steps:

  1. The frontend server receives the user request and renders a webpage on the user’s browser.
  2. The user requests a session by selecting a button on the webpage, which makes a call to the matchmaker server.
  3. The matchmaker server receives the user request and checks whether any signaling servers are available to serve it.
  4. If a signaling server is available, the matchmaker server sends the signaling server details to the webpage running on the user’s browser.
  5. The webpage establishes a WebSocket connection to the signaling server.
  6. The signaling server makes a connection to the Unreal Engine application package on the streamer port and passes it the user information.
  7. The Unreal Engine application package uses CoTURN to forward all stream traffic to the user’s webpage.

Note: The default implementation of the frontend server directly interfaces with the signaling server (step 5). This needs to change to incorporate the matchmaker server between the two.
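To make steps 3 and 4 concrete, here is a minimal sketch of the bookkeeping the matchmaker performs: it keeps a registry of signaling servers and hands out the first one that is not already serving a session. The stock Pixel Streaming matchmaker is a Node.js service; this Python sketch only illustrates the selection logic, and the field names (address, port, in_use) are hypothetical.

    # Illustrative sketch of the matchmaker's availability check (steps 3 and 4).
    # The stock Pixel Streaming matchmaker is a Node.js service; the field names
    # here are hypothetical and chosen for clarity.
    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class SignalingServer:
        address: str       # public DNS name or ALB query string for this server
        port: int          # HTTPS/WebSocket port exposed to the client
        in_use: bool = False

    class Matchmaker:
        def __init__(self):
            self.servers: list[SignalingServer] = []

        def register(self, server: SignalingServer) -> None:
            # Called when a signaling server comes online and announces itself.
            self.servers.append(server)

        def request_session(self) -> Optional[SignalingServer]:
            # Return the first free signaling server, marking it as in use so a
            # concurrent request cannot be matched to the same instance.
            for server in self.servers:
                if not server.in_use:
                    server.in_use = True
                    return server
            return None  # no capacity: the scaling framework must create a server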

While the sequence of steps above explains how a streaming session works for a single user request, how can we handle multiple concurrent session requests? What do we do with the signaling servers once a session is completed? What if a signaling server is not immediately available to service a user streaming request, and how do we handle the user interaction in that scenario? More importantly, how do we manage cost and security while setting up this streaming framework?

Solution overview

In this section, we will answer these questions by defining a solution architecture that caters to these requirements. We will start by building a custom horizontal scaling framework that creates signaling servers based on user session requests. The scaling framework consists of two key components:

  • A trigger to determine when a signaling server needs to be created
  • A process for creating the signaling server

The trigger needs to identify an incoming user session request and then check with the matchmaker server for an available signaling server to service the request. If the matchmaker server cannot return an available signaling server, the trigger logic starts creating a new one. This orchestration is done by an AWS Lambda function which executes this logic on each user session request.
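A minimal sketch of that trigger logic follows. It assumes a hypothetical matchmaker endpoint (/signallingserver/available) and a pre-built EC2 launch template that installs the signaling server components; both names are illustrative, not part of the stock Pixel Streaming tooling.

    # Sketch of the trigger Lambda: on a user session request, ask the matchmaker
    # for a free signaling server; if none is available, launch a new GPU instance.
    # The matchmaker endpoint path and the launch template name are assumptions.
    import json
    import os
    import urllib.request

    import boto3

    ec2 = boto3.client("ec2")
    MATCHMAKER_URL = os.environ["MATCHMAKER_URL"]          # e.g. http://matchmaker:9999
    LAUNCH_TEMPLATE = os.environ["LAUNCH_TEMPLATE_NAME"]   # pre-baked signaling server AMI + user data

    def handler(event, context):
        # Ask the matchmaker whether a signaling server is free (hypothetical endpoint).
        with urllib.request.urlopen(f"{MATCHMAKER_URL}/signallingserver/available") as resp:
            available = json.load(resp).get("available", False)

        if available:
            return {"action": "reuse-existing"}

        # No free server: create one from a launch template that installs the
        # signaling server, CoTURN, and the Unreal Engine application package.
        result = ec2.run_instances(
            LaunchTemplate={"LaunchTemplateName": LAUNCH_TEMPLATE},
            MinCount=1,
            MaxCount=1,
        )
        return {"action": "scale-out", "instanceId": result["Instances"][0]["InstanceId"]}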

The process for creating a signaling server is implemented in an AWS Lambda function that spins up a new Amazon EC2 instance and deploys the components of the signaling server. Additionally, signaling servers can be registered with an Application Load Balancer (ALB) target group, which allows us to reach a signaling server through the load balancer URL. An Amazon EventBridge rule can be created that is triggered when a signaling server changes state to 'Running'; the rule then calls an AWS Lambda function to register the signaling server with the target group. The diagram below depicts this flow.

Figure: Creating a signaling server
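The sketch below shows what the registration Lambda function could look like. It assumes the EventBridge rule matches EC2 "Instance State-change Notification" events with state "running", that signaling servers carry a hypothetical Role=signalling-server tag, and that the target group ARN is supplied through an environment variable.

    # Sketch of the registration Lambda invoked by an EventBridge rule that matches
    # EC2 "Instance State-change Notification" events with state "running".
    # The target group ARN and the identifying tag are assumptions.
    import os
    import boto3

    ec2 = boto3.client("ec2")
    elbv2 = boto3.client("elbv2")
    TARGET_GROUP_ARN = os.environ["SIGNALLING_TARGET_GROUP_ARN"]

    def handler(event, context):
        instance_id = event["detail"]["instance-id"]

        # Only register instances that belong to the signaling server fleet.
        tags = ec2.describe_tags(
            Filters=[{"Name": "resource-id", "Values": [instance_id]}]
        )["Tags"]
        if not any(t["Key"] == "Role" and t["Value"] == "signalling-server" for t in tags):
            return {"registered": False}

        # Attach the instance to the ALB target group so it is reachable via the ALB URL.
        elbv2.register_targets(
            TargetGroupArn=TARGET_GROUP_ARN,
            Targets=[{"Id": instance_id, "Port": 443}],
        )
        return {"registered": True, "instanceId": instance_id}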

Once a signaling server is running, it lets the matchmaker server know about its state. The matchmaker server can then leverage this signaling server to service new user session requests. As discussed in the Pixel Streaming overview, the matchmaker server has logic to find an available signaling server instance to service a user session request. Additionally, it ensures that an in-use signaling server is not chosen to service a new user session request.

As signaling servers are created based on user session requests, we need to keep track of each request until a signaling server is available to service it. An Amazon Simple Queue Service (SQS) FIFO queue is a good fit for this requirement. It allows us to store the user requests until they are ready to be serviced and enforces first-in-first-out (FIFO) ordering to determine which request gets serviced first. The trigger can poll the queue to monitor incoming user session requests.
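As a sketch of this queueing step, assuming the session request carries the API Gateway WebSocket connection ID so the user can be notified later (the queue URL and message fields are illustrative):

    # Sketch of queueing a user session request on an SQS FIFO queue and polling it.
    # The queue URL environment variable and message fields are assumptions.
    import json
    import os
    import uuid

    import boto3

    sqs = boto3.client("sqs")
    QUEUE_URL = os.environ["SESSION_REQUEST_QUEUE_URL"]  # must be a .fifo queue

    def enqueue_session_request(connection_id: str) -> None:
        # The API Gateway WebSocket connection id lets us answer the user later.
        sqs.send_message(
            QueueUrl=QUEUE_URL,
            MessageBody=json.dumps({"connectionId": connection_id}),
            MessageGroupId="session-requests",          # single group keeps strict FIFO ordering
            MessageDeduplicationId=str(uuid.uuid4()),
        )

    def poll_session_requests(max_messages: int = 1) -> list:
        # The trigger polls the queue; requests stay queued (and in order) until a
        # signaling server is available to serve them.
        response = sqs.receive_message(
            QueueUrl=QUEUE_URL,
            MaxNumberOfMessages=max_messages,
            WaitTimeSeconds=10,
        )
        return response.get("Messages", [])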

Additionally, once a signaling server is available to service the request, we need to relay that information to the user, who can then use the server to start their session. A WebSocket connection fits this requirement: it allows us to push the signaling server information to the user instead of requiring the user to poll at regular intervals to ask whether a signaling server is available. Amazon API Gateway supports WebSocket connections, and we can use an AWS Lambda function to send the signaling server details to the user via the open WebSocket connection. The following diagram depicts the usage of a queue and a WebSocket connection to manage the user requests.

Figure: Orchestrating user session requests
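A minimal sketch of the notification step, assuming the Lambda function knows the user's WebSocket connection ID and the query string assigned to the chosen signaling server (the endpoint environment variable and payload shape are illustrative):

    # Sketch of pushing the assigned signaling server details back to the user over
    # the open API Gateway WebSocket connection.
    import json
    import os

    import boto3

    # The management API endpoint looks like:
    # https://{api-id}.execute-api.{region}.amazonaws.com/{stage}
    apigw = boto3.client(
        "apigatewaymanagementapi",
        endpoint_url=os.environ["WEBSOCKET_MANAGEMENT_ENDPOINT"],
    )

    def notify_user(connection_id: str, signalling_query_string: str) -> None:
        # The query string identifies which signaling server the ALB should route
        # this user to (see the listener rule discussed below).
        payload = {"signallingServer": signalling_query_string}
        apigw.post_to_connection(
            ConnectionId=connection_id,
            Data=json.dumps(payload).encode("utf-8"),
        )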

In the previous sections we discussed how a combination of an AWS Lambda function and Amazon EventBridge can be used to register the signaling servers with an ALB target group. However, we need to ensure that the ALB URL directs traffic to the specific signaling server that the matchmaker server has identified for the user session. One way of achieving this is to use a rule on the ALB listener associated with the target group so that a specific signaling server is reachable via a unique query string. This query string is defined when a newly created signaling server is registered with the ALB target group. An AWS Lambda function then relays this query string to the user (via the WebSocket connection) to ensure the user is directed to the designated signaling server for their session.
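The sketch below illustrates creating such a listener rule with boto3. It assumes one target group per signaling server and a hypothetical "server" query string key; rule priorities must be unique per listener.

    # Sketch of creating an ALB listener rule that routes a unique query string to
    # the target group of one specific signaling server. Names, the query string
    # key, and priorities are assumptions.
    import boto3

    elbv2 = boto3.client("elbv2")

    def add_signalling_rule(listener_arn: str, target_group_arn: str,
                            instance_id: str, priority: int) -> None:
        elbv2.create_rule(
            ListenerArn=listener_arn,
            Priority=priority,
            Conditions=[{
                "Field": "query-string",
                # e.g. https://<alb-dns>/?server=i-0abc... reaches this instance only
                "QueryStringConfig": {"Values": [{"Key": "server", "Value": instance_id}]},
            }],
            Actions=[{"Type": "forward", "TargetGroupArn": target_group_arn}],
        )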

Finally, we need a mechanism to scale in the signaling servers once the user sessions are completed. While there are multiple ways to achieve this, one option is to stop the instances after a specific period of time using a shell/batch script, where the period is governed by how long a user session typically lasts. For example, on Linux instances, adding sudo shutdown --halt +20 to the user data automatically stops the instance after 20 minutes. An Amazon EventBridge rule can then be triggered when the instances change state to 'stopped' and call an AWS Lambda function to terminate them.
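A minimal sketch of that termination Lambda function, assuming it is wired to an EventBridge rule matching EC2 state-change events with state "stopped":

    # Sketch of the scale-in Lambda invoked by an EventBridge rule that matches EC2
    # "Instance State-change Notification" events with state "stopped".
    import boto3

    ec2 = boto3.client("ec2")

    def handler(event, context):
        instance_id = event["detail"]["instance-id"]

        # Optionally confirm this is a signaling server (tag check, as in the
        # registration sketch above) before terminating it.
        ec2.terminate_instances(InstanceIds=[instance_id])
        return {"terminated": instance_id}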

Security Controls

To support encryption in transit, an HTTPS listener can be configured on the ALBs in front of the matchmaker server, frontend server, and signaling server. This ensures that all communication with client devices is encrypted over TLS.
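A sketch of configuring such an HTTPS listener with boto3 follows; the certificate (for example, one issued by AWS Certificate Manager), security policy, and target group are assumptions.

    # Sketch of adding an HTTPS listener to an ALB so traffic to client devices is
    # encrypted in transit. Certificate ARN, SSL policy, and target group are assumptions.
    import boto3

    elbv2 = boto3.client("elbv2")

    def add_https_listener(load_balancer_arn: str, certificate_arn: str,
                           target_group_arn: str) -> None:
        elbv2.create_listener(
            LoadBalancerArn=load_balancer_arn,
            Protocol="HTTPS",
            Port=443,
            Certificates=[{"CertificateArn": certificate_arn}],
            SslPolicy="ELBSecurityPolicy-TLS13-1-2-2021-06",
            DefaultActions=[{"Type": "forward", "TargetGroupArn": target_group_arn}],
        )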

For authentication:

  • The application hosted on the frontend server can be integrated with an identity provider (like Amazon Cognito) to authenticate the user and send a bearer token to Amazon API Gateway
  • The API Gateway can use an AWS Lambda function (authorizeClient) to validate the token when the WebSocket connection is established (items 2, 3, and 4 in the solution diagram); a minimal sketch of such an authorizer follows this list
  • The matchmaker server can be configured to authenticate API calls using a client ID and secret
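Below is a minimal sketch of such a Lambda authorizer attached to the WebSocket $connect route. The token validation is stubbed out; in practice the bearer token issued by the identity provider would be verified (signature, issuer, audience, and expiry).

    # Sketch of the authorizeClient Lambda authorizer for the WebSocket $connect route.
    # Token validation is a placeholder; verify the JWT against the identity
    # provider's published keys in a real deployment.
    def validate_token(token: str) -> bool:
        # Placeholder: check signature, issuer, audience, and expiry of the token.
        return bool(token)

    def handler(event, context):
        # For WebSocket APIs the token is typically passed as a query string
        # parameter or header on the $connect request.
        token = (event.get("queryStringParameters") or {}).get("token", "")
        effect = "Allow" if validate_token(token) else "Deny"
        return {
            "principalId": "pixel-streaming-user",
            "policyDocument": {
                "Version": "2012-10-17",
                "Statement": [{
                    "Action": "execute-api:Invoke",
                    "Effect": effect,
                    "Resource": event["methodArn"],
                }],
            },
        }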

Additional areas for optimization

  1. The matchmaker server is missing a persistence layer to track all available signaling servers. You could consider using something like Amazon ElastiCache for Redis to manage this. See the article Getting started with Amazon ElastiCache for Redis for details.
  2. The signaling server can be modified to require a token for authentication. The token can be passed as the first message of the WebSocket connection from the frontend, following established authentication patterns for WebSocket connections; a sketch of this pattern follows the list.
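An illustrative sketch of that pattern, using a plain Python WebSocket server (the actual signaling server is a Node.js application; this only demonstrates rejecting connections whose first message is not a valid token, and requires the third-party websockets package, version 10 or later):

    # Sketch of a WebSocket server that expects an authentication token as the
    # first message and closes the connection otherwise.
    import asyncio

    import websockets

    def is_valid_token(token: str) -> bool:
        # Placeholder: verify the token issued by the frontend's identity provider.
        return bool(token)

    async def handle_connection(websocket):
        try:
            token = await asyncio.wait_for(websocket.recv(), timeout=5)
        except asyncio.TimeoutError:
            await websocket.close(code=4001, reason="no auth token")
            return
        if not is_valid_token(token):
            await websocket.close(code=4003, reason="invalid token")
            return
        await websocket.send("authenticated")
        # ... continue with the normal signaling exchange ...

    async def main():
        async with websockets.serve(handle_connection, "0.0.0.0", 8888):
            await asyncio.Future()  # run forever

    if __name__ == "__main__":
        asyncio.run(main())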

Cost optimization considerations

The solution uses On-Demand as the purchasing option for signaling server instances. To optimize cost, the scaling framework can be configured to terminate instances when streaming sessions complete. It can also enforce a ceiling on the maximum number of running instances at any given time. A sample implementation is included in the shared repository.
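As an illustration, a ceiling check such as the following could run before the scale-out step; the identifying tag and the MAX_SIGNALLING_INSTANCES setting are assumptions.

    # Sketch of enforcing a ceiling on concurrently running signaling servers
    # before scaling out. Tag key/value and the ceiling setting are assumptions.
    import os

    import boto3

    ec2 = boto3.client("ec2")
    MAX_INSTANCES = int(os.environ.get("MAX_SIGNALLING_INSTANCES", "10"))

    def under_instance_ceiling() -> bool:
        # Count running (or starting) signaling servers identified by a tag.
        paginator = ec2.get_paginator("describe_instances")
        count = 0
        for page in paginator.paginate(
            Filters=[
                {"Name": "tag:Role", "Values": ["signalling-server"]},
                {"Name": "instance-state-name", "Values": ["pending", "running"]},
            ]
        ):
            for reservation in page["Reservations"]:
                count += len(reservation["Instances"])
        return count < MAX_INSTANCES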

If consumption is fairly consistent throughout the year, it is worth exploring Reserved Instances or Savings Plans as levers for reducing the cost of the signaling servers.

For the matchmaker and frontend servers, it is possible to containerize them and host them on AWS Fargate, allowing them to scale based on demand. However, this requires adding a persistence layer to the matchmaker server to track available signaling servers.

Conclusion

In this article we demonstrated how to deploy Unreal Engine Pixel Streaming at scale on Amazon EC2 instances, leveraging AWS services to build a solution that can serve sessions to multiple users on demand. A reference implementation of the solution proposed in this article is hosted in this repository. You can use it as a framework to further build on and optimize your solution, and get started with Pixel Streaming on AWS quickly!

Author: Jishnu Dasgupta