Launch phase steps for successful launches on Amazon GameLift Servers

When a game goes viral, being set up for success from the start is vital. We are going to cover important areas of consideration when launching a multiplayer game on Amazon GameLift Servers, focusing on activities required 2-3 months before your game launch. The launch can be the full public release of the game, but it can also be an open beta, early access, or any other milestone where you have real players.

We’ll review five key areas of final planning for game launch:

Fill out the Launch Questionnaire and get your limits increased
Set up your production fleets
Do load testing and test the critical path
Monitor API throttling
Use Blue/Green deployments in production with new game server versions

1 – Fill out the Launch Questionnaire and get your limits increased

One of the key steps for any milestone (such as open beta, early access, and eventually your full game launch) is making sure your service limits are increased to match your needs. The default service limits for Amazon GameLift Servers are there to protect the account from accidentally scaling out in early development. When you’re ready to begin actively supporting players, you may need higher limits to provision the infrastructure required for your player load.

To find the Launch Questionnaire, open the Amazon GameLift Servers console and, in the left-hand menu under Resources, select Prepare to launch. The questionnaire covers both the instance limits for your selected instance types and the throttling limits for the Amazon GameLift Servers APIs.

Some key considerations when filling out the questionnaire:

Do this early, 2-3 months before a launch milestone.
Remember a beta or private preview is still a launch, so you will need the limits increased for that too. You can always send a new version for the next milestone.
Multi-location fleets have a home Region and additional locations. Remember to define the home Region of your fleet and request limits for each location against that home Region. Location-specific instance limits are set separately for each home Region.
Make sure to review the exact Amazon GameLift APIs you are using, and request limit increases to match your expected peak request rate. Avoid utilizing the Describe APIs in your game session creation flow, as these control plane APIs are generally not designed to be called for every player request. If you need to call these APIs, you can do it in a central fashion, as discussed in Host persistent world games on Amazon GameLift Servers.
Make sure to request limits for all required Amazon Web Services (AWS) accounts. This can include production, testing and load testing accounts.
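As a rough aid for filling in the questionnaire, the following sketch estimates instance counts and a steady-state session placement rate from an expected peak player count. The function name, the parameters, and the 30% default buffer are illustrative assumptions, not values from the questionnaire, and a real launch peak can exceed a steady-state estimate.

```python
import math


def estimate_launch_limits(peak_ccu, players_per_session, sessions_per_instance,
                           avg_session_minutes, buffer_percent=30):
    """Back-of-the-envelope limits for ONE location (illustrative only)."""
    # Number of game sessions running at the same time at peak.
    concurrent_sessions = math.ceil(peak_ccu / players_per_session)
    # Instances needed to host those sessions, before any safety buffer.
    instances = math.ceil(concurrent_sessions / sessions_per_instance)
    # Add headroom on top, expressed as a whole percentage.
    instances_with_buffer = math.ceil(instances * (100 + buffer_percent) / 100)
    # At steady state, each running session is replaced about once per
    # session length, which approximates the placement request rate.
    placements_per_second = concurrent_sessions / (avg_session_minutes * 60)
    return {
        "concurrent_sessions": concurrent_sessions,
        "instances": instances,
        "instances_with_buffer": instances_with_buffer,
        "placements_per_second": placements_per_second,
    }
```

Running this once per location, with that location's share of the expected player population, gives a starting point for the per-location instance limits and the API rate to request.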

When sending out the questionnaire, make sure to add your AWS account team to the email, if you have one assigned, to keep everyone on the same page.

Figure 1 shows where to find the Launch Questionnaire in the Amazon GameLift Servers console.

Amazon GameLift Servers management console browser view with "Prepare to launch" under "Resources" selected and highlighted. The "Download questionnaire" button and the email address "amazon-servers-game-launches@amazon.com" are highlighted in the Prepare to launch view.

Figure 1: Amazon GameLift Servers Launch Questionnaire.

2 – Set up your production fleets

We recommend configuring production fleets on Amazon GameLift Servers differently from development fleets.

Key considerations:

Set the Game Scaling Protection Policy to Full Protection. This makes sure that when the fleet is scaling down, running game sessions are protected.
Enable a target-based auto scaling policy, and make sure you have a healthy buffer for launch day (30-50%). You can always reduce this later when traffic stabilizes.
Use multi-location fleets instead of individual fleets for each Region. This will streamline your operations significantly, because it provides a single view to your global fleet resources and reduces operational complexity.
Consider your latency targets and player population and select locations accordingly. A low latency, first-person shooter will commonly require multiple locations for each continent. A slower paced game can benefit from fewer locations for streamlined operations. You can use the Amazon GameLift provided UDP ping beacons to measure latency on the client side.
Set your scaling configuration for each location (min and max) and make sure to have a healthy baseline (min) in each location, and room for sudden peaks in demand (max).
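The first two considerations above can also be applied through the API. The following is a minimal sketch, assuming `gamelift` is a boto3 Amazon GameLift client (for example `boto3.client("gamelift")`) and that the policy name and the 30% target are placeholder values you would replace with your own.

```python
def configure_production_fleet(gamelift, fleet_id, target_available_percent=30.0,
                               policy_name="launch-target-tracking"):
    """Apply full protection and a target-based scaling policy to a fleet.

    `policy_name` and the default target are hypothetical example values.
    """
    # Protect running game sessions from being terminated on scale-down.
    gamelift.update_fleet_attributes(
        FleetId=fleet_id,
        NewGameSessionProtectionPolicy="FullProtection",
    )
    # Target-based policies track the percentage of available game sessions,
    # which acts as the capacity buffer.
    gamelift.put_scaling_policy(
        Name=policy_name,
        FleetId=fleet_id,
        PolicyType="TargetBased",
        MetricName="PercentAvailableGameSessions",
        TargetConfiguration={"TargetValue": target_available_percent},
    )
```

Passing the client in as a parameter keeps the function easy to exercise against a stub before pointing it at a production account.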

For launch day, we recommend scaling out in advance by setting the min value of each of your locations to a baseline high enough to cover the initial traffic peak. You can always lower this value once the traffic patterns stabilize, but it’s good to be ready for a high initial launch peak.
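Pre-scaling each location could be scripted along these lines. This is a sketch under stated assumptions: `gamelift` is a boto3 Amazon GameLift client, and `baselines` is a hypothetical mapping from location name to a `(min, max)` pair that you define yourself. Setting the desired count to the new minimum is a choice made here for simplicity.

```python
def prescale_locations(gamelift, fleet_id, baselines):
    """Raise min/max capacity per location ahead of launch.

    `baselines` is a hypothetical dict: {"us-east-1": (min_size, max_size)}.
    """
    for location, (min_size, max_size) in baselines.items():
        if min_size > max_size:
            raise ValueError(f"min exceeds max for location {location}")
        gamelift.update_fleet_capacity(
            FleetId=fleet_id,
            Location=location,
            MinSize=min_size,
            MaxSize=max_size,
            # Start at the baseline; auto scaling can adjust from there.
            DesiredInstances=min_size,
        )
```

The same function can be reused after launch with lower numbers once traffic stabilizes.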

Prepare for unexpected player load by making sure you can host game servers on more than one instance type. This could be, for example, .large and .xlarge variations of your selected instance family, or different instance families or generations within the same instance family. While most games will never need to host multiple fleets, at massive scale having a multi-fleet strategy as an option can make sure you have the capacity you need.

Figure 2 shows how two multi-location fleets are registered to the same Amazon GameLift Servers Queue. One fleet is using C6i.large instance type and is scaled out to handle the game launch. The second fleet is using C5.large instance type, and is not scaled out. The limits for both instance types have been increased with the Launch Questionnaire to handle production traffic. In the rare case that C6i.large availability in one of the locations would be low, having the backup fleet would allow scaling out with a different instance type and keep serving players. The backup fleet could also be another instance size in the C6i family such as C6i.xlarge.

An architecture with two Amazon GameLift global fleets, “Main global fleet” and “Backup global fleet” both with two locations, US-East-1 and EU-West-1. “Main global fleet” is using the c6i.large instance type in both locations, and the “Backup global fleet” using c5.large. An Amazon GameLift Servers Queue is attached to the “Main Global fleet”, and a dotted line to the backup fleet. A box for “Game Backend” is attached to the queue.

Figure 2: Utilizing two multi-location fleets to prepare for large scale.
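A queue with a main and a backup fleet, as in Figure 2, could be created roughly as follows. This sketch assumes `gamelift` is a boto3 Amazon GameLift client; the function name and the 60-second timeout are illustrative. How the queue chooses between destinations also depends on its priority configuration, which is not set here.

```python
def create_launch_queue(gamelift, queue_name, main_fleet_arn, backup_fleet_arn,
                        timeout_seconds=60):
    """Create a queue that lists the main fleet first and the backup second."""
    return gamelift.create_game_session_queue(
        Name=queue_name,
        TimeoutInSeconds=timeout_seconds,
        Destinations=[
            {"DestinationArn": main_fleet_arn},    # scaled out for launch
            {"DestinationArn": backup_fleet_arn},  # alternative instance type
        ],
    )
```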

3 – Do load testing and test the critical path

Load testing is important to reveal any bottlenecks in your infrastructure. It’s one of the most important steps of getting ready for launch.

For Amazon GameLift Servers specifically, load testing can surface:

Insufficient instance limits
Insufficient API limits (applied for each API separately)
Issues with dependencies, such as your backend systems that the game servers communicate with

Implementing load testing with a realistic traffic pattern at scale helps surface these issues. This means ramping up session placement requests across all locations, as they would come in with high concurrent user counts, and making sure all systems behave as expected.

Testing scale-out from zero concurrent users to 500 K in five minutes sounds like a good test on paper. However, it may not represent a realistic traffic pattern. Testing with a realistic pattern keeps your expectations grounded. The ramp-up to your peak usually happens over a longer period (typically hours), and you can use data from your previous games, or tools such as SteamDB, to see common traffic patterns for launches.
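To make the ramp-up concrete, the sketch below generates a simple schedule of target concurrent users over a multi-hour ramp. The linear shape, the function name, and the step size are assumptions for illustration; a curve derived from your own historical data would be more representative.

```python
def ramp_schedule(peak_ccu, ramp_hours, steps_per_hour=4):
    """Return (minutes_elapsed, target_ccu) pairs for a linear ramp to peak."""
    total_steps = ramp_hours * steps_per_hour
    schedule = []
    for step in range(1, total_steps + 1):
        minutes_elapsed = step * 60 // steps_per_hour
        # Integer math keeps the final step exactly at the peak.
        target_ccu = peak_ccu * step // total_steps
        schedule.append((minutes_elapsed, target_ccu))
    return schedule
```

For example, `ramp_schedule(500000, 2, steps_per_hour=2)` yields four steps, 30 minutes apart, ending at 500,000 concurrent users after two hours.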

There are two key ways to do load testing:

Testing the scaling of the fleet and session placement can be done by directly invoking APIs such as StartGameSessionPlacement. It can be done with Python or Bash scripts with a relatively low number of actual clients. This is a great smoke test for your API and instance limits, as well as your scaling configuration.
Test the complete critical path (including your backend), as well as game servers. This approach includes actual account creation and login to your game, as well as using your backend to request matchmaking or session placement. It’s a more holistic approach to load testing that also tests for any bottlenecks in your backend. We recommend doing this testing with either headless bot clients of your game (running on AWS Fargate containers, for example), or scripts that behave as closely to a client as possible.
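The first approach, invoking StartGameSessionPlacement directly, might look like the following sketch. It assumes `gamelift` is a boto3 Amazon GameLift client and treats the listed error codes as throttling. The function name, the fixed per-second pacing, and the result counters are illustrative choices.

```python
import time
import uuid

# Error codes treated as throttling in this sketch.
THROTTLE_CODES = {"ThrottlingException", "RequestLimitExceeded"}


def run_placement_smoke_test(gamelift, queue_name, requests_per_second,
                             duration_seconds, max_players=10, sleep=time.sleep):
    """Send placement requests at a fixed rate and count the outcomes."""
    results = {"started": 0, "throttled": 0, "failed": 0}
    for _ in range(duration_seconds):
        for _ in range(requests_per_second):
            try:
                gamelift.start_game_session_placement(
                    PlacementId=str(uuid.uuid4()),  # unique ID per request
                    GameSessionQueueName=queue_name,
                    MaximumPlayerSessionCount=max_players,
                )
                results["started"] += 1
            except Exception as exc:
                # boto3 client errors expose the error code on `response`.
                code = getattr(exc, "response", {}).get("Error", {}).get("Code", "")
                if code in THROTTLE_CODES:
                    results["throttled"] += 1
                else:
                    results["failed"] += 1
        sleep(1)  # crude pacing: one batch of requests per second
    return results
```

A non-zero `throttled` count at your expected peak rate is a signal to revisit the API limits requested in the Launch Questionnaire.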

Optimally, when doing the complete testing, you also have the clients connecting to your game servers, and playing through the game by sending movement and other actions as a normal player would. This will stress test the performance of the game servers as well. It also helps test a realistic session length and game session rotation when you have the clients play through a session and log out to connect to the next one.

For successful monitoring of these tests, use the approaches covered in the monitoring, logging, and alarms section in the first part of this blog series, Development phase steps for successful launches on Amazon GameLift Servers. In addition, you should track any errors and throttling you receive from Amazon GameLift Servers APIs, as well as from any other services and components managed by you or any third parties. We’ll cover this in detail in section 4 – Monitor API Throttling.

As Werner Vogels (CTO, Amazon) has said, “Failures are a given and everything will eventually fail over time”. Make sure your critical path can gracefully handle the failure of any dependency (Steam, console logins, databases, and so on). You should also be certain you can recover from any internal failure. This will help you prepare for any surprises on launch day.

Figure 3 shows an example of hosting load testing clients on AWS Fargate as a service that runs multiple Amazon Elastic Container Service (Amazon ECS) tasks, each of which can host up to 10 individual client containers. The load test clients can be run across multiple AWS Regions to test how latency and player location affects the experience. The clients can be either a scripted, headless version of your actual game client, or scripts that behave as game clients.

Architecture with two Amazon ECS Fargate Services, one in US-East-1 and one in EU-West-1. Both with 4 Tasks shown that are attached to a “Game Backend” box to request a session. Game Backend is attached to an Amazon GameLift Servers Queue, which is attached to an Amazon GameLift Servers global fleet. The Tasks on Amazon ECS also connect directly to the fleet to simulate gameplay.

Figure 3: Critical path load testing.

4 – Monitor API Throttling

During load tests, Amazon GameLift Servers API calls may exceed the default provisioned limits, resulting in throttling errors. Identifying and responding to throttled calls is critical for operational stability, availability, and a seamless player experience.

AWS CloudTrail provides comprehensive API event tracking capabilities for Amazon GameLift Servers, so you can monitor and audit all API usage.

You can effectively use CloudTrail to:

Track Amazon GameLift Servers API activities
Identify throttling
Configure alarms on a custom CloudWatch metric
Notify operations teams to request limit increases that match your expected peak request rate

Monitor for records where the eventSource is gamelift.amazonaws.com and one of the following errorCode or errorMessage values appears in the CloudTrail record:

errorCode is ThrottlingException
errorCode is RequestLimitExceeded
errorMessage is RateExceeded
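Applied to a single CloudTrail record that has already been parsed into a dictionary, the conditions above could be checked like this. The function name is hypothetical, and matching RateExceeded as a substring of errorMessage (rather than an exact match) is an assumption made for this sketch.

```python
THROTTLE_ERROR_CODES = {"ThrottlingException", "RequestLimitExceeded"}


def is_gamelift_throttle(record):
    """Return True if a CloudTrail record looks like a throttled GameLift call."""
    if record.get("eventSource") != "gamelift.amazonaws.com":
        return False
    if record.get("errorCode") in THROTTLE_ERROR_CODES:
        return True
    # Assumption: treat any errorMessage containing "RateExceeded" as throttling.
    return "RateExceeded" in (record.get("errorMessage") or "")
```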

For operational visibility, create a new trail in CloudTrail and enable CloudWatch Logs. Make certain you have the correct AWS Identity and Access Management (IAM) permissions to write logs into CloudWatch Logs. Specify a CloudWatch log group to capture Amazon GameLift events. Once configuration is complete, you will see all Amazon GameLift API calls and associated throttling errors in CloudWatch Logs.

In CloudWatch Logs, select the log group that CloudTrail writes to, and create a custom metric filter to identify throttling patterns. Assign a namespace and create the custom metric, which increments whenever throttling occurs.

Following is an example filter pattern:

{ ($.eventSource = "gamelift.amazonaws.com") && ($.errorCode = "ThrottlingException") }
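The same metric filter can be created programmatically. The sketch below assumes `logs` is a boto3 CloudWatch Logs client (for example `boto3.client("logs")`). The filter name, namespace, and metric name are hypothetical placeholders.

```python
# Filter pattern from the example above, written with plain ASCII quotes.
THROTTLE_FILTER_PATTERN = (
    '{ ($.eventSource = "gamelift.amazonaws.com") '
    '&& ($.errorCode = "ThrottlingException") }'
)


def create_throttle_metric_filter(logs, log_group_name,
                                  namespace="GameLiftOps",
                                  metric_name="GameLiftThrottledCalls"):
    """Create a metric filter that counts throttled GameLift API calls."""
    logs.put_metric_filter(
        logGroupName=log_group_name,
        filterName="gamelift-throttling",  # hypothetical filter name
        filterPattern=THROTTLE_FILTER_PATTERN,
        metricTransformations=[{
            "metricName": metric_name,
            "metricNamespace": namespace,
            "metricValue": "1",   # add 1 for every matching log event
            "defaultValue": 0,    # report 0 when nothing matches
        }],
    )
```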

Now, with a custom metric in place, configure CloudWatch alarms to monitor throttling thresholds. For example, trigger an alarm if more than 10 throttled requests occur within a five-minute period. Attach the alarm created to an Amazon Simple Notification Service (Amazon SNS) topic that sends email, short message service (SMS), or chat notifications to your operations team. They can then review API usage and take actions accordingly.
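The example alarm (more than 10 throttled requests within five minutes) might be created as follows. This sketch assumes `cloudwatch` is a boto3 CloudWatch client, that the SNS topic already exists, and that the namespace and metric name match whatever you chose for the metric filter. The names shown are hypothetical.

```python
def create_throttle_alarm(cloudwatch, sns_topic_arn,
                          namespace="GameLiftOps",
                          metric_name="GameLiftThrottledCalls",
                          threshold=10):
    """Alarm when throttled calls exceed `threshold` in a five-minute window."""
    cloudwatch.put_metric_alarm(
        AlarmName="gamelift-api-throttling",  # hypothetical alarm name
        Namespace=namespace,
        MetricName=metric_name,
        Statistic="Sum",
        Period=300,                  # five minutes, in seconds
        EvaluationPeriods=1,
        Threshold=threshold,
        ComparisonOperator="GreaterThanThreshold",
        TreatMissingData="notBreaching",  # no data means no throttling
        AlarmActions=[sns_topic_arn],
    )
```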

To minimize throttling before requesting limit increases:

Implement exponential backoff with Amazon GameLift Servers API calls
Use pagination and filtering when retrieving large datasets from Amazon GameLift Servers APIs
Cache frequently accessed data, such as fleet information, to avoid repeated API calls
Batch operations where possible to reduce API call frequency
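Exponential backoff, the first item above, can be sketched as a small wrapper around any API call. This version uses full jitter and retries only on the throttling error codes listed. The function name and the default delays are illustrative assumptions.

```python
import random
import time

THROTTLE_CODES = {"ThrottlingException", "RequestLimitExceeded"}


def call_with_backoff(operation, max_attempts=5, base_delay=0.2,
                      max_delay=5.0, sleep=time.sleep):
    """Call `operation()` and retry throttling errors with exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except Exception as exc:
            # boto3 client errors expose the error code on `response`.
            code = getattr(exc, "response", {}).get("Error", {}).get("Code", "")
            is_last_attempt = attempt == max_attempts - 1
            if code not in THROTTLE_CODES or is_last_attempt:
                raise
            # Double the ceiling on each attempt, capped at max_delay,
            # then pick a random delay below it (full jitter).
            ceiling = min(max_delay, base_delay * (2 ** attempt))
            sleep(random.uniform(0, ceiling))
```

A call could then be wrapped as `call_with_backoff(lambda: gamelift.describe_fleet_attributes(FleetIds=[fleet_id]))`. Note that boto3 clients also retry some errors on their own according to their configured retry mode, so a wrapper like this adds to that behavior rather than replacing it.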

5 – Use Blue/Green deployments in production with new game server versions

Once you’re successfully in production, you’ll need to patch your game servers relatively frequently in the beginning, and in an ongoing fashion as you add features and improvements to the game. The recommended way to do the updates on Amazon GameLift Servers is a Blue/Green deployment.

In this approach you set up a completely new production fleet with the new game server build or container image. Once you have the new fleet ready and monitoring looks good, you can have an additional step to smoke test it with a few game sessions. After this, do the inflight update by switching your Amazon GameLift Servers Queue to route session placements to the new fleet instead of the old one. New sessions will start being created on your new version, but running sessions on the old fleet can run without interruptions.

If everything looks good you can terminate the old fleet once it’s drained out of sessions. In cases where you need to roll back to the previous version, you can switch back to the old version by switching the queue target. This is one of the key benefits of a Blue/Green deployment.
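The queue switch, the rollback, and the drain check could be scripted roughly as follows. This is a sketch assuming `gamelift` is a boto3 Amazon GameLift client; the function names are hypothetical, and it replaces the queue's destination list with a single fleet rather than keeping both.

```python
def switch_queue_to_fleet(gamelift, queue_name, fleet_arn):
    """Route new session placements on the queue to the given fleet only.

    Call with the new fleet's ARN to deploy, or the old one to roll back.
    """
    gamelift.update_game_session_queue(
        Name=queue_name,
        Destinations=[{"DestinationArn": fleet_arn}],
    )


def is_fleet_drained(gamelift, fleet_id):
    """Return True when the fleet has no active or activating game sessions."""
    for status in ("ACTIVE", "ACTIVATING"):
        response = gamelift.describe_game_sessions(
            FleetId=fleet_id,
            StatusFilter=status,
            Limit=1,  # one result is enough to know the fleet is not empty
        )
        if response.get("GameSessions"):
            return False
    return True
```

Polling `is_fleet_drained` at a modest interval, rather than in a tight loop, keeps the check well within API limits.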

In cases where you’re not using queues for session placement, an alias resource can be used in a similar fashion. There is a Python script available in the Amazon GameLift Servers Toolkit that shows how to implement this approach.
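For orientation only, repointing an alias comes down to a single API call. The sketch below is not the Toolkit script; it assumes `gamelift` is a boto3 Amazon GameLift client, and the function name is hypothetical.

```python
def point_alias_at_fleet(gamelift, alias_id, fleet_id):
    """Repoint an alias so new sessions are created on the given fleet."""
    gamelift.update_alias(
        AliasId=alias_id,
        RoutingStrategy={"Type": "SIMPLE", "FleetId": fleet_id},
    )
```

Rolling back is the same call with the previous fleet's ID.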

Figure 4 shows a Blue/Green deployment where the fleet behind a queue is replaced with a new one, and old sessions are drained on the previous fleet.

Architecture with two Amazon GameLift Servers fleets, one called “Blue: Previous fleet” and the other one “Green: New fleet”. Both fleets have two locations, US-East-1 and EU-West-1. An Amazon GameLift Servers Queue is attached to the green fleet with a full line, and to the blue fleet with a dotted line. Text says “Switch previous fleet to the new fleet on the queue. New sessions will be placed on the new fleet, and old sessions can end on the previous fleet.” A “Game Backend” box is attached to the queue.

Figure 4: Blue/Green deployment.

Conclusion

We covered how to make sure your service limits are raised for a successful production launch on Amazon GameLift Servers. We then discussed key considerations for setting up your production fleets. We also covered how load testing helps you prepare for launch, the common ways of conducting it, and how to monitor API throttling. Finally, we discussed how Blue/Green deployments help you manage in-flight server version updates in production.

This series covered a breadth of aspects for getting operationally and architecturally ready for a successful game launch on Amazon GameLift Servers. Get started today with Amazon GameLift Servers for multiplayer game server hosting. Contact an AWS Representative to learn how we can help accelerate your business.