Developer’s Guide to operate game servers on Kubernetes – Part 1
Introduction
Live operations are a strategy that maintains player interest through continuous updates and fresh content, driving engagement and game evolution across platforms. Game operations teams use live operations to deliver new expansions or events in multiplayer online games, enriching the online world.
Player customization, seasonal events, and community challenges boost retention, engagement, and collaborative gameplay. Players need an evolving game that responds to their interactions and actions, influencing future developments. Dynamic in-game economy adjustments and regular content updates keep the game balanced, fresh, and extend player session times.
Through live operations, game operations teams ensure titles adapt to changing player preferences, enhancing longevity, and fostering a dedicated player base, which is essential for success in the gaming industry.
Traditional game infrastructure and its hurdles
The traditional virtual machine infrastructure poses unique challenges when game operations teams establish live operations. They are tasked with the provisioning and monitoring of extensive server fleets. A typical deployment involves the creation of an Amazon Virtual Private Cloud (Amazon VPC), and proper network configuration for the pools of virtual machines used to host game sessions.
When a game goes live, players are directed to regional server fleets, and game operation teams must dynamically scale server capacity based on player counts, optimizing resource usage. This scaling process is crucial for game launches and ongoing live operations activities. However, scaling virtual machines (VMs) to handle increased player demand can be a daunting task, involving extensive capacity planning, provisioning, and maintenance, straining resources, especially for smaller studios.
Containers to the rescue
Containers emerge as heroes in game development, offering developers unparalleled agility and scalability for managing complex game demands.
Containerization enables rapid deployment of updates, seamless introduction of new in-game features, and swift rollout of new quests and environments, essential for maintaining continuous player engagement and keeping virtual worlds fresh. Containers provide isolation for microservices architectures, improving game performance and development workflow. Developers can scale services using containers to meet player demand, maintain stability during peak loads, and ensure a consistent gaming experience, making containers a cornerstone of modern game operations.
Benefits such as resource efficiency, cost savings, improved scalability, and reduced latency with edge-based architectures are explored in the blog post Optimize game servers hosting with containers.
Diverse container offerings on AWS
Amazon Web Services (AWS) provides essential container services for live operations in game development. Developers can leverage Amazon Elastic Container Service (Amazon ECS), a fully managed container orchestration service, or Amazon Elastic Kubernetes Service (Amazon EKS) to tap into the Kubernetes ecosystem. AWS Fargate brings serverless container compute to Amazon ECS and Amazon EKS, freeing game operation teams from server management and allowing them to focus solely on the creation of compelling content.
With diverse container services, game operation teams can deploy updates, manage in-game events, and introduce new features, ensuring games remain dynamic and responsive to player interactions. Within the AWS ecosystem, containers are pivotal enablers of live operations, facilitating real-time game evolution and growth alongside their player communities.
This article focuses on the design of Kubernetes clusters used to deploy game servers. It outlines how to leverage infrastructure-as-code with Terraform to build clusters based on Amazon EKS best practices.
AWS networking considerations for containers runtime
Before running game servers on Kubernetes, game operations teams should plan where the clusters will be located, how they will be sized, and how traffic will flow to the game servers.
Since the game servers are hosted in multiple regions, each Kubernetes cluster runs in a VPC created in the region where players are based. The VPC defines public and private subnets, each with an assigned IP address range.
The Terraform script for the Kubernetes cluster declares a module called “vpc”, which defines the availability zones, private subnets, and public subnets. The variable “vpc_cidr” holds the IP address range for the deployment, which is split between the private and public subnets.
A good practice when planning your container runtime deployment is to use at least 2 availability zones, and ideally 3 (2n+1), to improve reliability. The list of availability zones used by the cluster is defined with the variable “azs”.
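The subnet split described above can be sketched as follows. This is a minimal illustration, not the original script: it assumes the community terraform-aws-modules/vpc/aws module (version 5.x), a hypothetical 10.0.0.0/16 range, and a hypothetical name "gameservers". Here “azs” is expressed as a local value derived from the region rather than as an input variable.

```hcl
# Assumed default range; replace with the range planned for your deployment.
variable "vpc_cidr" {
  type    = string
  default = "10.0.0.0/16"
}

data "aws_availability_zones" "available" {
  state = "available"
}

locals {
  name = "gameservers" # hypothetical name

  # Take the first three availability zones; this fails in a region
  # that exposes fewer than three.
  azs = slice(data.aws_availability_zones.available.names, 0, 3)
}

module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "~> 5.0"

  name = local.name
  cidr = var.vpc_cidr
  azs  = local.azs

  # One /20 private subnet and one /24 public subnet per availability zone.
  private_subnets = [for k, az in local.azs : cidrsubnet(var.vpc_cidr, 4, k)]
  public_subnets  = [for k, az in local.azs : cidrsubnet(var.vpc_cidr, 8, k + 48)]

  # Demonstration only: lets nodes in public subnets receive public IPs.
  map_public_ip_on_launch = true

  enable_nat_gateway = true
  single_nat_gateway = true
}
```

With the assumed default range, the private subnets are 10.0.0.0/20, 10.0.16.0/20, and 10.0.32.0/20, and the public subnets are 10.0.48.0/24, 10.0.49.0/24, and 10.0.50.0/24, so the two sets do not overlap.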
In addition, it is possible to define traffic rules directly in the Terraform code to restrict the origin, protocol, and ports for the game servers. The following code sample defines traffic rules for UDP, TCP and custom game server webhooks:
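The original listing is not reproduced here. The sketch below shows one way such rules could be written with the AWS provider's aws_security_group_rule resource. The port range 7000-8000, the webhook port 8081, the open 0.0.0.0/0 origin, and both input variables are assumptions chosen for illustration and should be replaced with the values your game server requires.

```hcl
# Hypothetical inputs: security groups attached to the worker nodes
# and to the Amazon EKS control plane.
variable "node_security_group_id" {
  type = string
}

variable "cluster_security_group_id" {
  type = string
}

# UDP traffic from game clients (assumed port range).
resource "aws_security_group_rule" "gameserver_udp" {
  description       = "Game server UDP ports"
  type              = "ingress"
  protocol          = "udp"
  from_port         = 7000
  to_port           = 8000
  cidr_blocks       = ["0.0.0.0/0"] # narrow this range where possible
  security_group_id = var.node_security_group_id
}

# TCP traffic from game clients (same assumed port range).
resource "aws_security_group_rule" "gameserver_tcp" {
  description       = "Game server TCP ports"
  type              = "ingress"
  protocol          = "tcp"
  from_port         = 7000
  to_port           = 8000
  cidr_blocks       = ["0.0.0.0/0"]
  security_group_id = var.node_security_group_id
}

# Webhook calls from the control plane to the nodes (assumed port).
resource "aws_security_group_rule" "gameserver_webhook" {
  description              = "Control plane to game server webhook"
  type                     = "ingress"
  protocol                 = "tcp"
  from_port                = 8081
  to_port                  = 8081
  source_security_group_id = var.cluster_security_group_id
  security_group_id        = var.node_security_group_id
}
```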
It is a best practice to evaluate the number of IP addresses required for the game servers and plan how IP addresses are managed during game operations.
The code sample assigns public IP addresses to game servers for demonstration purposes. The best practice is to use AWS Global Accelerator and create custom routes to send traffic to private IP addresses assigned to game servers.
IP prefixes can be used to prevent IPv4 address exhaustion on large clusters hosting game servers. Amazon EKS also supports IPv6, which is another option to ensure a large number of IP addresses is available for the game servers. Depending on the game requirements, you can customize the Terraform definition for the Amazon EKS cluster to build dual-stack Kubernetes clusters offering support for both IPv6 and IPv4. You can also opt for a setup where IPv6 addresses are assigned to the pods running the game servers and the game clients connect to the game servers using an IPv4 address.
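As a hedged illustration of the IP prefix option, the fragment below turns on prefix assignment in the Amazon VPC CNI plugin through the aws_eks_addon resource. It assumes the plugin is installed as a managed Amazon EKS add-on and that the nodes are Nitro-based instances, which prefix assignment requires. The variable cluster_name is a hypothetical input.

```hcl
# Hypothetical input: name of an existing Amazon EKS cluster.
variable "cluster_name" {
  type = string
}

resource "aws_eks_addon" "vpc_cni" {
  cluster_name = var.cluster_name
  addon_name   = "vpc-cni"

  # Ask the CNI plugin to attach /28 IPv4 prefixes to network interfaces
  # instead of individual secondary IP addresses.
  configuration_values = jsonencode({
    env = {
      ENABLE_PREFIX_DELEGATION = "true"
    }
  })
}
```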
Since the resulting infrastructure will run a large number of game servers, Amazon VPC IP Address Manager (IPAM) can be used to maintain the inventory of IP addresses.
With IPAM, game operations teams can prevent IP ranges from overlapping and impacting the performance of game servers. The full list of best practices for Amazon EKS networking is available in this guide maintained by the Amazon EKS team.
Cluster-to-cluster communication
Since Kubernetes clusters running game servers can be in different regions, it is important to look at cluster-to-cluster communication and factor data transfer costs into the architecture. As a rule, we do not recommend exposing services on each cluster via appliances or services over the internet. Many secure options are available, such as VPC Peering, AWS Transit Gateway, or AWS PrivateLink, that will keep cluster traffic secure.
Transit Gateway is a great option that balances the total cost of ownership with scale and performance. The service offers multi-region connectivity over the AWS backbone and the ability to connect Amazon VPCs across accounts.
Game operations teams running clusters in the same region can also use Amazon VPC Lattice to set up communications between services. VPC Lattice is an application networking service that connects, monitors, and secures communications between services. With that approach, game operations teams can handle many-to-many VPC connectivity over the AWS backbone, with a viable option to control data transfer costs. In addition, the centralized management of interconnectivity and the ability to use AWS Identity and Access Management (IAM) policies to control access to VPC Lattice reinforce security during game operations.
Blueprint for Kubernetes
The previous sections outlined the fundamental networking considerations when deploying game servers on Kubernetes clusters. Game operations teams can use infrastructure-as-code to standardize the definition and provisioning of Kubernetes clusters and make sure every deployment is consistent. Amazon EKS Blueprints, an open-source project, provides game operations teams with a collection of pre-configured and validated Amazon EKS cluster patterns, allowing them to bootstrap production-ready Amazon EKS clusters. These patterns have been tested and validated by Kubernetes experts, ensuring reliability and adherence to best practices.
The blueprints can be customized to meet the game requirements. In the code sample below, the Amazon EKS Blueprints for Terraform is used to define an Amazon EKS cluster:
The Amazon EKS cluster blueprint uses the “vpc” module defined in the previous section. Amazon Elastic Compute Cloud (Amazon EC2) nodes for the Amazon EKS cluster are launched in the Amazon VPC created by the Terraform script. To facilitate game operations, the module reads key variables from input parameters defined during infrastructure creation or updates.
The code sample defines parameters for the name of the cluster, the Kubernetes version to use, the type of Amazon EC2 instance running the game servers in containers, and the minimum and maximum number of Kubernetes nodes.
Game operation teams can control on which Kubernetes nodes game servers and dependent services should run. The code sample demonstrates this with an example where a node group called public_gameservers is used to isolate game servers. A separate node group called gameservers_metrics runs the components for an observability stack to monitor the game.
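The original listing is not reproduced here. The following is a minimal sketch of a cluster definition along the lines described, assuming the community terraform-aws-modules/eks/aws module at version 20.x (argument names differ in other major versions) and a module named “vpc” like the one described earlier. All variable names, defaults, labels, and the taint are hypothetical.

```hcl
variable "cluster_name" {
  type    = string
  default = "gameservers" # hypothetical
}

variable "cluster_version" {
  type    = string
  default = "1.29" # hypothetical; pick a version supported by Amazon EKS
}

variable "gameserver_instance_type" {
  type    = string
  default = "c5.xlarge" # hypothetical
}

variable "gameserver_min_nodes" {
  type    = number
  default = 1
}

variable "gameserver_max_nodes" {
  type    = number
  default = 5
}

module "eks" {
  source  = "terraform-aws-modules/eks/aws"
  version = "~> 20.0"

  cluster_name    = var.cluster_name
  cluster_version = var.cluster_version

  vpc_id     = module.vpc.vpc_id
  subnet_ids = module.vpc.private_subnets

  eks_managed_node_groups = {
    # Nodes dedicated to game servers. Placed in public subnets for
    # demonstration; those subnets must auto-assign public IPs.
    public_gameservers = {
      instance_types = [var.gameserver_instance_type]
      min_size       = var.gameserver_min_nodes
      max_size       = var.gameserver_max_nodes
      desired_size   = var.gameserver_min_nodes
      subnet_ids     = module.vpc.public_subnets

      labels = { role = "gameservers" }
      taints = {
        dedicated = {
          key    = "gameservers"
          value  = "true"
          effect = "NO_SCHEDULE"
        }
      }
    }

    # Nodes for the observability stack.
    gameservers_metrics = {
      instance_types = ["m5.large"] # hypothetical
      min_size       = 1
      max_size       = 2
      desired_size   = 1

      labels = { role = "metrics" }
    }
  }
}
```

In this sketch, only pods that tolerate the gameservers taint can be scheduled on the public_gameservers nodes, and a node selector on the role label keeps game server pods off the metrics nodes.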
Amazon EKS allows for building a diverse cluster composed of different instance types, architectures (Graviton or Intel), and capacity models, such as spot instances. This flexibility enables optimizing the cluster’s cost-performance ratio by selecting the appropriate resource mix for each component. Kubernetes nodes are labeled and tainted to allow pods to selectively schedule themselves on appropriate nodes, optimizing resource utilization and workload distribution across the cluster.
Custom blueprints resulting from design iterations can be stored in a Git repository, offering the possibility to clone the infrastructure template and use a GitOps approach to launch standardized game clusters in specific regions.
Conclusion
This article presented best practices to guide game operations teams during the creation of Kubernetes clusters to host game servers, and explained the importance of infrastructure-as-code in creating reusable patterns for game launches.
We hope this blog post has provided you with the fundamental knowledge to improve the creation of Kubernetes clusters for games. Future articles will explore the AWS solution guidance to host game servers on Amazon EKS with Agones and Open Match, two popular open source frameworks for game server hosting and matchmaking on Kubernetes. Stay tuned!
Author: Serge Poueme