How Dialog Axiata used Amazon SageMaker to scale ML models in production with AI Factory and reduced customer churn within 3 months

TutoSartup excerpt from this article:
To address this challenge, Dialog Axiata has pioneered a cutting-edge solution called the Home Broadband (HBB) Churn Prediction Model… This post explores the intricacies of Dialog Axiata’s approach, from the meticulous creation of nearly 100 features across 10 distinct areas and the implemen…

The telecommunications industry is more competitive than ever before. With customers able to easily switch between providers, reducing customer churn is a crucial priority for telecom companies who want to stay ahead. To address this challenge, Dialog Axiata has pioneered a cutting-edge solution called the Home Broadband (HBB) Churn Prediction Model.

This post explores the intricacies of Dialog Axiata’s approach, from the meticulous creation of nearly 100 features across 10 distinct areas and the implementation of two essential models using Amazon SageMaker:

A base model powered by CatBoost, an open source implementation of the Gradient Boosting Decision Tree (GBDT) algorithm
An ensemble model, taking advantage of the strengths of multiple machine learning (ML) models

About Dialog Axiata

Dialog Axiata PLC (part of the Axiata Group Berhad) is one of Sri Lanka’s largest quad-play telecommunications service providers and the country’s largest mobile network operator with 17.1 million subscribers, which amounts to 57% of the Sri Lankan mobile market. Dialog Axiata provides a variety of services, such as fixed-line, home broadband, mobile, television, payment apps, and financial services in Sri Lanka.

In 2022, Dialog Axiata made significant progress in their digital transformation efforts, with AWS playing a key role in this journey. They focused on improving customer service using data with artificial intelligence (AI) and ML and saw positive results, with their Group AI Maturity increasing from 50% to 80%, according to the TM Forum’s AI Maturity Index.

Dialog Axiata runs some of their business-critical telecom workloads on AWS, including Charging Gateway, Payment Gateway, Campaign Management System, SuperApp, and various analytics tasks. They use variety of AWS services, such as Amazon Elastic Compute Cloud (Amazon EC2) and Amazon Elastic Kubernetes Service (Amazon EKS) for computing, Amazon Relational Database Service (Amazon RDS) for databases, Amazon Simple Storage Service (Amazon S3) for object storage, Amazon OpenSearch Service for search and analytics, SageMaker for ML, and AWS Glue for data integration. This strategic use of AWS services delivers efficiency and scalability of their operations, as well as the implementation of advanced AI/ML applications.

For more about how Axiata uses AWS services, see Axiata Selects AWS as its Primary Cloud Provider to Drive Innovation in the Telecom Industry

Challenges with understanding customer churn

The Sri Lankan telecom market has high churn rates due to several factors. Multiple mobile operators provide similar services, making it easy for customers to switch between providers. Prepaid services dominate the market, and multi-SIM usage is widespread. These conditions lead to a lack of customer loyalty and high churn rates.

In addition to its core business of mobile telephony, Dialog Axiata also offers a number of services, including broadband connections and Dialog TV. However, customer churn is a common issue in the telecom industry. Therefore, Dialog Axiata needs to find ways to reduce their churn rate and retain more of their existing home broadband customers. Potential solutions could involve improving customer satisfaction, enhancing value propositions, analyzing reasons for churn, or implementing customer retention initiatives. The key is for Dialog Axiata to gain insights into why customers are leaving and take meaningful actions to increase customer loyalty and satisfaction.

Solution overview

To reduce customer churn, Dialog Axiata used SageMaker to build a predictive model that assigns each customer a churn risk score. The model was trained on demographic, network usage, and network outage data from across the organization. By predicting churn 45 days in advance, Dialog Axiata is able to proactively retain customers and significantly reduce customer churn.

Dialog Axiata’s churn prediction approach is built on a robust architecture involving two distinct pipelines: one dedicated to training the models, and the other for inference or making predictions. The training pipeline is responsible for developing the base model, which is a CatBoost model trained on a comprehensive set of features. To further enhance the predictive capabilities, an ensemble model is also trained to identify potential churn instances that may have been missed by the base model. This ensemble model is designed to capture additional insights and patterns that the base model alone may not have effectively captured.

The integration of the ensemble model alongside the base model creates a synergistic effect, resulting in a more comprehensive and accurate inference process. By combining the strengths of both models, Dialog Axiata’s churn prediction system gains an enhanced overall predictive capability, providing a more robust and reliable identification of customers at risk of churning.

Both the training and inference pipelines are run three times per month, aligning with Dialog Axiata’s billing cycle. This regular schedule makes sure that the models are trained and updated with the latest customer data, enabling timely and accurate churn predictions.

In the training process, features are sourced from Amazon SageMaker Feature Store, which houses nearly 100 carefully curated features. Because real-time inference is not a requirement for this specific use case, an offline feature store is used to store and retrieve the necessary features efficiently. This approach allows for batch inference, significantly reducing daily expenses to under $0.50 while processing batch sizes averaging around 100,000 customers within a reasonable runtime of approximately 50 minutes.

Dialog Axiata has meticulously selected instance types to strike a balance between optimal resource utilization and cost-effectiveness. However, should the need arise for faster pipeline runtime, larger instance types can be recommended. This flexibility allows Dialog Axiata to adjust the pipeline’s performance based on specific requirements, while considering the trade-off between speed and cost considerations.

After the predictions are generated separately using both the base model and the ensemble model, Dialog Axiata takes action to retain the customers identified as potential churn risks. The customers predicted to churn by the base model, along with those exclusively identified by the ensemble model, are targeted with personalized retention campaigns. By excluding any overlapping customers between the two models, Dialog Axiata ensures a focused and efficient outreach strategy.

The following figure illustrates the output predictions and churn probabilities generated by the base model and the ensemble model.

The first table is the output from the base model, which provides valuable insights into each customer’s churn risk. The columns in this table include a customer identifier (Cx), a Churn Reason column that highlights potential reasons for churn, such as Daily Usage or ARPU Drop (Average Revenue Per User), and a Churn Probability column that quantifies the likelihood of each customer churning.

The second table presents the output from the ensemble model, a complementary approach designed to capture additional churn risks that may have been missed by the base model. This table has two columns: the customer identifier (Cx) and a binary Churn column that indicates whether the customer is predicted to churn (1) or not (0).

The arrows connecting the two tables visually represent the process Dialog Axiata employs to comprehensively identify customers at risk of churning.

The following figure showcases the comprehensive output of this analysis, where customers are meticulously segmented, scored, and classified according to their propensity to churn or discontinue their services. The analysis delves into various factors, such as customer profiles, usage patterns, and behavioral data, to accurately identify those at a higher risk of churning. With this predictive model, Dialog Axiata can pinpoint specific customer segments that require immediate attention and tailored retention efforts.

With this powerful information, Dialog Axiata develops targeted retention strategies and campaigns specifically designed for high-risk customer groups. These campaigns may include personalized offers, as shown in the following figure, incentives, or customized communication aimed at addressing the unique needs and concerns of at-risk customers.

These personalized campaigns, tailored to each customer’s needs and preferences, aim to proactively address their concerns and provide compelling reasons for them to continue their relationship with Dialog Axiata.

Methodologies

This solution uses the following methodologies:

Comprehensive analysis of customer data – The foundation of the solution’s success lies in the comprehensive analysis of more than 100 features spanning demographic, usage, payment, network, package, geographic (location), quad-play, customer experience (CX) status, complaint, and other related data. This meticulous approach allows Dialog Axiata to gain valuable insights into customer behavior, enabling them to predict potential churn events with remarkable accuracy.
Dual-model strategy (base and ensemble models) – What sets Dialog Axiata’s approach apart is the use of two essential models. The base model, powered by CatBoost, provides a solid foundation for churn prediction. The threshold probability to define churn is calculated by considering ROC optimization and business requirements. Concurrently, the ensemble model strategically combines the strengths of various algorithms. This combination enhances the robustness and accuracy of the predictions. The models are developed considering precision as the evaluation parameter.
Actionable insights shared with business units – The insights derived from the models are not confined to the technical realm. Dialog Axiata ensures that these insights are effectively communicated and put into action by sharing the models separately with the business units. This collaborative approach means that the organization is better equipped to proactively address customer churn.
Proactive measures with two action types – Equipped with insights from the models, Dialog Axiata has implemented two main action types: network issue-based and non-network issue-based. During the inference phase, the churn status and churn reason are predicted. The top five features that have a high probability for the churn reason are selected using SHAP (SHapley Additive exPlanations). Then, the selected features associated with the churn reason are further classified into two categories: network issue-based and non-network issue-based. If there are features related to network issues, those users are categorized as network issue-based users. The resultant categorization, along with the predicted churn status for each user, is then transmitted for campaign purposes. This information is valuable in scheduling targeted campaigns based on the identified churn reasons, enhancing the precision and effectiveness of the overall campaign strategy.

Dialog Axiata’s AI Factory

Dialog Axiata built the AI Factory to facilitate running all AI/ML workloads on a single platform with multiple capabilities across various building blocks. To tackle technical aspects and challenges related to continuous integration and continuous delivery (CI/CD) and cost-efficiency, Dialog Axiata turned to the AI Factory framework. Using the power of SageMaker as the platform, they implemented separate SageMaker pipelines for model training and inference, as shown in the following diagram.

A primary advantage lies in cost reduction through the implementation of CI/CD pipelines. By conducting experiments within these automated pipelines, significant cost savings could be achieved. It also helps maintain an experiment version tracking system. Additionally, the integration of AI Factory components contributes to a reduction in time to production and overall workload by reducing repetitive tasks through the use of reusable artifacts. The incorporation of an experiment tracking system facilitates the monitoring of performance metrics, enabling a data-driven approach to decision-making.

Furthermore, the deployment of alerting systems enhances the proactive identification of failures, allowing for immediate actions to resolve issues. Data drift and model drift are also monitored. This streamlined process makes sure that any issues are addressed promptly, minimizing downtime and optimizing system reliability. By developing this project under the AI Factory framework, Dialog Axiata could overcome the aforementioned challenges.

Furthermore, the AI Factory framework provides a robust security framework to govern confidential user data and access permissions. It offers solutions to optimize AWS costs, including lifecycle configurations, alerting systems, and monitoring dashboards. These measures contribute to enhanced data security and cost-effectiveness, aligning with Dialog Axiata’s objectives and resulting in the efficient operation of AI initiatives.

Dialog Axiata’s MLOps process

The following diagram illustrates Dialog Axiata’s MLOps process.

The following key components are used in the process:

SageMaker as the ML Platform – Dialog Axiata uses SageMaker as their core ML platform to perform feature engineering, and train and deploy models in production.
SageMaker Feature Store – By using a centralized repository for ML features, SageMaker Feature Store enhances data consumption and facilitates experimentation with validation data. Instead of directly ingesting data from the data warehouse, the required features for training and inference steps are taken from the feature store. With SageMaker Feature Store, Dialog Axiata could reduce the time for feature creation because they could reuse the same features.
Amazon SageMaker Pipelines – Amazon SageMaker Pipelines is a CI/CD service for ML. These workflow automation components helped the Dialog Axiata team effortlessly scale their ability to build, train, test, and deploy multiple models in production; iterate faster; reduce errors due to manual orchestration; and build repeatable mechanisms.
Reusable components – Employing containerized environments, such as Docker images, and custom modules promoted the bring your own code approach within Dialog Axiata’s ML pipelines.
Monitoring and alerting – Monitoring tools and alert systems provided ongoing success by keeping track of the model and pipeline status.

Business outcomes

The churn prediction solution implemented by Dialog Axiata has yielded remarkable business outcomes, exemplifying the power of data-driven decision-making and strategic deployment of AI/ML technologies. Within a relatively short span of 5 months, the company witnessed a substantial reduction in month-over-month gross churn rates, a testament to the effectiveness of the predictive model and the actionable insights it provides.

This outstanding achievement not only underscores the robustness of the solution, it also highlights its pivotal role in fortifying Dialog Axiata’s position as a leading player in Sri Lanka’s highly competitive telecommunications landscape. By proactively identifying and addressing potential customer churn risks, the company has reinforced its commitment to delivering exceptional service and fostering long-lasting customer relationships.

Conclusion

Dialog Axiata’s journey in overcoming telecom churn challenges showcases the power of innovative solutions and the seamless integration of AI technologies. By using the AI Factory framework and SageMaker, Dialog Axiata not only addressed complex technical challenges, but also achieved tangible business benefits. This success story emphasizes the crucial role of predictive analytics in staying ahead in the competitive telecom industry, demonstrating the transformative impact of advanced AI models.

We appreciate you for reading this post, and hope you learned something new and useful. Please don’t hesitate to leave your feedback in the comments section.

Thank you Nilanka S. Weeraman, Sajani Jayathilaka, and Devinda Liyanage for your valuable contributions to this blog post.

About the Authors

Senthilvel (Vel) Palraj is a Senior Solutions Architect at AWS with over 15 years of IT experience. In this role, he helps customers in the telco, and media and entertainment industries across India and SAARC countries transition to the cloud. Before joining AWS India, Vel worked as a Senior DevOps Architect with AWS ProServe North America, supporting major Fortune 500 corporations in the United States. He is passionate about GenAI & AIML and leverages his deep knowledge to provide strategic guidance to companies looking to adopt and optimize AWS services. Outside of work, Vel enjoys spending time with his family and mountain biking on rough terrains.

Chamika Ramanayake is the Head of AI Platforms at Dialog Axiata PLC, Sri Lanka’s leading telecommunications company. He leverages his 7 years of experience in the telecommunication industry when leading his team to design and set the foundation to operationalize the end-to-end AI/ML system life cycle in the AWS cloud environment. He holds an MBA from PIM, University of Sri Jayawardenepura, and a B.Sc. Eng (Hons) in Electronics and Telecommunication Engineering from the University of Moratuwa.

How Dialog Axiata used Amazon SageMaker to scale ML models in production with AI Factory and reduced customer churn within 3 months
Author: Senthilvel (Vel) Palraj