SambaSafety automates custom R workload, improving driver safety with Amazon SageMaker and AWS Step Functions


At SambaSafety, their mission is to promote safer communities by reducing risk through data insights. Since 1998, SambaSafety has been the leading North American provider of cloud-based mobility risk management software for organizations with commercial and non-commercial drivers. SambaSafety serves more than 15,000 global employers and insurance carriers with driver risk and compliance monitoring, online training and deep risk analytics, as well as risk pricing solutions. Through the collection, correlation and analysis of driver record, telematics, corporate and other sensor data, SambaSafety not only helps employers better enforce safety policies and reduce claims, but also helps insurers make informed underwriting decisions and background screeners perform accurate, efficient pre-hire checks.

Not all drivers present the same risk profile. The more time a driver spends behind the wheel, the higher their risk profile. SambaSafety’s team of data scientists has developed complex, proprietary modeling solutions designed to accurately quantify this risk profile. However, they needed help deploying this solution for batch and real-time inference consistently and reliably.

In this post, we discuss how SambaSafety used AWS machine learning (ML) and continuous integration and continuous delivery (CI/CD) tools to deploy their existing data science application for batch inference. SambaSafety worked with AWS Advanced Consulting Partner Firemind to deliver a solution that used AWS CodeStar, AWS Step Functions, and Amazon SageMaker for this workload. With AWS CI/CD and AI/ML products, SambaSafety’s data science team didn’t have to change their existing development workflow to take advantage of continuous model training and inference.

Customer use case

SambaSafety’s data science team had long been using the power of data to inform their business. They had several skilled engineers and scientists building insightful models that improved the quality of risk analysis on their platform. The challenges faced by this team were not related to data science. SambaSafety’s data science team needed help connecting their existing data science workflow to a continuous delivery solution.

SambaSafety’s data science team maintained several script-like artifacts as part of their development workflow. These scripts performed several tasks, including data preprocessing, feature engineering, model creation, model tuning, and model comparison and validation. These scripts were all run manually when new training data arrived in their environment. Additionally, these scripts didn’t perform any model versioning or host models for inference. SambaSafety’s data science team had developed manual workarounds to promote new models to production, but this process became time-consuming and labor-intensive.

To free up SambaSafety’s highly skilled data science team to innovate on new ML workloads, SambaSafety needed to automate the manual tasks associated with maintaining existing models. Furthermore, the solution needed to replicate the manual workflow used by SambaSafety’s data science team and decide whether to proceed based on the outcomes of these scripts. Finally, the solution had to integrate with their existing code base. The SambaSafety data science team used a code repository solution external to AWS, and the final pipeline had to be intelligent enough to trigger on updates to their code base, which was written primarily in R.
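The requirement to make proceed-or-stop decisions based on script outcomes can be illustrated with a small decision gate. This is a minimal Python sketch, not SambaSafety’s or Firemind’s logic: the `ValidationResult` type, the `auc` metric, the thresholds, and the event shape accepted by `handler` are all hypothetical. The idea is that a validation script writes metrics, and a small function compares the candidate model against the one in production before the pipeline continues.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ValidationResult:
    """Metrics written by a (hypothetical) validation script."""
    model_version: str
    auc: float  # hypothetical metric: higher is better


def should_promote(candidate: ValidationResult,
                   incumbent: Optional[ValidationResult],
                   min_auc: float = 0.70,
                   min_improvement: float = 0.0) -> bool:
    """Decide whether the pipeline should proceed to deployment."""
    # Reject any candidate that misses the absolute quality floor.
    if candidate.auc < min_auc:
        return False
    # With no model in production yet, a candidate that clears the floor wins.
    if incumbent is None:
        return True
    # Otherwise the candidate must beat production by more than the margin.
    return candidate.auc - incumbent.auc > min_improvement


def handler(event: dict, context=None) -> dict:
    """Lambda-style entry point; the event shape is a hypothetical example."""
    candidate = ValidationResult(**event["candidate"])
    incumbent_raw = event.get("incumbent")
    incumbent = ValidationResult(**incumbent_raw) if incumbent_raw else None
    return {"proceed": should_promote(candidate, incumbent)}
```

Keeping the decision in a plain function makes it easy to unit test outside AWS; in a Step Functions workflow, it could run as a Lambda step whose result a Choice state branches on.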

Solution overview

The following diagram illustrates the solution architecture, which was informed by one of the many open-source architectures maintained by SambaSafety’s delivery partner Firemind.

The solution delivered by Firemind for SambaSafety’s data science team was built around two ML pipelines. The first ML pipeline trains a model using SambaSafety’s custom data preprocessing, training, and testing scripts. The resulting model artifact is deployed for batch and real-time inference to model endpoints managed by SageMaker. The second ML pipeline facilitates the inference request to the hosted model. In this way, the pipeline for training is decoupled from the pipeline for inference.
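To illustrate how an inference pipeline can stay decoupled from training, the following Python sketch builds an Amazon States Language (ASL) definition for a batch inference state machine. It uses the Step Functions `createTransformJob.sync` integration with SageMaker batch transform. The input field names (`job_name`, `model_name`, `input_s3_uri`, `output_s3_uri`), the CSV content type, and the default instance type are assumptions, and the real-time endpoint path is not shown. Because the model name arrives in the execution input, this state machine does not need to know how the model was trained.

```python
import json


def batch_inference_definition(instance_type: str = "ml.m5.xlarge") -> dict:
    """ASL definition for a hypothetical batch inference pipeline.

    The model name and S3 locations come from the execution input, so this
    state machine knows nothing about how the model was trained.
    """
    return {
        "Comment": "Sketch: batch inference with SageMaker batch transform",
        "StartAt": "BatchTransform",
        "States": {
            "BatchTransform": {
                "Type": "Task",
                # .sync makes Step Functions wait until the transform job finishes.
                "Resource": "arn:aws:states:::sagemaker:createTransformJob.sync",
                "Parameters": {
                    # Keys ending in ".$" are resolved from the execution input.
                    "TransformJobName.$": "$.job_name",
                    "ModelName.$": "$.model_name",
                    "TransformInput": {
                        "DataSource": {
                            "S3DataSource": {
                                "S3DataType": "S3Prefix",
                                "S3Uri.$": "$.input_s3_uri",
                            }
                        },
                        "ContentType": "text/csv",
                    },
                    "TransformOutput": {"S3OutputPath.$": "$.output_s3_uri"},
                    "TransformResources": {
                        "InstanceCount": 1,
                        "InstanceType": instance_type,
                    },
                },
                "End": True,
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(batch_inference_definition(), indent=2))
```

The printed JSON is what the state machine’s definition would contain; each execution would then supply its own model name and S3 prefixes, so a newly trained model can be used without changing the inference pipeline.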

One of the complexities in this project was replicating the manual steps taken by the SambaSafety data scientists. The team at Firemind used Step Functions and SageMaker Processing to complete this task. Step Functions allows you to run discrete tasks in AWS using AWS Lambda functions, Amazon Elastic Kubernetes Service (Amazon EKS) workers, or, in this case, SageMaker. SageMaker Processing allows you to define jobs that run on managed ML instances within the SageMaker ecosystem. Each Step Functions execution maintains its own logs, run history, and details on whether it succeeded or failed.
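The following sketch shows one way a Step Functions workflow could chain R scripts as SageMaker Processing jobs. It is an illustration under stated assumptions, not Firemind’s implementation: the container image URI and role ARN are placeholders, the script names (`preprocess.R`, `train.R`, `validate.R`) are hypothetical and assumed to be baked into the image, and the gate is a hypothetical Lambda function returning `{"proceed": true}` or `{"proceed": false}`. Each R script runs with `Rscript` as the container entrypoint, any failure routes to a single `Fail` state, and a `Choice` state decides whether to continue to deployment.

```python
import json

# Placeholders (hypothetical): an R-capable container image that already
# contains the R scripts, and a SageMaker execution role.
R_IMAGE_URI = "<account>.dkr.ecr.<region>.amazonaws.com/r-processing:latest"
ROLE_ARN = "arn:aws:iam::<account>:role/<sagemaker-execution-role>"


def r_processing_step(job_prefix: str, script: str, next_state: str) -> dict:
    """Task state that runs one R script as a SageMaker Processing job."""
    return {
        "Type": "Task",
        # .sync makes Step Functions wait until the processing job finishes.
        "Resource": "arn:aws:states:::sagemaker:createProcessingJob.sync",
        "Parameters": {
            # Job names must be unique; derive them from the execution name
            # (assumes execution names use only letters, digits, and hyphens).
            "ProcessingJobName.$": f"States.Format('{job_prefix}-{{}}', $$.Execution.Name)",
            "AppSpecification": {
                "ImageUri": R_IMAGE_URI,
                "ContainerEntrypoint": ["Rscript", f"/opt/ml/processing/code/{script}"],
            },
            "ProcessingResources": {
                "ClusterConfig": {
                    "InstanceCount": 1,
                    "InstanceType": "ml.m5.xlarge",
                    "VolumeSizeInGB": 30,
                }
            },
            "RoleArn": ROLE_ARN,
            "StoppingCondition": {"MaxRuntimeInSeconds": 3600},
        },
        # Any script failure stops the run, like a data scientist halting by hand.
        "Catch": [{"ErrorEquals": ["States.ALL"], "Next": "PipelineFailed"}],
        "Next": next_state,
    }


def training_pipeline_definition(gate_lambda_arn: str) -> dict:
    """ASL definition for a hypothetical R training pipeline with a gate."""
    return {
        "Comment": "Sketch: R preprocessing, training, and validation with a gate",
        "StartAt": "Preprocess",
        "States": {
            "Preprocess": r_processing_step("preprocess", "preprocess.R", "Train"),
            "Train": r_processing_step("train", "train.R", "Validate"),
            "Validate": r_processing_step("validate", "validate.R", "Gate"),
            # A Lambda function decides whether the new model should go forward.
            "Gate": {
                "Type": "Task",
                "Resource": "arn:aws:states:::lambda:invoke",
                "Parameters": {"FunctionName": gate_lambda_arn, "Payload.$": "$"},
                "ResultSelector": {"proceed.$": "$.Payload.proceed"},
                "Next": "ShouldDeploy",
            },
            "ShouldDeploy": {
                "Type": "Choice",
                "Choices": [
                    {"Variable": "$.proceed", "BooleanEquals": True, "Next": "Deploy"}
                ],
                "Default": "KeepCurrentModel",
            },
            # Placeholder for model registration and endpoint deployment.
            "Deploy": {"Type": "Pass", "End": True},
            "KeepCurrentModel": {"Type": "Succeed"},
            "PipelineFailed": {"Type": "Fail", "Error": "RScriptFailed"},
        },
    }


if __name__ == "__main__":
    print(json.dumps(training_pipeline_definition(
        "arn:aws:lambda:<region>:<account>:function:<gate>"), indent=2))
```

Because each Processing job is its own Task state, every step’s status appears separately in the execution history, so a failed run points directly at the R script that failed.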

The team used Step Functions and SageMaker, together with Lambda, to handle the automation of training, tuning, deployment, and inference workloads. The only remaining piece was the continuous integration of code changes to this deployment pipeline. Firemind implemented a CodeStar project that maintained a connection to SambaSafety’s existing code repository. When the industrious data science team at SambaSafety posts an update to a specific branch of their code base, CodeStar picks up the changes and triggers the automation.
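How the CodeStar project hands off to the automation isn’t detailed here. As an illustration only, the following Python sketch assumes that a push notification reaches a small function (for example, a Lambda function) with a hypothetical event shape `{"branch": ..., "commit_id": ...}`, and that the function starts the training state machine with the boto3 Step Functions `start_execution` call. The watched branch name and the state machine ARN are placeholders.

```python
import json
import uuid
from typing import Any, Optional

# Hypothetical placeholders: the branch the pipeline watches and the
# training state machine it starts.
WATCHED_BRANCH = "main"
STATE_MACHINE_ARN = "arn:aws:states:<region>:<account>:stateMachine:<training-pipeline>"


def start_training_if_watched(event: dict, sfn_client: Optional[Any] = None) -> Optional[str]:
    """Start a training-pipeline execution for pushes to the watched branch.

    `event` has a hypothetical shape: {"branch": "...", "commit_id": "..."}.
    Returns the new execution ARN, or None when the push is ignored.
    """
    # Pushes to any other branch do not trigger the automation.
    if event.get("branch") != WATCHED_BRANCH:
        return None
    if sfn_client is None:
        import boto3  # imported lazily so the filtering logic runs without AWS
        sfn_client = boto3.client("stepfunctions")
    commit_id = event["commit_id"]
    response = sfn_client.start_execution(
        stateMachineArn=STATE_MACHINE_ARN,
        # Execution names must be unique; a random suffix allows re-running a commit.
        name=f"commit-{commit_id[:12]}-{uuid.uuid4().hex[:8]}",
        input=json.dumps({"commit_id": commit_id}),
    )
    return response["executionArn"]
```

Passing the client in as a parameter keeps the branch-filtering logic testable without AWS credentials, while the default path creates a real Step Functions client.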

Conclusion

SambaSafety’s new serverless MLOps pipeline had a significant impact on their capability to deliver. The integration of data science and software development enables their teams to work together seamlessly. Their automated model deployment solution reduced time to delivery by up to 70%.

SambaSafety also had the following to say:

“By automating our data science models and integrating them into their software development lifecycle, we have been able to achieve a new level of efficiency and accuracy in our services. This has enabled us to stay ahead of the competition and deliver innovative solutions to clients. Our clients will greatly benefit from this with the faster turnaround times and improved accuracy of our solutions.”

SambaSafety brought their problem to their AWS account team. The AWS account and solutions architecture teams identified this solution by drawing on our robust partner network. Connect with your AWS account team to identify similar transformative opportunities for your business.

About the Authors

Dan Ferguson is an AI/ML Specialist Solutions Architect (SA) on the Private Equity Solutions Architecture team at Amazon Web Services. Dan helps private equity-backed portfolio companies leverage AI/ML technologies to achieve their business objectives.

Khalil Adib is a Data Scientist at Firemind, driving the innovation Firemind can provide to their customers around the magical worlds of AI and ML. Khalil tinkers with the latest and greatest tech and models, ensuring that Firemind are always at the bleeding edge.

Jason Mathew is a Cloud Engineer at Firemind, leading the delivery of projects for customers end-to-end from writing pipelines with IaC, building out data engineering with Python, and pushing the boundaries of ML. Jason is also the key contributor to Firemind’s open source projects.

Author: Dan Ferguson