Mixtral 8x22B is now available in Amazon SageMaker JumpStart
Today, we are excited to announce the Mixtral-8x22B large language model (LLM), developed by Mistral AI, is available for customers through Amazon SageMaker JumpStart to deploy with one click for running inference… You can try out this model with SageMaker JumpStart, a machine learning (ML) hub th…
Today, we are excited to announce the Mixtral-8x22B large language model (LLM), developed by Mistral AI, is available for customers through Amazon SageMaker JumpStart to deploy with one click for running inference. You can try out this model with SageMaker JumpStart, a machine learning (ML) hub that provides access to algorithms and models so you can quickly get started with ML. In this post, we walk through how to discover and deploy the Mixtral-8x22B model.
What is Mixtral 8x22B
Mixtral 8x22B is Mistral AI’s latest open-weights model and sets a new standard for performance and efficiency of available foundation models, as measured by Mistral AI across standard industry benchmarks. It is a sparse Mixture-of-Experts (SMoE) model that uses only 39 billion active parameters out of 141 billion, offering cost-efficiency for its size. Continuing with Mistral AI’s belief in the power of publicly available models and broad distribution to promote innovation and collaboration, Mixtral 8x22B is released under Apache 2.0, making the model available for exploring, testing, and deploying. Mixtral 8x22B is an attractive option for customers selecting between publicly available models and prioritizing quality, and for those wanting a higher quality from mid-sized models, such as Mixtral 8x7B and GPT 3.5 Turbo, while maintaining high throughput.
Mixtral 8x22B provides the following strengths:
- Multilingual native capabilities in English, French, Italian, German, and Spanish languages
- Strong mathematics and coding capabilities
- Capable of function calling that enables application development and tech stack modernization at scale
- 64,000-token context window that allows precise information recall from large documents
About Mistral AI
Mistral AI is a Paris-based company founded by seasoned researchers from Meta and Google DeepMind. During his time at DeepMind, Arthur Mensch (Mistral CEO) was a lead contributor on key LLM projects such as Flamingo and Chinchilla, while Guillaume Lample (Mistral Chief Scientist) and Timothée Lacroix (Mistral CTO) led the development of LLaMa LLMs during their time at Meta. The trio are part of a new breed of founders who combine deep technical expertise and operating experience working on state-of-the-art ML technology at the largest research labs. Mistral AI has championed small foundational models with superior performance and commitment to model development. They continue to push the frontier of artificial intelligence (AI) and make it accessible to everyone with models that offer unmatched cost-efficiency for their respective sizes, delivering an attractive performance-to-cost ratio. Mixtral 8x22B is a natural continuation of Mistral AI’s family of publicly available models that include Mistral 7B and Mixtral 8x7B, also available on SageMaker JumpStart. More recently, Mistral launched commercial enterprise-grade models, with Mistral Large delivering top-tier performance and outperforming other popular models with native proficiency across multiple languages.
What is SageMaker JumpStart
With SageMaker JumpStart, ML practitioners can choose from a growing list of best-performing foundation models. ML practitioners can deploy foundation models to dedicated Amazon SageMaker instances within a network isolated environment, and customize models using SageMaker for model training and deployment. You can now discover and deploy Mixtral-8x22B with a few clicks in Amazon SageMaker Studio or programmatically through the SageMaker Python SDK, enabling you to derive model performance and MLOps controls with SageMaker features such as Amazon SageMaker Pipelines, Amazon SageMaker Debugger, or container logs. The model is deployed in an AWS secure environment and under your VPC controls, providing data encryption at rest and in-transit.
SageMaker also adheres to standard security frameworks such as ISO27001 and SOC1/2/3 in addition to complying with various regulatory requirements. Compliance frameworks like General Data Protection Regulation (GDPR) and California Consumer Privacy Act (CCPA), Health Insurance Portability and Accountability Act (HIPAA), and Payment Card Industry Data Security Standard (PCI DSS) are supported to make sure data handling, storing, and process meet stringent security standards.
SageMaker JumpStart availability is dependent on the model; Mixtral-8x22B v0.1 is currently supported in the US East (N. Virginia) and US West (Oregon) AWS Regions.
Discover models
You can access Mixtral-8x22B foundation models through SageMaker JumpStart in the SageMaker Studio UI and the SageMaker Python SDK. In this section, we go over how to discover the models in SageMaker Studio.
SageMaker Studio is an integrated development environment (IDE) that provides a single web-based visual interface where you can access purpose-built tools to perform all ML development steps, from preparing data to building, training, and deploying your ML models. For more details on how to get started and set up SageMaker Studio, refer to Amazon SageMaker Studio.
In SageMaker Studio, you can access SageMaker JumpStart by choosing JumpStart in the navigation pane.
From the SageMaker JumpStart landing page, you can search for “Mixtral” in the search box. You will see search results showing Mixtral 8x22B Instruct, various Mixtral 8x7B models, and Dolphin 2.5 and 2.7 models.
You can choose the model card to view details about the model such as license, data used to train, and how to use. You will also find the Deploy button, which you can use to deploy the model and create an endpoint.
SageMaker has seamless logging, monitoring, and auditing enabled for deployed models with native integrations with services like AWS CloudTrail for logging and monitoring to provide insights into API calls and Amazon CloudWatch to collect metrics, logs, and event data to provide information into the model’s resource utilization.
Deploy a model
Deployment starts when you choose Deploy. After deployment finishes, an endpoint has been created. You can test the endpoint by passing a sample inference request payload or selecting your testing option using the SDK. When you select the option to use the SDK, you will see example code that you can use in your preferred notebook editor in SageMaker Studio. This will require an AWS Identity and Access Management (IAM) role and policy attached to it to restrict model access. Additionally, if you choose to deploy the model endpoint within SageMaker Studio, you will be prompted to choose an instance type, initial instance count, and maximum instance count. The ml.p4d.24xlarge and ml.p4de.24xlarge instance types are the only instance types currently supported for Mixtral 8x22B Instruct v0.1.
To deploy using the SDK, we start by selecting the Mixtral-8x22b model, specified by the model_id
with value huggingface-llm-mistralai-mixtral-8x22B-instruct-v0-1
. You can deploy any of the selected models on SageMaker with the following code. Similarly, you can deploy Mixtral-8x22B instruct using its own model ID.
This deploys the model on SageMaker with default configurations, including the default instance type and default VPC configurations. You can change these configurations by specifying non-default values in JumpStartModel.
After it’s deployed, you can run inference against the deployed endpoint through the SageMaker predictor:
Example prompts
You can interact with a Mixtral-8x22B model like any standard text generation model, where the model processes an input sequence and outputs predicted next words in the sequence. In this section, we provide example prompts.
Mixtral-8x22b Instruct
The instruction-tuned version of Mixtral-8x22B accepts formatted instructions where conversation roles must start with a user prompt and alternate between user instruction and assistant (model answer). The instruction format must be strictly respected, otherwise the model will generate sub-optimal outputs. The template used to build a prompt for the Instruct model is defined as follows:
<s>
and </s>
are special tokens for beginning of string (BOS) and end of string (EOS), whereas [INST]
and [/INST]
are regular strings.
The following code shows how you can format the prompt in instruction format:
Summarization prompt
You can use the following code to get a response for a summarization:
The following is an example of the expected output:
Multilingual translation prompt
You can use the following code to get a response for a multilingual translation:
The following is an example of the expected output:
Code generation
You can use the following code to get a response for code generation:
You get the following output:
Reasoning and math
You can use the following code to get a response for reasoning and math:
You get the following output:
Clean up
After you’re done running the notebook, delete all resources that you created in the process so your billing is stopped. Use the following code:
Conclusion
In this post, we showed you how to get started with Mixtral-8x22B in SageMaker Studio and deploy the model for inference. Because foundation models are pre-trained, they can help lower training and infrastructure costs and enable customization for your use case. Visit SageMaker JumpStart in SageMaker Studio now to get started.
Now that you are aware of Mistral AI and their Mixtral 8x22B models, we encourage you to deploy an endpoint on SageMaker to perform inference testing and try out responses for yourself. Refer to the following resources for more information:
About the Authors
Marco Punio is a Solutions Architect focused on generative AI strategy, applied AI solutions, and conducting research to help customers hyper-scale on AWS. He is a qualified technologist with a passion for machine learning, artificial intelligence, and mergers and acquisitions. Marco is based in Seattle, WA, and enjoys writing, reading, exercising, and building applications in his free time.
Preston Tuggle is a Sr. Specialist Solutions Architect working on generative AI.
June Won is a product manager with Amazon SageMaker JumpStart. He focuses on making foundation models easily discoverable and usable to help customers build generative AI applications. His experience at Amazon also includes mobile shopping application and last mile delivery.
Dr. Ashish Khetan is a Senior Applied Scientist with Amazon SageMaker built-in algorithms and helps develop machine learning algorithms. He got his PhD from University of Illinois Urbana-Champaign. He is an active researcher in machine learning and statistical inference, and has published many papers in NeurIPS, ICML, ICLR, JMLR, ACL, and EMNLP conferences.
Shane Rai is a Principal GenAI Specialist with the AWS World Wide Specialist Organization (WWSO). He works with customers across industries to solve their most pressing and innovative business needs using AWS’s breadth of cloud-based AI/ML services including model offerings from top tier foundation model providers.
Hemant Singh is an Applied Scientist with experience in Amazon SageMaker JumpStart. He got his master’s from Courant Institute of Mathematical Sciences and B.Tech from IIT Delhi. He has experience in working on a diverse range of machine learning problems within the domain of natural language processing, computer vision, and time series analysis.
Author: Marco Punio