Amazon Bedrock Model Distillation
Summary
Amazon Bedrock Model Distillation lets you use smaller, faster, and less expensive models that deliver accuracy tailored to your specific use case on par with Amazon Bedrock’s most capable models. Distilled models in Amazon Bedrock can be up to five times faster and up to 75% less expensive than the original models, with less than 2% accuracy loss for use cases such as Retrieval Augmented Generation (RAG).
Use smaller, more affordable models
With Model Distillation, customers select a “teacher” model whose accuracy they want to achieve for their use case, and then select a “student” model that they want to fine-tune. Customers also provide prompts for their use case. Model Distillation automates the process of generating responses from the teacher and using them to improve the student model. Student models can then deliver accuracy comparable to the teacher model at a lower cost.
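As a rough illustration of how such a distillation job is started, the sketch below uses the boto3 Bedrock client. The model identifiers, role ARN, and S3 paths are placeholders, and the exact distillation parameters should be verified against the current Bedrock API reference.

```python
import boto3

# All identifiers below are placeholders -- substitute models, roles, and buckets
# from your own account and region.
TEACHER_MODEL = "arn:aws:bedrock:us-east-1::foundation-model/meta.llama3-1-405b-instruct-v1:0"
STUDENT_MODEL = "arn:aws:bedrock:us-east-1::foundation-model/meta.llama3-1-8b-instruct-v1:0"

bedrock = boto3.client("bedrock", region_name="us-east-1")

# Start a distillation job: the student model is fine-tuned on responses
# generated (or reused) from the teacher model.
job = bedrock.create_model_customization_job(
    jobName="rag-assistant-distillation",
    customModelName="rag-assistant-distilled",
    roleArn="arn:aws:iam::111122223333:role/BedrockDistillationRole",
    baseModelIdentifier=STUDENT_MODEL,           # the smaller "student" model to fine-tune
    customizationType="DISTILLATION",
    customizationConfig={
        "distillationConfig": {
            "teacherModelConfig": {
                "teacherModelIdentifier": TEACHER_MODEL,
                "maxResponseLengthForInference": 1000,
            }
        }
    },
    trainingDataConfig={"s3Uri": "s3://amzn-s3-demo-bucket/distillation/prompts.jsonl"},
    outputDataConfig={"s3Uri": "s3://amzn-s3-demo-bucket/distillation/output/"},
)
print(job["jobArn"])
```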
Use proprietary data synthesis to optimise the performance of the distilled model
You can iteratively fine-tune a smaller, more cost-effective model to match a larger model’s accuracy for your specific use case. To reduce the iteration required to reach better results, Model Distillation may apply the data synthesis techniques best suited to your use case. For instance, Bedrock might generate similar prompts to expand the training dataset, or use customer-provided prompt-response pairs as gold standards to produce high-quality synthetic responses.
Bring your production data to cut costs
With traditional fine-tuning, customers must create both prompts and responses. For Model Distillation, customers only need to supply prompts, which are used to generate synthetic responses and fine-tune the student models. Customers can also provide their invocation logs and filter them using specific metadata fields. By reading prompts and responses directly from invocation logs, Model Distillation can skip generating synthetic responses, saving cost by avoiding regenerating responses from the teacher model. Use the code samples to get started.
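If you already log production traffic, the training data configuration can point at those invocation logs instead of a prompt file. The sketch below shows the general shape; the field names (invocationLogsConfig, usePromptResponse, requestMetadataFilters) are assumptions based on our reading of the distillation documentation and should be verified against the current API reference.

```python
# Sketch: reuse production invocation logs as the distillation input instead of a
# prompt file. Field names here are assumptions -- verify them against the current
# Bedrock API reference before use.
training_data_config = {
    "invocationLogsConfig": {
        # Reuse the logged teacher responses rather than regenerating them.
        "usePromptResponse": True,
        # S3 location where Bedrock model invocation logging is delivered.
        "invocationLogSource": {"s3Uri": "s3://amzn-s3-demo-bucket/bedrock/invocation-logs/"},
        # Keep only logs tagged with this request metadata at invocation time.
        "requestMetadataFilters": {"equals": {"project": "support-assistant"}},
    }
}

# Passed as trainingDataConfig=training_data_config in create_model_customization_job.
```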
Amazon Bedrock Model Distillation (preview) builds accurate, fast, and affordable models
Amazon Bedrock Model Distillation, now available in preview, automates the process of creating a distilled model for your specific use case by using responses generated from a large foundation model (FM), called a teacher model, to fine-tune a smaller FM, called a student model. It applies data synthesis techniques to improve response generation from the teacher model. Amazon Bedrock then hosts the final distilled model for inference, giving you a model that is faster and more cost-effective, with accuracy comparable to the teacher model for your use case.
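Once the distilled model is hosted, you invoke it like any other Bedrock model. The sketch below uses the bedrock-runtime Converse API with a placeholder model ARN; distilled custom models are typically served through Provisioned Throughput, so the ARN you pass would be the one returned after purchasing that throughput.

```python
import boto3

runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

# Placeholder ARN -- for a distilled custom model this is typically the
# provisioned-throughput model ARN returned after purchase.
DISTILLED_MODEL_ARN = "arn:aws:bedrock:us-east-1:111122223333:provisioned-model/abcd1234"

reply = runtime.converse(
    modelId=DISTILLED_MODEL_ARN,
    messages=[{"role": "user", "content": [{"text": "Summarize our return policy."}]}],
    inferenceConfig={"maxTokens": 300, "temperature": 0.2},
)
print(reply["output"]["message"]["content"][0]["text"])
```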
Customers are eager to adopt Amazon Bedrock’s most powerful and accurate FMs for their generative AI applications. For some use cases, however, the latency of these models is not ideal, and as businesses scale their generative AI applications to billions of user interactions, they also look for better price performance. To reduce latency and cost, customers move to smaller models, but smaller models cannot always provide the accuracy that some applications require. Fine-tuning models to close that gap calls for a different skill set to produce the high-quality labelled datasets needed to improve model accuracy for customer use cases.
Through a knowledge transfer process, Amazon Bedrock Model Distillation can raise the accuracy of a smaller student model so that it approaches a higher-performing teacher model. By transferring knowledge from a teacher model of your choice to a student model in the same family, you can create distilled models that, for a given use case, are up to five times faster and up to 75 percent less expensive than the original large models, with less than two percent accuracy loss for use cases like Retrieval Augmented Generation (RAG).
How does it work?
Amazon Bedrock Model Distillation generates responses from the teacher model, improves that response generation by adding proprietary data synthesis where needed, and then fine-tunes the student model on the results.

Figure: How Amazon Bedrock Model Distillation works
To improve response generation from the teacher model and produce higher-quality fine-tuning datasets, Amazon Bedrock applies a variety of data synthesis techniques tailored to specific use cases. For example, Amazon Bedrock might augment the training dataset by generating similar prompts, effectively increasing the size of the fine-tuning dataset.
Alternatively, it can generate high-quality teacher responses by using the prompt-response pairs you provide as golden examples. At preview, Amazon Bedrock Model Distillation supports Anthropic, Meta, and Amazon models.
Things to be aware of
Here are some key points to be aware of.
- The goal of model distillation is to raise the student model’s accuracy so that it matches the teacher model’s performance for your specific use case. Before starting the model distillation process, we recommend evaluating several teacher models for your use case and choosing the one that best suits it (see the evaluation sketch after this list).
- AWS recommends tailoring your prompts to the use case for which you find the teacher model’s accuracy acceptable, and submitting these prompts as the input data for distillation.
- To choose the student model to fine-tune, compare the latency profiles of the candidate student models for your use case. The final distilled model will have the same latency profile as the student model you choose.
- If a particular student model already works well for your use case as is, AWS recommends using it directly instead of creating a distilled model.
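As a rough way to act on the first recommendation above, the sketch below sends the same sample prompts to a few candidate teacher models through the Converse API and prints their responses for side-by-side review. The model IDs and prompts are placeholders; substitute models available in your region and prompts representative of your use case.

```python
import boto3

runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

# Placeholder candidates and prompts -- replace with models available in your
# region and prompts representative of your use case.
candidate_teachers = [
    "anthropic.claude-3-5-sonnet-20240620-v1:0",
    "meta.llama3-1-405b-instruct-v1:0",
]
sample_prompts = [
    "Summarize this support ticket in two sentences: ...",
    "Draft a short reply to this customer question: ...",
]

for model_id in candidate_teachers:
    for prompt in sample_prompts:
        out = runtime.converse(
            modelId=model_id,
            messages=[{"role": "user", "content": [{"text": prompt}]}],
            inferenceConfig={"maxTokens": 256},
        )
        text = out["output"]["message"]["content"][0]["text"]
        print(f"[{model_id}] {text[:120]}")
```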
FAQs
What is model distillation?
Model distillation is a machine learning technique that transfers knowledge from a large, complex model to a smaller, more efficient one. The goal is to produce a smaller model that performs similarly to the larger model while requiring less compute and memory.
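To make the FAQ answer concrete, here is a minimal sketch of the classic knowledge-distillation loss, which combines softened teacher probabilities with ground-truth labels. It illustrates the general technique only; it is not how Amazon Bedrock implements distillation internally.

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Generic knowledge-distillation loss; illustrative only."""
    # Soft targets: push the student toward the teacher's softened distribution.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    # Hard targets: standard cross-entropy against the ground-truth labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard
```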