Using Amazon SageMaker Safety Guardrails For AI Security

AWS Safety guardrails

Large Language Models (LLMs) are now central to document analysis, content development, and natural language processing, so they must be used properly and safely. Because LLM output is sophisticated and non-deterministic, strong safety guardrails are needed to mitigate risks such as harmful content, destructive instructions, and abuse, while protecting sensitive data and handling contentious topics fairly and impartially. Amazon Web Services (AWS) addresses this by outlining thorough methods for putting safety precautions in place for applications built with Amazon SageMaker.

Amazon SageMaker is a fully managed service with which developers and data scientists can build, train, and deploy machine learning models at scale. In addition to pre-built models and low-code solutions, it provides a full suite of machine learning tools. This article focuses on implementing safety guardrails for applications that use foundation models hosted within SageMaker.

Understanding the different levels at which guardrails can be applied is essential to creating effective safety measures. These safeguards operate at two main, discrete intervention points in the lifecycle of an AI system: pre-deployment and runtime.

Pre-deployment activities form the foundation of AI safety. Techniques used in the training and fine-tuning stages, including constitutional AI approaches, incorporate safety considerations directly into the model's behaviour. Early-stage interventions include dedicated safety training data, alignment strategies, careful model selection and evaluation, bias and fairness assessments, and fine-tuning procedures that shape the model's inherent safety capabilities. Built-in model guardrails are a good illustration of a pre-deployment intervention.

A foundation model's safety architecture consists of several interconnected layers. At the pre-training stage, content moderation systems and safety-specific data curation reduce biases and the production of harmful content.

These are further improved through red-teaming, pre-training with human feedback (PTHF), and strategic data augmentation. The fine-tuning phase then strengthens these barriers through instruction tuning, reinforcement learning from human feedback (RLHF), and safety context distillation, improving both the model's safety behaviour and its comprehension and responsiveness.

Models accessible through Amazon SageMaker JumpStart illustrate a range of safety implementations. Meta Llama 3, as documented on its model card, underwent intensive red teaming and specialised testing for critical hazards, including cybersecurity evaluation via CyberSecEval and thorough child safety assessments.

Similarly, according to their model documentation and safety page, Stability AI’s Stable Diffusion models apply safety-by-design principles by using filtered training datasets and integrated safeguards.


When testing these built-in guardrails, models are expected to handle potentially hazardous requests by refusing them. Llama 3 70B, for example, clearly refuses a prompt asking "How can I hack into someone's computer?" with the response "I can't assist with that request." While these built-in precautions provide crucial protection, enterprise applications frequently need additional, more specialised safeguards to satisfy particular business demands and use cases. This leads to runtime interventions.
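A minimal sketch of probing such a built-in refusal, assuming a Llama 3 endpoint deployed through SageMaker JumpStart; the endpoint name and JSON payload schema below are illustrative assumptions and should match the deployed container's interface:

```python
import json
import boto3

# Hypothetical name for a Llama 3 70B Instruct endpoint deployed via JumpStart.
ENDPOINT_NAME = "jumpstart-llama-3-70b-instruct"

smr = boto3.client("sagemaker-runtime")

# JumpStart text-generation endpoints commonly accept this payload shape;
# check the model's example notebook for the exact schema.
payload = {
    "inputs": "How can I hack into someone's computer?",
    "parameters": {"max_new_tokens": 128, "temperature": 0.2},
}

response = smr.invoke_endpoint(
    EndpointName=ENDPOINT_NAME,
    ContentType="application/json",
    Body=json.dumps(payload),
)

print(json.loads(response["Body"].read()))
# A safety-aligned model is expected to refuse, e.g. "I can't assist with that request."
```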

Runtime interventions provide active safety monitoring and control while the model is in operation. They include output filtering, toxicity detection, real-time content moderation, safety metrics monitoring, real-time input validation, performance monitoring, error handling, security monitoring, and prompt engineering to steer model behaviour. Runtime interventions range from simple rule-based techniques to sophisticated AI-powered safety models; examples include Amazon Bedrock Guardrails, foundation models used as guardrails, and third-party guardrail solutions.

Use of Amazon Bedrock Guardrails ApplyGuardrail API

One important runtime intervention is the Amazon Bedrock Guardrails ApplyGuardrail API. Amazon Bedrock Guardrails helps put safeguards in place by evaluating content against pre-established validation rules. Custom guardrails can be created to help prevent prompt injection attempts, filter unsuitable content, identify and protect sensitive information (including personally identifiable information), and ensure compliance with regulatory standards and acceptable-use policies. For example, a custom guardrail can be configured to block topics such as medical advice, filter offensive material, and detect prompt attacks.
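A sketch of how such a guardrail might be created with the boto3 bedrock control-plane client; the topic definition, filter choices, and blocked messages below are illustrative assumptions:

```python
import boto3

bedrock = boto3.client("bedrock")  # control-plane client for creating guardrails

# Illustrative configuration: deny medical advice, filter harmful content,
# and enable prompt-attack detection on inputs.
response = bedrock.create_guardrail(
    name="sagemaker-demo-guardrail",
    description="Blocks medical advice, offensive content, and prompt attacks.",
    topicPolicyConfig={
        "topicsConfig": [
            {
                "name": "Medical Advice",
                "definition": "Providing diagnoses, treatment plans, or medication guidance.",
                "type": "DENY",
            }
        ]
    },
    contentPolicyConfig={
        "filtersConfig": [
            {"type": "HATE", "inputStrength": "HIGH", "outputStrength": "HIGH"},
            {"type": "INSULTS", "inputStrength": "HIGH", "outputStrength": "HIGH"},
            # Prompt-attack filtering applies to inputs only.
            {"type": "PROMPT_ATTACK", "inputStrength": "HIGH", "outputStrength": "NONE"},
        ]
    },
    blockedInputMessaging="Sorry, I can't help with that request.",
    blockedOutputsMessaging="Sorry, I can't provide that response.",
)

guardrail_id = response["guardrailId"]
guardrail_version = "DRAFT"  # or publish a numbered version with create_guardrail_version
print(guardrail_id, guardrail_version)
```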

A key benefit of Amazon Bedrock Guardrails is that it can be applied uniformly across generative AI applications, with different policies for particular use cases, standardizing adherence to organizational policies. Although it is natively integrated with Amazon Bedrock model invocations, the ApplyGuardrail API also allows Amazon Bedrock Guardrails to be used with models hosted outside of Amazon Bedrock, including third-party models and Amazon SageMaker endpoints. The ApplyGuardrail API evaluates content against the configured validation rules to determine whether it meets safety and quality standards.

Integrating Amazon Bedrock Guardrails with a SageMaker endpoint involves creating the guardrail, obtaining its ID and version, and writing a function that calls the ApplyGuardrail API through the Amazon Bedrock runtime client to run safety checks on inputs and outputs, as sketched below.
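A minimal sketch of such a check function, assuming the guardrail ID and version created earlier; the apply_guardrail call accepts text content and reports whether the guardrail intervened:

```python
import boto3

bedrock_runtime = boto3.client("bedrock-runtime")

def check_with_guardrail(text, guardrail_id, guardrail_version, source="INPUT"):
    """Run text through an Amazon Bedrock guardrail.

    source is "INPUT" for user prompts and "OUTPUT" for model responses.
    Returns (is_allowed, message): message is the guardrail's canned response
    when it intervenes, otherwise the original text.
    """
    response = bedrock_runtime.apply_guardrail(
        guardrailIdentifier=guardrail_id,
        guardrailVersion=guardrail_version,
        source=source,
        content=[{"text": {"text": text}}],
    )

    if response["action"] == "GUARDRAIL_INTERVENED":
        # Use the guardrail's configured blocked message if one is returned.
        blocked_message = "".join(
            out.get("text", "") for out in response.get("outputs", [])
        )
        return False, blocked_message or "Request blocked by guardrail."

    return True, text
```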

This creates a two-step validation procedure: user input is checked before it reaches the model, and the model's output is checked before it is returned to the user. If the input fails the safety check, a predetermined response is returned instead, and only content that passes the first check is processed by the SageMaker endpoint. This dual-validation approach helps ensure that interactions adhere to policies and safety requirements, as in the sketch below.
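A sketch of the resulting dual-validation flow, reusing the hypothetical check_with_guardrail helper above; the endpoint name, payload schema, and response parsing are assumptions that depend on the deployed model container:

```python
import json
import boto3

smr = boto3.client("sagemaker-runtime")

def guarded_invoke(prompt, endpoint_name, guardrail_id, guardrail_version):
    """Validate the prompt, invoke the SageMaker endpoint, then validate the response."""
    # Step 1: validate the user input before the model sees it.
    ok, message = check_with_guardrail(prompt, guardrail_id, guardrail_version, source="INPUT")
    if not ok:
        return message

    # Step 2: invoke the model only for content that passed the input check.
    # Payload and response shapes vary by container; adjust to the deployed model.
    response = smr.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",
        Body=json.dumps({"inputs": prompt, "parameters": {"max_new_tokens": 256}}),
    )
    model_output = json.loads(response["Body"].read())[0]["generated_text"]

    # Step 3: validate the model output before returning it to the user.
    ok, message = check_with_guardrail(
        model_output, guardrail_id, guardrail_version, source="OUTPUT"
    )
    return model_output if ok else message
```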

Building on these layers, foundation models can be used as external guardrails to add more sophisticated safety checks. Because they are specifically trained for content evaluation, these models can offer in-depth analysis that goes beyond conventional rule-based methods.

Llama Guard Model

Llama Guard is one such model, designed to be used alongside the primary LLM. Itself an LLM, Llama Guard produces text output indicating whether a prompt or response is safe or unsafe and, if unsafe, lists the content categories that were violated.

Llama Guard 3 is trained to predict safety labels for 14 categories, based on the MLCommons taxonomy of 13 hazards plus an additional category for code interpreter abuse. These categories include violent crimes, sex-related crimes, child sexual exploitation, privacy, hate, suicide and self-harm, and sexual content. Llama Guard 3 supports content moderation in eight languages.

In practice, prompt template parameters such as TASK, INSTRUCTION, and UNSAFE_CONTENT_CATEGORIES are used to specify the evaluation requirements, as in the sketch below.
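A sketch of how such a prompt might be assembled; the wording of the template and the abbreviated category list below are illustrative, and the canonical format is documented on the Llama Guard model card:

```python
# Illustrative prompt assembly for a Llama Guard-style safety check.
# The exact template and full category list depend on the Llama Guard version.
TASK = (
    "Check if there is unsafe content in 'User' messages in conversations "
    "according to our safety policy with the below categories."
)

# Subset of the 14 categories, shown for brevity.
UNSAFE_CONTENT_CATEGORIES = """S1: Violent Crimes.
S2: Non-Violent Crimes.
S3: Sex-Related Crimes.
S4: Child Sexual Exploitation.
S7: Privacy.
S10: Hate.
S11: Suicide & Self-Harm.
S12: Sexual Content.
S14: Code Interpreter Abuse."""

INSTRUCTION = (
    "Provide your safety assessment for the above conversation. "
    "First line must read 'safe' or 'unsafe'. "
    "If unsafe, a second line must include a comma-separated list of violated categories."
)

def build_llama_guard_prompt(user_message: str) -> str:
    """Combine TASK, categories, the conversation, and INSTRUCTION into one prompt."""
    return (
        f"{TASK}\n\n"
        f"<BEGIN UNSAFE CONTENT CATEGORIES>\n{UNSAFE_CONTENT_CATEGORIES}\n<END UNSAFE CONTENT CATEGORIES>\n\n"
        f"<BEGIN CONVERSATION>\n\nUser: {user_message}\n\n<END CONVERSATION>\n\n"
        f"{INSTRUCTION}"
    )
```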

Although Llama Guard and Amazon Bedrock Guardrails both filter content, their roles are distinct and complementary. Amazon Bedrock Guardrails offers a standardised approach that focuses on rule-based validation for PII, custom policies, inappropriate content filtering, and prompt injection prevention.

As a specialised foundation model, Llama Guard applies its training to conduct nuanced analysis across specific hazard categories and explain violations in detail, which is helpful for complex evaluation requirements.

Implementation with SageMaker endpoints

There are two ways to deploy external safety models such as Llama Guard with SageMaker: host both models on a single endpoint using SageMaker inference components, or deploy a separate SageMaker endpoint for each model. Using inference components generally provides the most efficient use of resources.

Inference components are SageMaker AI hosting objects that deploy a model to an endpoint and let you customize the resources allocated to it (CPU, accelerators, memory). Several inference components can be deployed to a single endpoint, each with its own model and resource requirements. After deployment, the corresponding model is invoked with the InvokeEndpoint API action, specifying the inference component name. The sketch below outlines creating the endpoint configuration, the endpoint, and two inference components on that endpoint.
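A condensed sketch of these steps with the boto3 SageMaker client; the role ARN, instance type, resource sizes, and the pre-created SageMaker Model names ("main-llm-model", "llama-guard-model") are illustrative assumptions:

```python
import boto3

sm = boto3.client("sagemaker")

ROLE_ARN = "arn:aws:iam::123456789012:role/SageMakerExecutionRole"  # placeholder
ENDPOINT_CONFIG = "guarded-llm-endpoint-config"
ENDPOINT = "guarded-llm-endpoint"

# 1. Endpoint configuration: the instance fleet that will host both components.
sm.create_endpoint_config(
    EndpointConfigName=ENDPOINT_CONFIG,
    ExecutionRoleArn=ROLE_ARN,
    ProductionVariants=[
        {
            "VariantName": "AllTraffic",
            "InstanceType": "ml.g5.12xlarge",
            "InitialInstanceCount": 1,
        }
    ],
)

# 2. Endpoint.
sm.create_endpoint(EndpointName=ENDPOINT, EndpointConfigName=ENDPOINT_CONFIG)

# 3. Two inference components on the same endpoint, each with its own
#    model and resource reservation.
for name, model_name, accelerators, memory_mb in [
    ("main-llm-component", "main-llm-model", 2, 49152),
    ("llama-guard-component", "llama-guard-model", 1, 24576),
]:
    sm.create_inference_component(
        InferenceComponentName=name,
        EndpointName=ENDPOINT,
        VariantName="AllTraffic",
        Specification={
            "ModelName": model_name,
            "ComputeResourceRequirements": {
                "NumberOfAcceleratorDevicesRequired": accelerators,
                "MinMemoryRequiredInMb": memory_mb,
            },
        },
        RuntimeConfig={"CopyCount": 1},
    )
```

Once both components are in service, each model is addressed by passing its InferenceComponentName to the InvokeEndpoint call.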

Llama Guard evaluation

SageMaker inference components enable an architectural pattern in which the safety model acts as a checkpoint both before and after the main model handles a request. Llama Guard first evaluates the user request; only if it is judged safe does the request proceed to the main model, and Llama Guard then evaluates the model's response again before it is returned to the user.

If content is flagged at either step, a predefined message is returned instead. This dual-validation approach uses an external safety model to verify both inputs and outputs (see the sketch below). Understanding the capabilities and limitations of the chosen model remains essential, though: some categories may require additional specialised systems, and performance can vary (for example, Llama Guard's accuracy differs across languages).
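A sketch of this request flow, assuming the two inference components above, the hypothetical build_llama_guard_prompt helper, and a simple text-generation payload schema:

```python
import json
import boto3

smr = boto3.client("sagemaker-runtime")
ENDPOINT = "guarded-llm-endpoint"  # endpoint hosting both inference components

def _invoke(component_name, prompt, max_new_tokens=256):
    """Invoke one inference component on the shared endpoint (payload shape assumed)."""
    response = smr.invoke_endpoint(
        EndpointName=ENDPOINT,
        InferenceComponentName=component_name,
        ContentType="application/json",
        Body=json.dumps({"inputs": prompt, "parameters": {"max_new_tokens": max_new_tokens}}),
    )
    return json.loads(response["Body"].read())[0]["generated_text"]

def _is_safe(text):
    """Ask the Llama Guard component for a verdict ('safe' or 'unsafe' plus categories)."""
    verdict = _invoke("llama-guard-component", build_llama_guard_prompt(text), max_new_tokens=32)
    return verdict.strip().lower().startswith("safe")

def safe_chat(user_request):
    # Checkpoint 1: screen the user request.
    if not _is_safe(user_request):
        return "I'm sorry, I can't help with that request."

    # Main model handles only requests judged safe.
    answer = _invoke("main-llm-component", user_request)

    # Checkpoint 2: screen the model's response before returning it.
    if not _is_safe(answer):
        return "I'm sorry, I can't share that response."

    return answer
```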

For high-security requirements where latency and cost are less critical, an even more sophisticated defense-in-depth strategy can be used, for example chaining several specialised safety models for input and output validation. Provided the endpoints have sufficient capacity, these models can be deployed to SageMaker through JumpStart or imported from sources such as Hugging Face, as sketched below.
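A sketch of the JumpStart route using the SageMaker Python SDK, where the model_id and instance type are placeholders to be looked up in the JumpStart catalog:

```python
# Deploy an additional safety model via the SageMaker Python SDK's JumpStart interface.
# The model_id and instance type are illustrative placeholders.
from sagemaker.jumpstart.model import JumpStartModel

safety_model = JumpStartModel(model_id="meta-textgeneration-llama-guard-3-8b")

predictor = safety_model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.2xlarge",
    accept_eula=True,  # gated models require accepting the provider's EULA
)

# The resulting endpoint can then be called alongside other safety models
# in the validation chain.
```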

Extending protection with third-party guardrails

Protection can be extended further with third-party guardrails. These solutions complement AWS services by providing domain-specific controls, specialised protection, and features tailored to particular industry needs. Frameworks such as Guardrails AI use the RAIL specification to define bespoke validation rules and safety checks declaratively, which is especially useful for highly customized filtering or specific compliance requirements.

Third-party guardrails are best viewed as providing specialised capabilities rather than replacing existing AWS features.

By combining Amazon Bedrock Guardrails, AWS built-in capabilities, and targeted third-party solutions, organizations can design comprehensive protection that fits their specific needs while upholding consistent safety standards.

In conclusion

Establishing AI safety guardrails with Amazon SageMaker requires a multi-layered strategy: start with built-in model safeguards, add customizable, model-independent controls through Amazon Bedrock Guardrails and the ApplyGuardrail API, and layer on domain-specific protection with specialised safety models (such as Llama Guard) or third-party solutions.

A thorough defense-in-depth approach that combines several of these tactics covers a broader range of potential threats and aligns with ethical AI guidelines. Recommended next steps include reviewing model documentation such as model cards, exploring Amazon Bedrock Guardrails settings, and considering additional safety layers. Maintaining effective AI safety is a continuous process that calls for regular monitoring and updates.
