Amazon Bedrock Guardrails now supports multimodal toxicity detection with image support (preview).
Amazon Bedrock Guardrails now offers multimodal toxicity detection with image support in preview. This new capability helps you safeguard model inputs and outputs in your generative AI applications and improve user experiences by detecting and filtering out undesirable image content in addition to text.
Amazon Bedrock Guardrails helps you implement safeguards for generative AI applications by filtering undesirable content, redacting personally identifiable information (PII), and enhancing content safety and privacy. You can configure policies for denied topics, content filters, word filters, PII redaction, contextual grounding checks, and Automated Reasoning checks (preview) to tailor the safeguards to your specific use cases and responsible AI policies.
With this launch, you can use the content filter policy in Amazon Bedrock Guardrails to detect and block harmful image content across categories such as hate, insults, sexual, and violence. You can configure thresholds from low to high to match the needs of your application.
This new image capability works with all foundation models (FMs) in Amazon Bedrock that support image data, as well as any fine-tuned models you bring. It provides a consistent layer of protection across text and image modalities, making it easier to build responsible AI applications.
Amazon Bedrock Guardrails can become a key part of safeguarding AI applications, particularly through relevance and contextual grounding checks and multimodal safeguards. As organizations plan to bring product design diagrams and manuals into their applications, these capabilities will be essential for more accurate diagnosis and analysis of multimodal content.
How it works
Multimodal toxicity detection in action
To get started, create a guardrail in the AWS Management Console and configure the content filters for image data, text data, or both. You can also use the AWS SDKs to integrate this capability into your applications.
Create a guardrail
On the Amazon Bedrock console, choose Guardrails and create a new guardrail. You can then use the existing content filters to detect and block image data in addition to text data. Under Configure content filters, the Hate, Insults, Sexual, and Violence categories can be configured for text content, image content, or both. The Misconduct and Prompt attacks categories can be configured for text content only.
After you have selected and configured the content filters you want to use, save the guardrail and start using it to build safe and responsible generative AI applications.
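If you prefer to work programmatically, the same guardrail can be created with an AWS SDK. The following sketch uses the boto3 create_guardrail API; the guardrail name, blocked-content messages, and filter strengths are illustrative placeholders, and the inputModalities and outputModalities fields reflect the multimodal preview, so verify the exact parameter shapes against the current SDK documentation.

```python
import boto3

# Control-plane client for creating and managing guardrails.
bedrock = boto3.client("bedrock", region_name="us-east-1")  # pick your Region

response = bedrock.create_guardrail(
    name="multimodal-content-guardrail",            # placeholder name
    description="Blocks harmful text and image content",
    contentPolicyConfig={
        "filtersConfig": [
            # These four categories support both text and image content.
            {
                "type": category,
                "inputStrength": "HIGH",
                "outputStrength": "HIGH",
                "inputModalities": ["TEXT", "IMAGE"],
                "outputModalities": ["TEXT", "IMAGE"],
            }
            for category in ["HATE", "INSULTS", "SEXUAL", "VIOLENCE"]
        ]
        + [
            # Misconduct and prompt attacks apply to text content only.
            {"type": "MISCONDUCT", "inputStrength": "HIGH", "outputStrength": "HIGH"},
            {"type": "PROMPT_ATTACK", "inputStrength": "HIGH", "outputStrength": "NONE"},
        ]
    },
    blockedInputMessaging="Sorry, this request cannot be processed.",
    blockedOutputsMessaging="Sorry, this response cannot be shown.",
)

print(response["guardrailId"], response["version"])
```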
To test the new guardrail in the console, select the guardrail and choose Test. There are two ways to test it: select and invoke a model, or use the standalone ApplyGuardrail API of Amazon Bedrock Guardrails to test the guardrail without invoking a model.
With the ApplyGuardrail API, you can validate content at any point in your application flow before processing it or serving results to the user. You can also use the API to evaluate inputs and outputs for any self-managed (custom) or third-party FMs, regardless of the underlying infrastructure. For example, you could use the API to evaluate a Meta Llama 3.2 model hosted on Amazon SageMaker or a Mistral NeMo model running on your laptop.
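As a sketch of how that might look with the Python SDK, the following calls the ApplyGuardrail API on an input containing both text and an image. The guardrail ID, version, and file name are placeholders, and the image content block shape follows the multimodal preview, so check it against the current API reference.

```python
import boto3

# Data-plane client; ApplyGuardrail works without invoking any model.
bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

with open("example-image.jpg", "rb") as f:      # placeholder file name
    image_bytes = f.read()

response = bedrock_runtime.apply_guardrail(
    guardrailIdentifier="your-guardrail-id",    # placeholder ID
    guardrailVersion="DRAFT",
    source="INPUT",                             # evaluate a user prompt
    content=[
        {"text": {"text": "Describe what is happening in this picture."}},
        {"image": {"format": "jpeg", "source": {"bytes": image_bytes}}},
    ],
)

# "GUARDRAIL_INTERVENED" means the content was blocked or masked.
print(response["action"])
print(response["assessments"])
```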
Select and invoke a model to test the guardrail
Select a model that supports image inputs or outputs, such as Anthropic's Claude 3.5 Sonnet. Verify that the prompt and response filters are enabled for image content. Next, enter a prompt, upload an image file, and choose Run.
In my example, Amazon Bedrock Guardrails intervened. Choose View trace for more details.
The guardrail trace provides a record of how safety measures were applied during an interaction. It shows whether Amazon Bedrock Guardrails intervened and the assessments made on both the input (prompt) and the output (model response). In my example, the content filters blocked the input prompt because they detected insults in the image with high confidence.
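You can run the same kind of test outside the console by attaching the guardrail to a model invocation with the Converse API. The sketch below assumes the placeholder guardrail ID from earlier and a Claude 3.5 Sonnet model ID that may differ by Region or version; enabling the trace returns the same assessment details shown in the console.

```python
import boto3

bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

with open("example-image.jpg", "rb") as f:      # placeholder file name
    image_bytes = f.read()

response = bedrock_runtime.converse(
    modelId="anthropic.claude-3-5-sonnet-20240620-v1:0",  # may differ by Region
    messages=[
        {
            "role": "user",
            "content": [
                {"text": "What do you see in this image?"},
                {"image": {"format": "jpeg", "source": {"bytes": image_bytes}}},
            ],
        }
    ],
    guardrailConfig={
        "guardrailIdentifier": "your-guardrail-id",  # placeholder ID
        "guardrailVersion": "DRAFT",
        "trace": "enabled",  # include the guardrail trace in the response
    },
)

# If the guardrail intervened, stopReason indicates it and the trace
# shows which filters were triggered and at what confidence.
print(response["stopReason"])
print(response.get("trace", {}).get("guardrail"))
```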
Test the guardrail without invoking a model
To test the guardrail without invoking a model, choose Use Guardrails independent API in the console. Choose whether you want to validate an input prompt or an example of a model-generated output.
Then, repeat the previous steps: verify that the prompt and response filters are enabled for image content, provide the content to validate, and choose Run.
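The programmatic equivalent is the ApplyGuardrail call shown earlier with source set to OUTPUT, as in this brief sketch (the guardrail ID and file name are again placeholders):

```python
import boto3

bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

with open("sample-model-output.png", "rb") as f:    # placeholder file name
    image_bytes = f.read()

# Validate a sample model response instead of a user prompt.
response = bedrock_runtime.apply_guardrail(
    guardrailIdentifier="your-guardrail-id",        # placeholder ID
    guardrailVersion="DRAFT",
    source="OUTPUT",
    content=[{"image": {"format": "png", "source": {"bytes": image_bytes}}}],
)
print(response["action"])  # "NONE" or "GUARDRAIL_INTERVENED"
```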