Introducing Stable Diffusion 3.5
In October 2024, Stability AI released Stable Diffusion 3.5 Large, the most powerful model in the Stable Diffusion family. This open release comes in several variants under the permissive Stability AI Community License; all run on consumer hardware and are customizable. The Stable Diffusion 3.5 Large and Stable Diffusion 3.5 Large Turbo model weights are available for download on Hugging Face, and the inference code is on GitHub.
Stable Diffusion 3 Medium, the first open release in the Stable Diffusion 3 series, shipped in June. That release did not fully meet Stability AI's standards or its communities' expectations. After listening to community feedback, Stability AI took the opportunity to build a version that better serves its mission of improving visual media.
Stable Diffusion 3.5 reflects Stability AI's commitment to giving builders and creators tools that are widely accessible, state of the art, and free for the majority of use cases. It supports distributing and monetizing work across the pipeline, whether that is fine-tuning, LoRA training, optimizations, applications, or artwork.
Stable Diffusion 3.5 offers a range of models designed to meet the needs of scientific researchers, hobbyists, startups, and enterprises:
Stable Diffusion 3.5 Large
At 8.1 billion parameters, this base model is the most powerful in the Stable Diffusion family, offering superior image quality and prompt adherence. Its 1-megapixel resolution makes it ideal for professional use cases.
Stable Diffusion 3.5 Large Turbo
Stable Diffusion 3.5 Large Turbo is a distilled version of Stable Diffusion 3.5 Large that generates high-quality images with strong prompt adherence in just 4 steps, making it significantly faster than Stable Diffusion 3.5 Large.
Stable Diffusion 3.5 Medium
This model balances quality and ease of customization. At 2.5 billion parameters, with an improved MMDiT-X architecture and training methods, it is designed to run "out of the box" on consumer hardware. It can generate images between 0.25 and 2 megapixels in resolution.
Creating the models
Stability AI prioritized customizability when developing these models, to provide a flexible base to build on. To achieve this, it integrated Query-Key Normalization into the transformer blocks, which stabilized model training and made further development and fine-tuning easier.
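As a rough illustration of the idea (not Stability AI's implementation), Query-Key Normalization normalizes the query and key vectors before the attention dot product, which bounds the magnitude of the attention logits and stabilizes training. A minimal NumPy sketch, using RMS normalization as one common choice:

```python
import numpy as np

def rms_norm(x, eps=1e-6):
    # RMS-normalize along the feature (last) dimension
    return x / np.sqrt(np.mean(x ** 2, axis=-1, keepdims=True) + eps)

def attention(q, k, v, qk_norm=True):
    # q, k, v: (seq_len, dim) arrays
    if qk_norm:
        q, k = rms_norm(q), rms_norm(k)  # bounds attention logit magnitude
    logits = q @ k.T / np.sqrt(q.shape[-1])
    weights = np.exp(logits - logits.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over keys
    return weights @ v

# Even with very large-magnitude activations, the normalized logits
# stay bounded, so the softmax does not saturate during training.
q = np.random.randn(4, 8) * 1000.0
k = np.random.randn(4, 8) * 1000.0
v = np.random.randn(4, 8)
out = attention(q, k, v)
```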
Supporting this degree of downstream flexibility required some trade-offs. The same prompt may produce more varied outputs across different seeds; this is deliberate, as it preserves the base models' broader knowledge base and range of styles. However, an underspecified prompt can lead to more ambiguity in the output, and aesthetic quality may vary.
The Medium model's architecture and training methods were specifically adjusted to improve quality, coherence, and multi-resolution generation.
Where the models excel
Stable Diffusion 3.5 continues to deliver top-tier prompt adherence and image quality while excelling in the following areas, making it one of the most accessible and customizable image models available:
Customizability: Easily fine-tune the model to meet your specific creative needs, or build applications on customized workflows.
Efficient Performance: Optimized to run on standard consumer hardware without heavy demands, especially the Stable Diffusion 3.5 Medium and Stable Diffusion 3.5 Large Turbo models.
Stability AI also benchmarked hardware compatibility for Stable Diffusion 3.5 Medium against other open image base models. The model is highly accessible and compatible with most consumer GPUs, requiring only 9.9 GB of VRAM (excluding text encoders) to reach full performance.
Diverse Outputs: Generates images representative of the world, with a range of skin tones and features, without requiring extensive prompting and without defaulting to one particular type of person.
Versatile Styles: Capable of generating nearly any visual style imaginable, including 3D, photography, painting, line art, and more.
Furthermore, Stability AI's analysis shows that Stable Diffusion 3.5 Large leads the market in prompt adherence and rivals considerably larger models in image quality.
Stable Diffusion 3.5 Large Turbo offers some of the fastest inference times for its size while remaining highly competitive in image quality and prompt adherence, even when compared to non-distilled models of similar size.
Stable Diffusion 3.5 Medium balances prompt adherence and image quality better than other medium-sized models, making it a great option for efficient, high-quality performance.
An overview of the Stability AI Community License
Stability AI makes this model available under its permissive Community License. The main elements of the license are:
- Free for non-commercial use: Individuals and organizations can use the model free of charge for non-commercial purposes, including scientific research.
- Free for commercial use (up to $1M in annual revenue): Startups, small to medium-sized businesses, and creators can use the model commercially at no cost, as long as their total annual revenue is under $1M.
- Ownership of outputs: You retain ownership of the generated media, with no restrictive licensing terms.
Additional ways to access the models
Although the model weights are available on Hugging Face for self-hosting, the model can also be accessed through the following platforms:
- Stability AI API
- Replicate
- Fireworks AI
- DeepInfra
- ComfyUI
Stable Diffusion 3.5 Large ControlNets
Stability AI is expanding the functionality of Stable Diffusion 3.5 Large with the release of three ControlNets: Blur, Canny, and Depth.
ControlNets give you the tools to build precisely and easily with Stable Diffusion 3.5 Large, whatever your skill level. These adaptable models handle a variety of inputs, making them ideal for a broad range of applications, from character design to interior design. The code is available on GitHub and the model weights on Hugging Face. The models are also supported in ComfyUI.
What's included
Blur
Achieve high-fidelity upscaling to resolutions of 8K and 16K. Ideal for turning low-resolution photos into expansive, detailed images.
Canny
Use Canny edge maps to guide the structure of generated images. This control type is especially useful for illustrations, though it works in any style.
Depth
Use DepthFM-generated depth maps to guide image generation. Excellent for texturing 3D assets, architectural renderings, and other applications requiring precise control over an image's composition.
Stable Diffusion 3.5 Large on Amazon Bedrock
Get started with Stable Diffusion 3.5 Large in Amazon Bedrock
If you have not used Stability AI models before, open the Amazon Bedrock console and choose Model access in the bottom-left navigation pane. To access the most recent Stability AI models, request access to Stable Diffusion 3.5 Large under Stability AI.
To test the Stability AI models in Amazon Bedrock, choose Image under Playgrounds in the left menu pane. Then choose Select model, choose Stability AI as the category, and choose Stable Diffusion 3.5 Large as the model.
You can also access the model with code examples for the AWS Command Line Interface (AWS CLI) and AWS SDKs by choosing View API request. Use stability.sd3-5-large-v1:0 as the model ID.
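As a sketch of an SDK invocation (the request body fields here are assumptions based on the Stability AI text-to-image API; the View API request sample in the console is authoritative), calling the model with the AWS SDK for Python (Boto3) might look like:

```python
import json

# Hypothetical request body for Stable Diffusion 3.5 Large on Amazon Bedrock.
# Verify the field names against the "View API request" sample in the console.
body = json.dumps({
    "prompt": "A photo of a red panda in a bamboo forest",
    "mode": "text-to-image",
    "output_format": "png",
})

def generate_image(client, model_id="stability.sd3-5-large-v1:0"):
    # client: a boto3 "bedrock-runtime" client, e.g.
    #   boto3.client("bedrock-runtime", region_name="us-west-2")
    response = client.invoke_model(modelId=model_id, body=body)
    payload = json.loads(response["body"].read())
    return payload["images"][0]  # base64-encoded image data
```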
With the AWS CLI, you can write the response JSON to standard output and use the jq tool to extract the encoded image, decoding it in a single command to obtain the image. The output is saved to the img.png file.
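The extract-and-decode step might look like the sketch below. Instead of a live call to `aws bedrock-runtime invoke-model`, it mocks a response file with the assumed shape (`{"images": ["<base64>"]}`); a real response from the model would be piped through the same jq and base64 commands.

```shell
# Mock a Bedrock-style response body (assumed shape) in place of a live call.
printf '{"images":["%s"]}' "$(printf 'not-a-real-png' | base64)" > output.json

# Extract the first base64-encoded image and decode it into img.png
# in a single command.
jq -r '.images[0]' output.json | base64 --decode > img.png
```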
The application creates the output directory if it does not already exist before writing the final image there. To avoid overwriting existing files, the code checks for files matching the img_<n>.png pattern and picks the first available file name.
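A minimal sketch of that naming logic (the img_<n>.png pattern and the "output" directory name are assumptions based on the description above):

```python
import os

def next_image_path(directory="output"):
    # Create the output directory if it does not already exist.
    os.makedirs(directory, exist_ok=True)
    # Find the first img_<n>.png name not already taken, so existing
    # files are never overwritten.
    n = 1
    while os.path.exists(os.path.join(directory, f"img_{n}.png")):
        n += 1
    return os.path.join(directory, f"img_{n}.png")
```

In an empty directory this returns `output/img_1.png`; once that file exists, the next call returns `output/img_2.png`, and so on.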
Now available
The Stable Diffusion 3.5 Large model is generally available today in Amazon Bedrock in the US West (Oregon) AWS Region. For future updates, check the full Region list.
Pricing
On-Demand pricing examples
An application developer makes the following API request to Amazon Bedrock: a request to the SDXL model to generate a 512 x 512 image at premium quality (step count of 70).
Total cost incurred = 1 image x $0.036 per image = $0.036.
An application developer makes the following API request to Amazon Bedrock: a request to the SDXL 1.0 model to generate a premium-quality 1024 x 1024 image with a step count of 70.
Total cost incurred = 1 image x $0.08 per image = $0.08.
Provisioned Throughput pricing example
An application developer purchases one SDXL 1.0 model unit with a one-month commitment.
Total cost incurred = 1 model unit x $49.86 per hour x 24 hours x 31 days = $37,095.84.
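The three worked examples above reduce to simple arithmetic; a quick sketch:

```python
# On-Demand: cost = number of images x price per image
# (the per-image price depends on model, resolution, and quality tier).
premium_512 = 1 * 0.036    # SDXL, 512 x 512, premium (>50 steps)
premium_1024 = 1 * 0.08    # SDXL 1.0, 1024 x 1024, premium (>50 steps)

# Provisioned Throughput: cost = model units x hourly rate x hours in the term.
provisioned_month = 1 * 49.86 * 24 * 31  # 1-month commitment
```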
On-Demand pricing
| Stability AI model | Price per generated image |
| --- | --- |
| Stable Diffusion 3.5 Large | $0.08 |
| Stable Image Core | $0.04 |
| Stable Diffusion 3 Large | $0.08 |
| Stable Image Ultra | $0.14 |
Previous-generation image models from Stability AI are priced per image, depending on step count and image resolution.
| Stability AI model | Image resolution | Price per image, standard quality (<=50 steps) | Price per image, premium quality (>50 steps) |
| --- | --- | --- | --- |
| SDXL 1.0 | Up to 1024 x 1024 | $0.04 | $0.08 |
Provisioned Throughput pricing
| Stability AI model | Price per hour per model unit (1-month commitment) | Price per hour per model unit (6-month commitment) |
| --- | --- | --- |
| SDXL 1.0 | $49.86 | $46.18 |