Stable Diffusion is a cutting-edge technique that uses latent diffusion models to generate high-quality images from text descriptions. The Hugging Face diffusers library provides easy-to-use pipelines for deploying and using Stable Diffusion models, including pipelines for generating, editing, and upscaling images.
The Best Way to Upscale Stable Diffusion Images
In this article, we show how to use the Stable Diffusion Upscale Pipeline from the diffusers library to upscale images produced by Stable Diffusion. We also cover the rationale behind upscaling and show how to use Intel Extension for PyTorch (a Python package in which Intel releases its latest optimizations and features before upstreaming them into open-source PyTorch) to optimize the process for better performance on Intel Xeon processors.
How can the Stable Diffusion Upscale Pipeline be made more efficient for inference?
The Stable Diffusion Upscale Pipeline from the Hugging Face diffusers library uses the Stable Diffusion model to increase the resolution of an input image, specifically by a factor of four. The pipeline combines several components: a frozen CLIP text model for text encoding, a Variational Auto-Encoder (VAE) for image encoding and decoding, a UNet architecture for denoising the image latents, and schedulers that control the diffusion process during image generation.
The pipeline excels at bringing out detail in both generated and real-world images, making it especially useful for applications that need high-quality outputs from lower-resolution inputs. Users can set a variety of parameters, such as the number of denoising steps, to balance fidelity to the input text against image quality, and custom callbacks can be supplied during inference to monitor or modify the generation.
To improve the pipeline's performance, tune each of its components separately before combining them. Intel Extension for PyTorch is an essential part of this optimization: it extends PyTorch with sophisticated optimizations that deliver an additional speed boost on Intel hardware. These optimizations take advantage of Intel CPUs' Vector Neural Network Instructions (VNNI), Intel Advanced Vector Extensions 512 (Intel AVX-512), and Intel Advanced Matrix Extensions (Intel AMX). The Python API ipex.optimize(), available through Intel Extension for PyTorch, automatically optimizes a module so it can exploit these advanced hardware instructions for greater performance efficiency.
Sample Code
The code sample below shows how to upscale an image with the Stable Diffusion Upscale Pipeline from the diffusers library, using Intel Extension for PyTorch for performance enhancements. The pipeline's UNet, VAE, and text encoder components are each targeted individually and optimized for CPU inference.
Setting Up the Environment
It is advisable to perform the installations in a virtual Conda environment. Install PyTorch, Intel Extension for PyTorch, and diffusers:
python -m pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu
python -m pip install intel-extension-for-pytorch
python -m pip install oneccl_bind_pt --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/cpu/us/
pip install transformers
pip install diffusers
How to Optimize
First, import all required packages, including Intel Extension for PyTorch, and load the sample image you want to upscale.
Now let's examine how the capabilities of Intel Extension for PyTorch can be used to optimize the upscaling pipeline.
Each pipeline component is targeted independently. First, configure the text encoder, VAE, and UNet to use the channels-last memory format, which orders tensor dimensions as batch, height, width, channels (NHWC). This layout maps better onto certain memory access patterns and is therefore more efficient. Channels last is especially helpful for convolutional neural networks because it minimizes data reordering during operations, which can significantly increase processing speed.
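As a minimal, pipeline-independent illustration of the channels-last format in plain PyTorch, converting a tensor (or module) changes only the underlying memory layout, not the logical shape:

```python
import torch

# A typical NCHW image tensor: batch, channels, height, width.
x = torch.randn(1, 3, 64, 64)

# Convert to channels-last (NHWC) memory layout; the logical shape is unchanged.
x_cl = x.to(memory_format=torch.channels_last)

print(x_cl.shape)                                             # torch.Size([1, 3, 64, 64])
print(x_cl.is_contiguous(memory_format=torch.channels_last))  # True
```

Pipeline modules are converted the same way, e.g. `pipeline.unet.to(memory_format=torch.channels_last)`.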
Similarly, each component is optimized with ipex.optimize() from Intel Extension for PyTorch, with the data type set to BFloat16. BFloat16 operations are accelerated by Intel AMX, a built-in AI accelerator available on 4th generation Intel Xeon Scalable processors and later. Passing a lower-precision data type such as BFloat16 or INT8 to ipex.optimize() enables these Intel AMX code paths.
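As a quick illustration of why BFloat16 helps, independent of the pipeline: it halves the per-element memory footprint relative to FP32 while preserving FP32's dynamic range:

```python
import torch

x = torch.randn(4, 4)                # default dtype is FP32
x_bf16 = x.to(torch.bfloat16)        # cast to BFloat16

print(x.element_size())              # 4 bytes per element (FP32)
print(x_bf16.element_size())         # 2 bytes per element (BFloat16)
```

Halving the element size halves memory traffic, which is often the bottleneck for large models on CPU.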
Finally, you can get the best upscaling results by employing mixed precision, which combines the numerical stability of higher-precision arithmetic (e.g., FP32) with the computational speed and memory savings of lower precision (e.g., BF16). The pipeline applies mixed precision automatically when run inside a torch.cpu.amp.autocast() context. With the pipeline object optimized by Intel Extension for PyTorch, it can now upscale images with minimal latency.
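Putting the steps above together, a sketch of the optimized pipeline might look like the following. It assumes the packages from the setup section are installed; the model id and sample image URL are taken from the public diffusers examples for the x4 upscaler, and running the script downloads the model weights.

```python
import torch
import intel_extension_for_pytorch as ipex
from diffusers import StableDiffusionUpscalePipeline
from diffusers.utils import load_image

# Load the upscale pipeline (public x4 upscaler checkpoint).
pipeline = StableDiffusionUpscalePipeline.from_pretrained(
    "stabilityai/stable-diffusion-x4-upscaler"
)

# A sample low-resolution input image; any small RGB image works here.
low_res_img = load_image(
    "https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/sd2-upscale/low_res_cat.png"
).resize((128, 128))

# 1. Switch the components to the channels-last memory format.
pipeline.unet = pipeline.unet.to(memory_format=torch.channels_last)
pipeline.vae = pipeline.vae.to(memory_format=torch.channels_last)
pipeline.text_encoder = pipeline.text_encoder.to(memory_format=torch.channels_last)

# 2. Optimize each component with ipex.optimize() in BFloat16.
pipeline.unet = ipex.optimize(pipeline.unet.eval(), dtype=torch.bfloat16, inplace=True)
pipeline.vae = ipex.optimize(pipeline.vae.eval(), dtype=torch.bfloat16, inplace=True)
pipeline.text_encoder = ipex.optimize(
    pipeline.text_encoder.eval(), dtype=torch.bfloat16, inplace=True
)

# 3. Run inference under BFloat16 autocast (mixed precision).
prompt = "a white cat"
with torch.cpu.amp.autocast(dtype=torch.bfloat16), torch.no_grad():
    upscaled = pipeline(prompt=prompt, image=low_res_img, num_inference_steps=20).images[0]

upscaled.save("upscaled_cat.png")
```

This is a sketch, not a definitive implementation; the prompt, step count, and file names are illustrative and can be tuned for your workload.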
Configuring an Advanced Environment
This section explains how to set environment variables and configurations tuned for Intel Xeon processors, particularly for memory management and parallel processing, to gain further performance. The script env_activate.sh sets several environment variables specific to the Intel OpenMP library. It also uses LD_PRELOAD to specify which shared libraries should be loaded before others, building the paths to those libraries dynamically so that they are loaded at runtime before the application starts.
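For illustration, the kinds of settings env_activate.sh applies look roughly like the sketch below; the real script computes the library paths dynamically, and the exact values shown here are typical assumptions, not prescriptive:

```shell
# Illustrative sketch only -- the actual env_activate.sh determines paths and
# values itself. Intel OpenMP tuning: keep worker threads spinning briefly
# between regions and pin them to physical cores.
export KMP_BLOCKTIME=1
export KMP_AFFINITY=granularity=fine,compact,1,0

# Preload the Intel OpenMP runtime and tcmalloc (provided by the intel-openmp
# and gperftools packages installed above) ahead of other shared libraries.
export LD_PRELOAD="${CONDA_PREFIX}/lib/libiomp5.so:${CONDA_PREFIX}/lib/libtcmalloc.so:${LD_PRELOAD}"
```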
How to configure Advanced Environment on Intel Xeon CPUs for optimal performance:
Install the two packages the script depends on:
pip install intel-openmp
conda install -y gperftools -c conda-forge
git clone https://github.com/intel/intel-extension-for-pytorch.git
cd intel-extension-for-pytorch
git checkout v2.3.100+cpu
cd examples/cpu/inference/python/llm
Activate environment variables
source ./tools/env_activate.sh
Run a script with the code from the previous section
python run_upscaler_pipeline.py
Your environment is now prepared to run the Stable Diffusion Upscale Pipeline with the performance flags set in the previous step. On top of these environment settings, running inference with the pipeline optimized by Intel Extension for PyTorch yields additional performance.