SD-Turbo is a fast generative text-to-image model that can synthesize photorealistic images from a text prompt in a single network evaluation. We release SD-Turbo as a research artifact to study small, distilled text-to-image models.
SD-Turbo is a distilled version of Stable Diffusion 2.1, trained for real-time synthesis. SD-Turbo is based on a novel training method called Adversarial Diffusion Distillation (ADD) (see the technical report), which allows sampling large-scale foundational image diffusion models in 1 to 4 steps at high image quality. This approach uses score distillation to leverage large-scale off-the-shelf image diffusion models as a teacher signal and combines this with an adversarial loss to ensure high image fidelity even in the low-step regime of one or two sampling steps.
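The combined objective described above can be sketched numerically. This is a toy illustration for exposition only, not SD-Turbo's actual training code: the function names and scalar inputs are our stand-ins, and the default weighting `lam=2.5` follows the value reported in the ADD paper.

```python
import math

# Toy illustration of the ADD objective: the student's total loss combines an
# adversarial term (fool a discriminator) with a score-distillation term
# (match the teacher diffusion model's prediction). All names and values here
# are illustrative stand-ins, not the actual SD-Turbo training code.

def distillation_loss(student_pred: float, teacher_pred: float) -> float:
    # Distance between student and teacher predictions (squared error here).
    return (student_pred - teacher_pred) ** 2

def adversarial_loss(disc_score: float) -> float:
    # Non-saturating generator loss on the discriminator's score in (0, 1].
    return -math.log(disc_score)

def add_objective(student_pred: float, teacher_pred: float,
                  disc_score: float, lam: float = 2.5) -> float:
    # Total loss L = L_adv + lam * L_distill.
    return adversarial_loss(disc_score) + lam * distillation_loss(student_pred, teacher_pred)
```

When the student matches the teacher exactly and fully convinces the discriminator, both terms vanish and the loss is zero; either mismatch raises it.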
Model Sources
For research purposes, we recommend our generative-models
Github repository (https://github.com/Stability-AI/generative-models),
which implements the most popular diffusion frameworks (both training and inference).
- Repository: https://github.com/Stability-AI/generative-models
- Paper: https://stability.ai/research/adversarial-diffusion-distillation
- Demo [for the bigger SDXL-Turbo]: http://clipdrop.co/stable-diffusion-turbo
Evaluation
A user study compared SD-Turbo against other single- and multi-step models. SD-Turbo evaluated at a single step is preferred by human voters in terms of image quality and prompt following over LCM-Lora XL and LCM-Lora 1.5.
Note: For increased quality, we recommend the bigger version SDXL-Turbo. For details on the user study, we refer to the research paper.
Uses
Direct Use
The model is intended for research purposes only. Possible research areas and tasks include
- Research on generative models.
- Research on real-time applications of generative models.
- Research on the impact of real-time generative models.
- Safe deployment of models which have the potential to generate harmful content.
- Probing and understanding the limitations and biases of generative models.
- Generation of artworks and use in design and other artistic processes.
- Applications in educational or creative tools.
Excluded uses are described below.
Diffusers
```shell
pip install diffusers transformers accelerate --upgrade
```
- Text-to-image:

SD-Turbo does not make use of `guidance_scale` or `negative_prompt`; we disable it with `guidance_scale=0.0`. Preferably, the model generates images of size 512x512, but higher image sizes work as well. A single step is enough to generate high-quality images.
```python
from diffusers import AutoPipelineForText2Image
import torch

# Load the SD-Turbo text-to-image pipeline in half precision.
pipe = AutoPipelineForText2Image.from_pretrained("stabilityai/sd-turbo", torch_dtype=torch.float16, variant="fp16")
pipe.to("cuda")

prompt = "A cinematic shot of a baby racoon wearing an intricate italian priest robe."

# A single denoising step; guidance is disabled for SD-Turbo.
image = pipe(prompt=prompt, num_inference_steps=1, guidance_scale=0.0).images[0]
```
- Image-to-image:

When using SD-Turbo for image-to-image generation, make sure that `num_inference_steps * strength` is greater than or equal to 1. The image-to-image pipeline will run for `int(num_inference_steps * strength)` steps, e.g. `int(2 * 0.5) = 1` step in our example below.
```python
from diffusers import AutoPipelineForImage2Image
from diffusers.utils import load_image
import torch

# Load the SD-Turbo image-to-image pipeline in half precision.
pipe = AutoPipelineForImage2Image.from_pretrained("stabilityai/sd-turbo", torch_dtype=torch.float16, variant="fp16")
pipe.to("cuda")

# Download an example image and resize it to the model's native resolution.
init_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/cat.png").resize((512, 512))

prompt = "cat wizard, gandalf, lord of the rings, detailed, fantasy, cute, adorable, Pixar, Disney, 8k"

# int(num_inference_steps * strength) = int(2 * 0.5) = 1 denoising step.
image = pipe(prompt, image=init_image, num_inference_steps=2, strength=0.5, guidance_scale=0.0).images[0]
```
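The step-count rule above can be checked ahead of time. A minimal sketch follows; the helper name is ours for illustration and is not part of the diffusers API:

```python
# Pre-compute the effective number of denoising steps for image-to-image:
# the pipeline runs int(num_inference_steps * strength) steps, which must
# be at least 1. This helper is illustrative, not part of diffusers.

def effective_steps(num_inference_steps: int, strength: float) -> int:
    steps = int(num_inference_steps * strength)
    if steps < 1:
        raise ValueError(
            f"num_inference_steps * strength must be >= 1 "
            f"(got int({num_inference_steps} * {strength}) = {steps} steps)"
        )
    return steps
```

For the example above, `effective_steps(2, 0.5)` yields 1 step, while e.g. `num_inference_steps=1` with `strength=0.5` would raise, since the pipeline would be asked to run zero steps.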
Limitations and Bias
Limitations
- The quality and prompt alignment are lower than that of SDXL-Turbo.
- The generated images are of a fixed resolution (512x512 pix), and the model does not achieve perfect photorealism.
- The model cannot render legible text.
- Faces and people in general may not be generated properly.
- The autoencoding part of the model is lossy.
@article{sauer2023adversarial,
  title={Adversarial Diffusion Distillation},
  author={Sauer, Axel and Lorenz, Dominik and Blattmann, Andreas and Rombach, Robin},
  journal={arXiv preprint arXiv:2311.17042},
  year={2023}
}