Today we are releasing a new text-to-image mode, SDXL Turbo. SDXL Turbo is based on a novel distillation technique called Adversarial Diffusion Distillation (ADD), which enables the model to synthesize image outputs in a single step, producing real-time text-to-image output while maintaining high sampling fidelity. For researchers and enthusiasts interested in the technical details, our research paper is available here. It's important to note that SDXL Turbo isn't intended for commercial use yet.
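To illustrate what single-step sampling looks like in practice, here is a minimal sketch using the Hugging Face diffusers library. The repository name, dtype settings, and prompt below are assumptions for illustration and are not taken from this announcement.

```python
import torch
from diffusers import AutoPipelineForText2Image

# Load the SDXL Turbo weights (assumed to be published as "stabilityai/sdxl-turbo").
pipe = AutoPipelineForText2Image.from_pretrained(
    "stabilityai/sdxl-turbo", torch_dtype=torch.float16, variant="fp16"
)
pipe.to("cuda")

prompt = "A photo of a lighthouse on a rocky coast at sunset"

# A single denoising step; the model is sampled without classifier-free guidance,
# so guidance_scale is set to 0.0.
image = pipe(prompt=prompt, num_inference_steps=1, guidance_scale=0.0).images[0]
image.save("lighthouse.png")
```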
Advantages of Adversarial Diffusion Distillation
Featuring new advancements in diffusion model technology, SDXL Turbo iterates on the foundations of SDXL 1.0 and implements a new distillation technique for text-to-image models: adversarial diffusion distillation. By incorporating ADD, SDXL Turbo gains many of the advantages shared with Generative Adversarial Networks (GANs), such as single-step image output, while avoiding the artifacts and blurriness commonly found in other distillation methods. The SDXL Turbo research paper detailing the new distillation technique for this model is available here.
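The following is a conceptual sketch of one ADD training step, not the released training code: the tiny linear layers stand in for the student and teacher denoisers and the discriminator, the noise levels are arbitrary, and the distillation weight is an assumption based on the setting reported in the research paper. It shows the core idea of combining an adversarial loss with a distillation loss toward a frozen teacher.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

dim = 16
student = nn.Linear(dim, dim)                          # ADD student (trainable, single-step denoiser)
teacher = nn.Linear(dim, dim).requires_grad_(False)    # frozen teacher (stand-in for SDXL)
disc = nn.Linear(dim, 1)                               # discriminator head (trainable)

opt_g = torch.optim.Adam(student.parameters(), lr=1e-4)
opt_d = torch.optim.Adam(disc.parameters(), lr=1e-4)

def add_noise(x, sigma):
    """Forward diffusion: mix data with Gaussian noise at a given noise level."""
    return x + sigma * torch.randn_like(x)

lambda_distill = 2.5          # assumed distillation weight
x0 = torch.randn(8, dim)      # a toy batch standing in for real training images

# Student forward pass: denoise a heavily noised input in a single step.
x_s = add_noise(x0, sigma=1.0)
x_hat = student(x_s)

# Adversarial term: the discriminator should rate the student's sample as real.
adv_loss = -disc(x_hat).mean()

# Distillation term: re-noise the student sample, let the frozen teacher denoise it,
# and pull the student output toward the teacher's estimate.
x_t = add_noise(x_hat, sigma=0.5)
with torch.no_grad():
    teacher_estimate = teacher(x_t)
distill_loss = F.mse_loss(x_hat, teacher_estimate)

loss_g = adv_loss + lambda_distill * distill_loss
opt_g.zero_grad(); loss_g.backward(); opt_g.step()

# Discriminator update: real images vs. detached student samples (hinge loss).
d_real = disc(x0)
d_fake = disc(x_hat.detach())
loss_d = F.relu(1 - d_real).mean() + F.relu(1 + d_fake).mean()
opt_d.zero_grad(); loss_d.backward(); opt_d.step()
```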
Performance advantages over other diffusion models
To evaluate the performance of SDXL Turbo, we compared several different model variants (StyleGAN-T++, OpenMUSE, IF-XL, SDXL, and LCM-XL) by generating outputs with the same prompt. Human raters were then shown two outputs in a random order and instructed to select the output that most closely matched the direction of the prompt. Additional tests of image quality were then completed in the same manner. In these blind tests, SDXL Turbo beat LCM-XL's 4-step configuration with a single step, and SDXL's 50-step configuration with only 4 steps. These results show that SDXL Turbo outperforms state-of-the-art multi-step models with significantly lower computational requirements, without sacrificing image quality.
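As a rough illustration of how such blind pairwise preference tests can be tallied, here is a small sketch. It is not our evaluation code: the vote data is randomly simulated, and the tallying scheme (simple per-model win rates over randomized A/B presentations) is an assumption for illustration only.

```python
import random
from collections import defaultdict

models = ["SDXL Turbo", "LCM-XL", "SDXL", "OpenMUSE", "IF-XL", "StyleGAN-T++"]
votes = []  # each entry: (model_a, model_b, winner)

# Simulate blind A/B sessions: outputs are shown in random order so raters
# cannot tell which model produced which image. The "judgment" here is random,
# standing in for a real human preference.
for _ in range(1000):
    a, b = random.sample(models, 2)
    shown = random.sample([a, b], 2)   # randomized presentation order
    winner = random.choice(shown)      # placeholder for a human rater's choice
    votes.append((a, b, winner))

wins = defaultdict(int)
appearances = defaultdict(int)
for a, b, winner in votes:
    appearances[a] += 1
    appearances[b] += 1
    wins[winner] += 1

for m in models:
    print(f"{m}: win rate {wins[m] / appearances[m]:.2%} over {appearances[m]} comparisons")
```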