Update to SDXL 1.0 #1

Open · wants to merge 17 commits into `main`

---
title: Stable Diffusion XL 1.0
emoji: 🔥
colorFrom: yellow
colorTo: gray
pinned: true
license: mit
---

# StableDiffusion XL Gradio Demo WebUI
This is a Gradio demo with a web UI supporting [Stable Diffusion XL 1.0](https://github.com/Stability-AI/generative-models). This demo loads both the base and the refiner model.
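For reference, loading and chaining the two models with diffusers looks roughly like the sketch below (standard diffusers SDXL usage; `app.py` layers the demo's options such as offloading and LCM on top of this):

```python
# Minimal sketch of loading SDXL base + refiner with diffusers.
import torch
from diffusers import DiffusionPipeline

base = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16, variant="fp16", use_safetensors=True,
).to("cuda")
refiner = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-refiner-1.0",
    text_encoder_2=base.text_encoder_2,  # share components to save memory
    vae=base.vae,
    torch_dtype=torch.float16, variant="fp16", use_safetensors=True,
).to("cuda")

prompt = "a photo of an astronaut riding a horse on mars"
# The base model denoises the first ~80% of the steps and hands its
# latents to the refiner, which finishes the remaining steps.
latents = base(prompt=prompt, denoising_end=0.8, output_type="latent").images
image = refiner(prompt=prompt, denoising_start=0.8, image=latents).images[0]
```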

This is forked from [StableDiffusion v2.1 Demo WebUI](https://huggingface.co/spaces/gradio-client-demos/stable-diffusion). Refer to the git commits to see the changes.

**Update 🔥🔥🔥: [Latent consistency model (LCM) LoRA](https://huggingface.co/blog/lcm_lora) is supported and enabled by default (controlled by `ENABLE_LCM`)! Turn on `USE_SSD` to use `SSD-1B` for even faster generation (4.9 s/image on a free Colab T4 without additional optimizations)!** Colab has been updated to use this by default. <a target="_blank" href="https://colab.research.google.com/github/TonyLianLong/stable-diffusion-xl-demo/blob/main/Stable_Diffusion_XL_Demo.ipynb">
<img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>
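For reference, `ENABLE_LCM` follows the standard recipe from the linked LCM LoRA post; a minimal sketch of that recipe with diffusers (the demo applies it for you):

```python
# Minimal sketch of the LCM LoRA recipe: swap in the LCM scheduler,
# load the LoRA weights, then sample with very few steps.
import torch
from diffusers import DiffusionPipeline, LCMScheduler

pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16, variant="fp16",
).to("cuda")
pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)
pipe.load_lora_weights("latent-consistency/lcm-lora-sdxl")

# LCM needs only ~4 steps and little to no classifier-free guidance.
image = pipe("a cinematic photo of a fox in the snow",
             num_inference_steps=4, guidance_scale=1.0).images[0]
```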

**Update 🔥🔥🔥:** Check out our work <a href='https://llm-grounded-diffusion.github.io/'>**LLM-grounded Diffusion (LMD)**</a>, which introduces LLMs into the diffusion world and achieves much better prompt understanding than standard Stable Diffusion, without any fine-tuning! LMD with SDXL is supported on <a href='https://github.com/TonyLianLong/LLM-groundedDiffusion'>our GitHub repo</a>, and <a href='https://huggingface.co/spaces/longlian/llm-grounded-diffusion'>a demo with SD is available</a>.

**Update:** [SDXL 1.0](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0) is released, and our web UI demo supports it! No application is needed to get the weights! Launch the Colab to get started. You can run this demo on Colab for free, even on a T4. <a target="_blank" href="https://colab.research.google.com/github/TonyLianLong/stable-diffusion-xl-demo/blob/main/Stable_Diffusion_XL_Demo.ipynb">
<img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

**Update:** Multiple GPUs are supported. You can easily spread the workload across different GPUs by setting `MULTI_GPU=True`. This uses data parallelism to split the workload across the GPUs.
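Conceptually, this amounts to one pipeline replica per GPU with incoming prompts sharded across the replicas. A simplified sketch of the idea (illustrative only, not the demo's actual implementation):

```python
# Illustrative sketch of data parallelism: one pipeline replica per GPU,
# with the prompt batch sharded across replicas in parallel threads.
from concurrent.futures import ThreadPoolExecutor
import torch
from diffusers import DiffusionPipeline

n_gpus = torch.cuda.device_count()
replicas = [
    DiffusionPipeline.from_pretrained(
        "stabilityai/stable-diffusion-xl-base-1.0",
        torch_dtype=torch.float16, variant="fp16",
    ).to(f"cuda:{i}")
    for i in range(n_gpus)
]

def generate(prompts):
    shards = [prompts[i::n_gpus] for i in range(n_gpus)]  # round-robin split
    with ThreadPoolExecutor(max_workers=n_gpus) as pool:
        shard_images = pool.map(
            lambda pair: pair[0](prompt=pair[1]).images if pair[1] else [],
            zip(replicas, shards),
        )
    return [image for images in shard_images for image in images]
```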

<img src="imgs/sdxl_ssd_lcm.gif" width="48%" alt="SDXL with SSD-1B, LCM LoRA">

## Examples

**Update:** [See a more comprehensive comparison with 1200+ images here](https://github.com/TonyLianLong/stable-diffusion-xl-demo/tree/benchmark/benchmark). Both SDXL and SD v2.1 are benchmarked on prompts from [StableStudio](https://github.com/Stability-AI/StableStudio).

Left: SDXL. Right: [SD v2.1](https://huggingface.co/spaces/gradio-client-demos/stable-diffusion).

Without any tuning, SDXL generates much better images compared to SD v2.1!

### Example 1
<p align="middle">
<img src="imgs/img1_sdxl0.9.png" width="48%">
<img src="imgs/img1_sdxl1.0.png" width="48%">
<img src="imgs/img1_sdv2.1.png" width="48%">
</p>

### Example 2
<p align="middle">
<img src="imgs/img2_sdxl0.9.png" width="48%">
<img src="imgs/img2_sdxl1.0.png" width="48%">
<img src="imgs/img2_sdv2.1.png" width="48%">
</p>

### Example 3
<p align="middle">
<img src="imgs/img3_sdxl0.9.png" width="48%">
<img src="imgs/img3_sdxl1.0.png" width="48%">
<img src="imgs/img3_sdv2.1.png" width="48%">
</p>

### Example 4
<p align="middle">
<img src="imgs/img4_sdxl0.9.png" width="48%">
<img src="imgs/img4_sdxl1.0.png" width="48%">
<img src="imgs/img4_sdv2.1.png" width="48%">
</p>

### Example 5
<p align="middle">
<img src="imgs/img5_sdxl0.9.png" width="48%">
<img src="imgs/img5_sdxl1.0.png" width="48%">
<img src="imgs/img5_sdv2.1.png" width="48%">
</p>

## Installation
With torch 2.0.1 installed, we also need to install:
```shell
pip install accelerate transformers invisible-watermark "numpy>=1.17" "PyWavelets>=1.1.1" "opencv-python>=4.1.0.25" safetensors "gradio==3.11.0"
pip install git+https://github.com/huggingface/diffusers.git
```

## Launching
It's free and *no form is needed* now. Leaked weights seem to be available on [reddit](https://www.reddit.com/r/StableDiffusion/comments/14s04t1/happy_sdxl_leak_day/), but I have not used/tested them.

There are two ways to load the weights. Option 1 works out of the box (no manual download needed). If you prefer loading from a local copy of the weights, use Option 2.

### Option 1
Run the command to automatically set up the weights:
```
PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:512 python app.py
```

### Option 2
If you have cloned both repos ([base](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0), [refiner](https://huggingface.co/stabilityai/stable-diffusion-xl-refiner-1.0)) locally (please change `/path_to_sdxl` below):
```
PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:512 SDXL_MODEL_DIR=/path_to_sdxl python app.py
```

Note that `stable-diffusion-xl-base-1.0` and `stable-diffusion-xl-refiner-1.0` should be placed in the same directory, and the path of that directory should replace `/path_to_sdxl`.
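That is, the expected layout is:
```
/path_to_sdxl/
├── stable-diffusion-xl-base-1.0/
└── stable-diffusion-xl-refiner-1.0/
```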

### `torch.compile` support
Turning on `torch.compile` makes overall inference faster. However, it adds some overhead to the first run (i.e., you have to wait for compilation during the first run).
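For reference, the usual diffusers recipe is to compile the UNet once after loading; continuing the loading sketch above (assuming the `base` and `refiner` objects from that sketch):

```python
# Compile the UNets once after loading; the first generation then pays
# the compilation cost, and subsequent runs are faster.
base.unet = torch.compile(base.unet, mode="reduce-overhead", fullgraph=True)
refiner.unet = torch.compile(refiner.unet, mode="reduce-overhead", fullgraph=True)
```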

3. More ways to [save memory and make things faster](https://huggingface.co/docs/diffusers/optimization/fp16).

### Several options through environment variables
* `USE_SSD`: use [segmind/SSD-1B](https://huggingface.co/segmind/SSD-1B), a distilled SDXL model that generates faster. This is disabled by default.
* `ENABLE_LCM`: use [LCM LoRA](https://huggingface.co/blog/lcm_lora). This is enabled by default.
* `SDXL_MODEL_DIR`: load SDXL locally.
* `ENABLE_REFINER=true/false`: turn the refiner on/off (the [refiner](https://huggingface.co/stabilityai/stable-diffusion-xl-refiner-1.0) refines the generation). The refiner is disabled by default if LCM LoRA or the SSD model is enabled.
* `OFFLOAD_BASE` and `OFFLOAD_REFINER` can be set to true/false to enable/disable model offloading (model offloading saves memory at the cost of slowing down generation).
* `OUTPUT_IMAGES_BEFORE_REFINER=true/false`: useful if the refiner is enabled; outputs images both before and after the refiner stage.
* `SHARE=true/false`: creates a public link (useful for sharing and on Colab).
* `MULTI_GPU=true/false`: enables data parallelism across multiple GPUs (see the combined example below).
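For example, to run the faster SSD-1B setup (LCM LoRA is already enabled by default) and expose a public link: `USE_SSD=true SHARE=true python app.py`.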

## If you enjoy this demo, please give [this repo](https://github.com/TonyLianLong/stable-diffusion-xl-demo) a star ⭐.