### Describe the bug
Using `ZImagePipeline` with a batch size above 1 (i.e. `num_images_per_prompt > 1`) fails with an `AssertionError` raised during prompt encoding.
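
A possible workaround until batching is supported (just a sketch, assuming single-image calls work) is to keep `num_images_per_prompt=1` and loop over seeds instead:

```python
# Workaround sketch: generate N images one at a time instead of batching.
# Assumes `pipe` and `prompt` are set up as in the reproduction below.
images = []
for i in range(2):
    image = pipe(
        prompt=prompt,
        height=1024,
        width=1024,
        num_inference_steps=9,
        guidance_scale=0.0,
        generator=torch.Generator("cuda").manual_seed(42 + i),
        num_images_per_prompt=1,
    ).images[0]
    images.append(image)
```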
### Reproduction
```python
import torch
from diffusers import ZImagePipeline

# 1. Load the pipeline
# Use bfloat16 for optimal performance on supported GPUs
pipe = ZImagePipeline.from_pretrained(
    "Tongyi-MAI/Z-Image-Turbo",
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=False,
)
pipe.to("cuda")

# [Optional] Attention backend
# Diffusers uses SDPA by default. Switch to Flash Attention for better efficiency if supported:
# pipe.transformer.set_attention_backend("flash")     # Enable Flash-Attention-2
# pipe.transformer.set_attention_backend("_flash_3")  # Enable Flash-Attention-3

# [Optional] Model compilation
# Compiling the DiT model accelerates inference, but the first run will take longer to compile.
# pipe.transformer.compile()

# [Optional] CPU offloading
# Enable CPU offloading for memory-constrained devices.
pipe.enable_model_cpu_offload()

prompt = "Young Chinese woman in red Hanfu, intricate embroidery. Impeccable makeup, red floral forehead pattern. Elaborate high bun, golden phoenix headdress, red flowers, beads. Holds round folding fan with lady, trees, bird. Neon lightning-bolt lamp (⚡️), bright yellow glow, above extended left palm. Soft-lit outdoor night background, silhouetted tiered pagoda (西安大雁塔), blurred colorful distant lights."

# 2. Generate images
image = pipe(
    prompt=prompt,
    height=1024,
    width=1024,
    num_inference_steps=9,  # This actually results in 8 DiT forwards
    guidance_scale=0.0,     # Guidance should be 0 for the Turbo models
    generator=torch.Generator("cuda").manual_seed(42),
    num_images_per_prompt=2,
).images[0]
image.save("example.png")
```
### Logs
```
Traceback (most recent call last):
  File "/home/meatfucker/ml/avernus/zimage_test.py", line 29, in <module>
    image = pipe(
        ~~~~^
        prompt=prompt,
        ^^^^^^^^^^^^^^
    ...<5 lines>...
        num_images_per_prompt=2
        ^^^^^^^^^^^^^^^^^^^^^^^
    ).images[0]
    ^
  File "/home/meatfucker/ml/avernus/venv/lib/python3.13/site-packages/torch/utils/_contextlib.py", line 120, in decorate_context
    return func(*args, **kwargs)
  File "/home/meatfucker/ml/avernus/venv/lib/python3.13/site-packages/diffusers/pipelines/z_image/pipeline_z_image.py", line 452, in __call__
    ) = self.encode_prompt(
        ~~~~~~~~~~~~~~~~~~^
        prompt=prompt,
        ^^^^^^^^^^^^^^
    ...<8 lines>...
        lora_scale=lora_scale,
        ^^^^^^^^^^^^^^^^^^^^^^
    )
    ^
  File "/home/meatfucker/ml/avernus/venv/lib/python3.13/site-packages/diffusers/pipelines/z_image/pipeline_z_image.py", line 178, in encode_prompt
    prompt_embeds = self._encode_prompt(
        prompt=prompt,
    ...<4 lines>...
        max_sequence_length=max_sequence_length,
    )
  File "/home/meatfucker/ml/avernus/venv/lib/python3.13/site-packages/diffusers/pipelines/z_image/pipeline_z_image.py", line 214, in _encode_prompt
    assert num_images_per_prompt == 1
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError
```
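
For context, the failure comes from the `assert num_images_per_prompt == 1` at `pipeline_z_image.py:214`. Other diffusers text-to-image pipelines handle `num_images_per_prompt` by repeating the prompt embeddings along the batch dimension; below is a hypothetical sketch of that general pattern (not the actual Z-Image code, and `expand_prompt_embeds` is a made-up helper name):

```python
import torch

# Hypothetical sketch: expand per-prompt embeddings so the batch dimension
# becomes (batch_size * num_images_per_prompt) instead of asserting it is 1.
def expand_prompt_embeds(prompt_embeds: torch.Tensor, num_images_per_prompt: int) -> torch.Tensor:
    batch_size, seq_len, dim = prompt_embeds.shape
    # Duplicate each prompt's embeddings num_images_per_prompt times...
    prompt_embeds = prompt_embeds.repeat(1, num_images_per_prompt, 1)
    # ...then fold the copies back into the batch dimension.
    return prompt_embeds.view(batch_size * num_images_per_prompt, seq_len, dim)
```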
### System Info

- 🤗 Diffusers version: 0.36.0.dev0
- Platform: Linux-6.8.0-88-generic-x86_64-with-glibc2.39
- Running on Google Colab?: No
- Python version: 3.13.5
- PyTorch version (GPU?): 2.8.0+cu128 (True)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Huggingface_hub version: 1.0.0.rc5
- Transformers version: 4.57.0.dev0
- Accelerate version: 1.10.0
- PEFT version: 0.17.1
- Bitsandbytes version: 0.47.0
- Safetensors version: 0.6.2
- xFormers version: not installed
- Accelerator: NVIDIA GeForce RTX 3090, 24576 MiB
- Using GPU in script?:
- Using distributed or parallel set-up in script?:
### Who can help?
No response