WAN2.1 apply_group_offloading **ERROR** result #11041
Comments
cc @a-r-r-o-w
Thanks for the detailed issue!
This warning can be ignored for now. The quality degradation seems to come from applying group offloading to the text encoder of Wan. Maybe the UMT5EncoderModel implementation has a layer invocation order that is not compatible with streamed group offloading -- I will have to look more deeply. Could you try applying it only to the transformer and report your results? I don't fully understand the VAE issue yet, but I can take a look soon.
The layer invocation order is detected automatically, so if there are any problems with our transformer implementation, we will have to either improve the group offloading code to detect this or rewrite parts of the modeling as necessary. My guess is that the text encoder is the reason for the poor results, since I was able to run just the transformer with group offloading a few days ago and it produced good results.
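For reference, a minimal sketch of what "applying it only to the transformer" could look like, assuming the `apply_group_offloading` hook from `diffusers.hooks` and a placeholder checkpoint id; this is not the exact reproduction script from this issue:

```python
import torch
from diffusers import WanImageToVideoPipeline
from diffusers.hooks import apply_group_offloading

# Placeholder checkpoint id; substitute the one used in the reproduction script.
model_id = "Wan-AI/Wan2.1-I2V-14B-480P-Diffusers"
pipe = WanImageToVideoPipeline.from_pretrained(model_id, torch_dtype=torch.bfloat16)

# Group-offload only the transformer, with leaf-level granularity and streams.
apply_group_offloading(
    pipe.transformer,
    onload_device=torch.device("cuda"),
    offload_device=torch.device("cpu"),
    offload_type="leaf_level",
    use_stream=True,
)

# Keep the remaining components on the GPU so they are unaffected by the hook.
pipe.text_encoder.to("cuda")
pipe.image_encoder.to("cuda")
pipe.vae.to("cuda")
```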
cc @a-r-r-o-w
wanx_diffusers.mp4
And my command line output: I did not receive the warning that some layers were not executed during the forward pass.
Perhaps you have updated diffusers. In order to run on the 4090, I manually placed image_decoder, text_decoder, and transformer on the CPU during VAE decoding.
Thank you for testing! I'm looking into it now, and apologies for the delay/issues faced.
I have completed testing all of the group offload methods and compared the combinations. Out of these, only the group_offload_leaf_stream method produced incorrect results:
423298751-ea61d0cf-9cbb-477c-8dc0-510ce5e377b4.mp4
The other three methods worked correctly on my machine, as shown in the following output:
wanx_diffusers.mp4
Could you kindly focus on fixing the group_offload_leaf_stream method?
Additionally, I noticed that when using the diffusers code, the quality of the results is significantly lower than with the official Wan2.1 repository. I have set up the scheduler as follows:

```python
flow_shift = 3.0  # 5.0 for 720P, 3.0 for 480P
scheduler = UniPCMultistepScheduler(
    prediction_type="flow_prediction",
    use_flow_sigmas=True,
    num_train_timesteps=1000,
    flow_shift=flow_shift,
)
pipe = WanImageToVideoPipeline.from_pretrained(
    model_id, vae=vae, image_encoder=image_encoder, torch_dtype=torch.bfloat16, scheduler=scheduler
)
```

After reviewing the code, I suspect the issue might stem from how the model is being loaded. Thank you for your attention!
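For context, the variant names mentioned in this thread (group_offload_block_1_stream, group_offload_leaf_stream, and their non-stream counterparts) presumably map onto `apply_group_offloading`'s `offload_type`, `num_blocks_per_group`, and `use_stream` arguments roughly as sketched below; the helper function and the exact mapping are assumptions, not code taken from this issue:

```python
import torch
from diffusers.hooks import apply_group_offloading

def apply_offload_variant(module, variant: str):
    """Hypothetical helper mapping the variant names used in this thread
    onto apply_group_offloading arguments."""
    common = dict(onload_device=torch.device("cuda"), offload_device=torch.device("cpu"))
    if variant == "group_offload_block_1":
        apply_group_offloading(module, offload_type="block_level", num_blocks_per_group=1, **common)
    elif variant == "group_offload_block_1_stream":
        apply_group_offloading(module, offload_type="block_level", num_blocks_per_group=1, use_stream=True, **common)
    elif variant == "group_offload_leaf":
        apply_group_offloading(module, offload_type="leaf_level", **common)
    elif variant == "group_offload_leaf_stream":
        apply_group_offloading(module, offload_type="leaf_level", use_stream=True, **common)
    else:
        raise ValueError(f"Unknown variant: {variant}")

# Example: the variant reported as problematic above.
# apply_offload_variant(pipe.transformer, "group_offload_leaf_stream")
```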
Please take a look at #11097 when you get some time. The bug is not reproducible on all GPUs, for all height/width/num_frames, or for all models, and only seems to occur in certain cases. This made it extremely hard to debug, but I believe it should be fixed now.
Describe the bug
I am attempting to use the WAN 2.1 model from the diffusers library to complete an image-to-video task on an NVIDIA RTX 4090. To optimize memory usage, I chose the group offload method and intended to compare resource consumption across different configurations. However, during testing, I encountered two main issues:
1. I received warnings that some layers were not executed during the forward pass:
This issue resulted in severe degradation of the generated output.

This is the image I selected:
I got an incorrect video:
https://github.com/user-attachments/assets/7a8b55a2-6a71-493a-b7ae-64566b321954
When I use the default pipe, i.e. without group_offload_leaf_stream, I get the correct result:
https://github.com/user-attachments/assets/9b54c2f2-fa93-422f-b3df-619ee96bb3c8
2. When using the group_offload_block_1_stream method:
I encountered a runtime error: "RuntimeError: Input type (torch.cuda.FloatTensor) and weight type (torch.FloatTensor) should be the same". It appears that the VAE module was not correctly assigned to the GPU device.
Request for Help:
Are there recommended approaches to ensure all layers are properly executed, especially for the group_offload_leaf_stream method?
How can I resolve the device mismatch issue related to the VAE?
Any suggestions or guidance would be greatly appreciated!
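A possible workaround sketch for the device-mismatch question, assuming the error occurs because the VAE weights are still on the CPU when decoding starts (the checkpoint id below is a placeholder, not the one from the reproduction script):

```python
import torch
from diffusers import WanImageToVideoPipeline

# Placeholder checkpoint id; use the one from the reproduction script.
model_id = "Wan-AI/Wan2.1-I2V-14B-480P-Diffusers"
pipe = WanImageToVideoPipeline.from_pretrained(model_id, torch_dtype=torch.bfloat16)

# Ensure the VAE is on the GPU before decoding so its weights and the latents
# live on the same device; restrict group offloading to the transformer instead.
pipe.vae.to("cuda")

# Alternatively, the built-in per-component offloading helper manages device
# placement for every module of the pipeline automatically:
# pipe.enable_model_cpu_offload()
```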
Reproduction
Here is my code:
Here is my environment:
Logs
System Info
Who can help?
@DN6 @a-r-r-o-w