fix sd2 switching #16079

Merged 1 commit on Jul 6, 2024

@light-and-ray light-and-ray commented Jun 23, 2024

Closes: #13763
If I start the webui with a model that is not SD 2.1 and then try to load an SD 2.1 checkpoint, I get a "NotImplementedError".
If I start the webui with an SD2 model passed via the --ckpt flag, it works, even if I switch to another model and back to SD2.

This bug appeared in October. I now think it is connected with device = devices.cpu in is_using_v_parameterization_for_sd2, introduced in this commit: d04e3e9#diff-b710a9b8e9fbcc5bc5a014f938c9c74564c1dcfc86929f0dc9ff643ba3fe7873R30

NotImplementedError: No operator found for memory_efficient_attention_forward with inputs:
query : shape=(1, 64, 5, 64) (torch.float32)
key : shape=(1, 64, 5, 64) (torch.float32)
value : shape=(1, 64, 5, 64) (torch.float32)
attn_bias : <class 'NoneType'>
p : 0.0

decoderF is not supported because:
device=cpu (supported: {'cuda'})
attn_bias type is <class 'NoneType'>

flshattF@v2.3.6 is not supported because:
device=cpu (supported: {'cuda'})
dtype=torch.float32 (supported: {torch.float16, torch.bfloat16})

tritonflashattF is not supported because:
device=cpu (supported: {'cuda'})
dtype=torch.float32 (supported: {torch.float16, torch.bfloat16})
operator wasn't built - see python -m xformers.info for more info
triton is not available
Only work on pre-MLIR triton for now

cutlassF is not supported because:
device=cpu (supported: {'cuda'})

smallkF is not supported because:
max(query.shape[-1] != value.shape[-1]) > 32
device=cpu (supported: {'cuda'})
unsupported embed per head: 64

With this patch, the bug is gone for me.
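
The log above reads as a dispatcher trying each attention operator in turn and rejecting all of them, because every one reports device=cpu as unsupported. The following is a minimal sketch of that pattern, not xformers' actual implementation: the AttentionOp class and the dispatch function are hypothetical, and only the device and dtype constraints for three of the operators are copied from the log.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AttentionOp:
    """Hypothetical stand-in for one attention kernel and what it accepts."""
    name: str
    devices: frozenset
    dtypes: Optional[frozenset]  # None: the log reports no dtype restriction


# Constraints taken from the error log for three of the listed operators.
OPS = [
    AttentionOp("flshattF", frozenset({"cuda"}), frozenset({"float16", "bfloat16"})),
    AttentionOp("tritonflashattF", frozenset({"cuda"}), frozenset({"float16", "bfloat16"})),
    AttentionOp("cutlassF", frozenset({"cuda"}), None),
]


def reasons_not_supported(op: AttentionOp, device: str, dtype: str) -> list:
    """Collect every reason this operator rejects the given inputs."""
    reasons = []
    if device not in op.devices:
        reasons.append(f"device={device} (supported: {sorted(op.devices)})")
    if op.dtypes is not None and dtype not in op.dtypes:
        reasons.append(f"dtype={dtype} (supported: {sorted(op.dtypes)})")
    return reasons


def dispatch(device: str, dtype: str) -> str:
    """Return the first operator that accepts the inputs, else raise."""
    report = []
    for op in OPS:
        reasons = reasons_not_supported(op, device, dtype)
        if not reasons:
            return op.name
        report.append(f"{op.name} is not supported because:\n  " + "\n  ".join(reasons))
    raise NotImplementedError("No operator found for inputs:\n" + "\n".join(report))
```

Under these assumptions, a float32 tensor on cuda still finds an operator, while any tensor on cpu exhausts the list and raises, which matches the symptom described: a probe model placed on the CPU cannot use CUDA-only attention.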

@AUTOMATIC1111 (Owner) commented

The reason I didn't want to run the calculation on GPU is that it takes substantially longer to transfer the model to GPU than to just do one inference on CPU. I merged the other one.

@light-and-ray (Contributor, Author) commented

> I merged the other one.

Which one? The one from hcl solves the other issue.

@light-and-ray (Contributor, Author) commented

> The reason I didn't want to run the calculation on GPU

But it doesn't work, at least not for everyone. Read the description.
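
The two positions in this thread (a CPU probe avoids a slow transfer to GPU, but it fails when the attention backend is CUDA-only) could in principle be combined by choosing the probe device from the active backend. The following is only a sketch of that idea under stated assumptions: pick_probe_device, CUDA_ONLY_BACKENDS, and the backend names are hypothetical and are not part of the webui or of this patch.

```python
# Assumption drawn from the error log: every xformers operator listed there
# reports supported: {'cuda'}, so the backend is treated as CUDA-only here.
CUDA_ONLY_BACKENDS = {"xformers"}


def pick_probe_device(attention_backend: str, cuda_available: bool) -> str:
    """Choose where to run the one-off v-parameterization probe.

    Prefers the CPU (no model transfer needed) unless the active attention
    backend cannot run there.
    """
    if attention_backend in CUDA_ONLY_BACKENDS:
        if not cuda_available:
            raise RuntimeError(
                f"{attention_backend} attention needs CUDA, but no CUDA device is available"
            )
        return "cuda"
    return "cpu"
```

With this rule the cheaper CPU path is kept for backends that support it, and the GPU is used only when the CPU path would raise the NotImplementedError shown in the description.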

@AUTOMATIC1111 AUTOMATIC1111 reopened this Jul 6, 2024
@AUTOMATIC1111 AUTOMATIC1111 merged commit 477869c into AUTOMATIC1111:dev Jul 6, 2024
6 checks passed