is Xformers with ZLUDA possible? #23

Open
unclemusclez opened this issue Jun 19, 2024 · 12 comments
Labels
question Further information is requested

Comments

@unclemusclez

unclemusclez commented Jun 19, 2024

I compiled ZLUDA: Finished `release` profile [optimized] target(s) in 5m 40s. I downloaded NCCL from NVIDIA and placed it inside the ZLUDA directory:
P:\gitrepos\ZLUDA\nccl_2.21.5-1+cuda11.0_x86_64

With pytorch-build.bat:

@echo off

set TORCH_CUDA_ARCH_LIST="6.1+PTX"
set CUDAARCHS="61"
set CMAKE_CUDA_ARCHITECTURES="61"
set USE_SYSTEM_NCCL=1
set NCCL_ROOT_DIR="P:\gitrepos\ZLUDA\nccl_2.21.5-1+cuda11.0_x86_64"
set NCCL_INCLUDE_DIR="P:\gitrepos\ZLUDA\nccl_2.21.5-1+cuda11.0_x86_64\include"
set NCCL_LIB_DIR="P:\gitrepos\ZLUDA\nccl_2.21.5-1+cuda11.0_x86_64\lib"
set USE_EXPERIMENTAL_CUDNN_V8_API=1
@echo environment set

cargo clean
cargo xtask --release

@pause

Is it possible with this configuration to set torch.backends.cudnn.enabled = True?
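
For reference, here is a minimal way to flip the flag and smoke-test cuDNN on its own (a sketch in plain PyTorch, nothing ZLUDA-specific):

```python
import torch

# Enable cuDNN, then run a tiny convolution, which dispatches to cuDNN kernels.
torch.backends.cudnn.enabled = True
print("cuDNN available:", torch.backends.cudnn.is_available())
print("cuDNN version:", torch.backends.cudnn.version())

conv = torch.nn.Conv2d(3, 8, kernel_size=3, padding=1).to("cuda")
x = torch.randn(1, 3, 64, 64, device="cuda")
y = conv(x)
torch.cuda.synchronize()  # surface any asynchronous CUDA error here
print("conv output shape:", tuple(y.shape))
```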

Here is the error I get with torch.backends.cudnn.enabled = True. Perhaps it is unrelated, but I am just trying to get xformers to function.

got prompt
[rgthree] Using rgthree's optimized recursive execution.
[rgthree] First run patching recursive_output_delete_if_changed and recursive_will_execute.
[rgthree] Note: If execution seems broken due to forward ComfyUI changes, you can disable the optimization from rgthree settings in ComfyUI.
model_type FLOW
Using xformers attention in VAE
Using xformers attention in VAE
no CLIP/text encoder weights in checkpoint, the text encoder model will not be loaded.
clip missing: ['text_projection.weight']
Requested to load SD3ClipModel
Loading 1 new model
Requested to load SD3
Loading 1 new model
  0%|                                                                                                                                                                                                                | 0/28 [00:02<?, ?it/s]
!!! Exception during processing!!! CUDA error: named symbol not found
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

Traceback (most recent call last):
  File "P:\ComfyUI-ZLUDA\execution.py", line 151, in recursive_execute
    output_data, output_ui = get_output_data(obj, input_data_all)
                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "P:\ComfyUI-ZLUDA\execution.py", line 81, in get_output_data
    return_values = map_node_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "P:\ComfyUI-ZLUDA\custom_nodes\ComfyUI-0246\utils.py", line 381, in new_func
    res_value = old_func(*final_args, **kwargs)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "P:\ComfyUI-ZLUDA\execution.py", line 74, in map_node_over_list
    results.append(getattr(obj, func)(**slice_dict(input_data_all, i)))
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "P:\ComfyUI-ZLUDA\nodes.py", line 1371, in sample
    return common_ksampler(model, seed, steps, cfg, sampler_name, scheduler, positive, negative, latent_image, denoise=denoise)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "P:\ComfyUI-ZLUDA\nodes.py", line 1341, in common_ksampler
    samples = comfy.sample.sample(model, noise, steps, cfg, sampler_name, scheduler, positive, negative, latent_image,
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "P:\ComfyUI-ZLUDA\custom_nodes\ComfyUI-Impact-Pack\modules\impact\sample_error_enhancer.py", line 22, in informative_sample
    raise e
  File "P:\ComfyUI-ZLUDA\custom_nodes\ComfyUI-Impact-Pack\modules\impact\sample_error_enhancer.py", line 9, in informative_sample
    return original_sample(*args, **kwargs)  # This code helps interpret error messages that occur within exceptions but does not have any impact on other operations.
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "P:\ComfyUI-ZLUDA\custom_nodes\ComfyUI-AnimateDiff-Evolved\animatediff\sampling.py", line 313, in motion_sample
    return orig_comfy_sample(model, noise, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "P:\ComfyUI-ZLUDA\comfy\sample.py", line 43, in sample
    samples = sampler.sample(noise, positive, negative, cfg=cfg, latent_image=latent_image, start_step=start_step, last_step=last_step, force_full_denoise=force_full_denoise, denoise_mask=noise_mask, sigmas=sigmas, callback=callback, disable_pbar=disable_pbar, seed=seed)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "P:\ComfyUI-ZLUDA\comfy\samplers.py", line 794, in sample
    return sample(self.model, noise, positive, negative, cfg, self.device, sampler, sigmas, self.model_options, latent_image=latent_image, denoise_mask=denoise_mask, callback=callback, disable_pbar=disable_pbar, seed=seed)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "P:\ComfyUI-ZLUDA\comfy\samplers.py", line 696, in sample
    return cfg_guider.sample(noise, latent_image, sampler, sigmas, denoise_mask, callback, disable_pbar, seed)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "P:\ComfyUI-ZLUDA\comfy\samplers.py", line 683, in sample
    output = self.inner_sample(noise, latent_image, device, sampler, sigmas, denoise_mask, callback, disable_pbar, seed)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "P:\ComfyUI-ZLUDA\comfy\samplers.py", line 662, in inner_sample
    samples = sampler.sample(self, sigmas, extra_args, callback, noise, latent_image, denoise_mask, disable_pbar)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "P:\ComfyUI-ZLUDA\comfy\samplers.py", line 567, in sample
    samples = self.sampler_function(model_k, noise, sigmas, extra_args=extra_args, callback=k_callback, disable=disable_pbar, **self.extra_options)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "P:\ComfyUI-ZLUDA\.venv\Lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "P:\ComfyUI-ZLUDA\comfy\k_diffusion\sampling.py", line 189, in sample_heun
    denoised = model(x, sigma_hat * s_in, **extra_args)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "P:\ComfyUI-ZLUDA\comfy\samplers.py", line 291, in __call__
    out = self.inner_model(x, sigma, model_options=model_options, seed=seed)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "P:\ComfyUI-ZLUDA\comfy\samplers.py", line 649, in __call__
    return self.predict_noise(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "P:\ComfyUI-ZLUDA\comfy\samplers.py", line 652, in predict_noise
    return sampling_function(self.inner_model, x, timestep, self.conds.get("negative", None), self.conds.get("positive", None), self.cfg, model_options=model_options, seed=seed)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "P:\ComfyUI-ZLUDA\comfy\samplers.py", line 277, in sampling_function
    out = calc_cond_batch(model, conds, x, timestep, model_options)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "P:\ComfyUI-ZLUDA\comfy\samplers.py", line 226, in calc_cond_batch
    output = model.apply_model(input_x, timestep_, **c).chunk(batch_chunks)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "P:\ComfyUI-ZLUDA\comfy\model_base.py", line 113, in apply_model
    model_output = self.diffusion_model(xc, t, context=context, control=control, transformer_options=transformer_options, **extra_conds).float()
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "P:\ComfyUI-ZLUDA\.venv\Lib\site-packages\torch\nn\modules\module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "P:\ComfyUI-ZLUDA\.venv\Lib\site-packages\torch\nn\modules\module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "P:\ComfyUI-ZLUDA\comfy\ldm\modules\diffusionmodules\mmdit.py", line 961, in forward
    return super().forward(x, timesteps, context=context, y=y)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "P:\ComfyUI-ZLUDA\comfy\ldm\modules\diffusionmodules\mmdit.py", line 946, in forward
    x = self.forward_core_with_concat(x, c, context)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "P:\ComfyUI-ZLUDA\comfy\ldm\modules\diffusionmodules\mmdit.py", line 909, in forward_core_with_concat
    context, x = block(
                 ^^^^^^
  File "P:\ComfyUI-ZLUDA\.venv\Lib\site-packages\torch\nn\modules\module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "P:\ComfyUI-ZLUDA\.venv\Lib\site-packages\torch\nn\modules\module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "P:\ComfyUI-ZLUDA\comfy\ldm\modules\diffusionmodules\mmdit.py", line 635, in forward
    return block_mixing(
           ^^^^^^^^^^^^^
  File "P:\ComfyUI-ZLUDA\comfy\ldm\modules\diffusionmodules\mmdit.py", line 589, in block_mixing
    return _block_mixing(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "P:\ComfyUI-ZLUDA\comfy\ldm\modules\diffusionmodules\mmdit.py", line 602, in _block_mixing
    attn = optimized_attention(
           ^^^^^^^^^^^^^^^^^^^^
  File "P:\ComfyUI-ZLUDA\comfy\ldm\modules\diffusionmodules\mmdit.py", line 293, in optimized_attention
    return attention.optimized_attention(qkv[0], qkv[1], qkv[2], num_heads)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "P:\ComfyUI-ZLUDA\comfy\ldm\modules\attention.py", line 380, in attention_xformers
    out = xformers.ops.memory_efficient_attention(q, k, v, attn_bias=mask)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "P:\ComfyUI-ZLUDA\.venv\Lib\site-packages\xformers\ops\fmha\__init__.py", line 268, in memory_efficient_attention
    return _memory_efficient_attention(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "P:\ComfyUI-ZLUDA\.venv\Lib\site-packages\xformers\ops\fmha\__init__.py", line 387, in _memory_efficient_attention
    return _memory_efficient_attention_forward(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "P:\ComfyUI-ZLUDA\.venv\Lib\site-packages\xformers\ops\fmha\__init__.py", line 407, in _memory_efficient_attention_forward
    out, *_ = op.apply(inp, needs_gradient=False)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "P:\ComfyUI-ZLUDA\.venv\Lib\site-packages\xformers\ops\fmha\cutlass.py", line 202, in apply
    return cls.apply_bmhk(inp, needs_gradient=needs_gradient)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "P:\ComfyUI-ZLUDA\.venv\Lib\site-packages\xformers\ops\fmha\cutlass.py", line 266, in apply_bmhk
    out, lse, rng_seed, rng_offset, _, _ = cls.OPERATOR(
                                           ^^^^^^^^^^^^^
  File "P:\ComfyUI-ZLUDA\.venv\Lib\site-packages\torch\_ops.py", line 755, in __call__
    return self._op(*args, **(kwargs or {}))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: CUDA error: named symbol not found
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
@lshqqytiger
Owner

Do you just need ComfyUI to work? If so, try WSL with ROCm. It supports Flash Attention 2.
https://www.amd.com/en/resources/support-articles/release-notes/RN-RAD-WIN-24-10-21-01-WSL-2.html
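
A quick sanity check that the ROCm wheel sees the GPU and that the flash attention path is enabled (a minimal sketch; torch.version.hip is set on ROCm builds and None elsewhere):

```python
import torch
import torch.nn.functional as F

print("HIP runtime:", torch.version.hip)  # non-None on ROCm wheels
print("device:", torch.cuda.get_device_name(0))
print("flash SDP enabled:", torch.backends.cuda.flash_sdp_enabled())

# Half-precision attention; SDPA picks the flash kernel when it is available.
q = torch.randn(1, 8, 1024, 64, device="cuda", dtype=torch.float16)
k, v = torch.randn_like(q), torch.randn_like(q)
out = F.scaled_dot_product_attention(q, k, v)
torch.cuda.synchronize()
print("attention output shape:", tuple(out.shape))
```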

@unclemusclez
Author

Do you just need ComfyUI to work? If so, try WSL with ROCm. It supports Flash Attention 2. https://www.amd.com/en/resources/support-articles/release-notes/RN-RAD-WIN-24-10-21-01-WSL-2.html

I'm trying it now... when did this come out?

@lshqqytiger
Owner

Very recently. Are you on gfx1100? (RX 7900 XT(X), GRE, etc)

@lshqqytiger lshqqytiger added the question Further information is requested label Jun 19, 2024
@unclemusclez
Author

Very recently. Are you on gfx1100? (RX 7900 XT(X), GRE, etc)

Yes, 7900 XT.

@unclemusclez
Author

So I've been testing the ROCm driver for WSL.

There are still use cases for ZLUDA with PyTorch, particularly around https://github.com/hpcaitech/Open-Sora, which seems to need CUDA.

I find ROCm is about 2-3x faster than ZLUDA with PyTorch.

@Yasei-no-otoko

I find ROCm is about 2-3x faster than ZLUDA with PyTorch.

Could you benchmark it with the methodology on this page?
https://www.pugetsystems.com/labs/articles/stable-diffusion-benchmark-testing-methodology/

Old scores:
https://www.pugetsystems.com/labs/articles/stable-diffusion-performance-nvidia-geforce-vs-amd-radeon/

@unclemusclez
Author

I find ROCm is about 2-3x faster than ZLUDA with PyTorch.

Could you benchmark it with the methodology on this page? https://www.pugetsystems.com/labs/articles/stable-diffusion-benchmark-testing-methodology/

Old scores: https://www.pugetsystems.com/labs/articles/stable-diffusion-performance-nvidia-geforce-vs-amd-radeon/

I am trying to compile from source right now, and once I get that, I have to make sure my environment is stable. If I accomplish this, I'll benchmark.

@unclemusclez
Author

lol... wow, I'm here again, at my own problem from before. Nice to see you all.

@lshqqytiger, first of all: you're a god, and we need your help.
We need better PyTorch support. Not that you can make this a priority, but ROCm support on Windows, Linux, and WSL2 is just not there.

I made the journey, and now I am back here. At the moment, the best combination I have found is ZLUDA on Linux or WSL2. Certain Python packages will not compile on Windows, and ROCm is a nightmare no matter what you do.

In particular, 3D workloads are not available with ROCm at the moment. This pertains to https://rocm.docs.amd.com/projects/HIP/en/latest/how-to/cooperative_groups.html:

HIP doesn’t support the following NVIDIA CUDA optional headers:

cooperative_groups/memcpy_async.h

cooperative_groups/reduce.h

cooperative_groups/scan.h

This breaks projects like https://github.com/graphdeco-inria/gaussian-splatting, which depend on simple-knn and diff-gaussian-rasterization:
https://github.com/graphdeco-inria/gaussian-splatting/tree/main/submodules

I am having the above issue I mentioned months ago again, because I switched back to ZLUDA from ROCm within the past few days (mostly to test the state of the community). I think the issue I'm having currently is trying to run PyTorch with …

There are no WSL2-supported ROCm torchaudio wheels at the moment.
ROCm is supported on Windows, but there is no PyTorch for it yet. TBH I don't see any development on this at the moment as far as GitHub goes. WSL2 barely works; even Linux barely works with ROCm 6.2.

For basic image diffusion, everything works fairly well with ROCm.
For audio diffusion, Linux ROCm works; Windows has problems with Python packages and dependencies.
For 3D and video diffusion: ZLUDA and Linux.

My current issue:

INFO     | 2024-08-27 05:32:06 | autotrain.trainers.common:on_train_begin:230 - Starting to train...
  0%|                                                                                | 0/456 [00:00<?, ?it/s]
ERROR    | 2024-08-27 05:32:07 | autotrain.trainers.common:wrapper:120 - train has failed due to an exception: Traceback (most recent call last):
  File "P:\.pytorchvenv\lib\site-packages\autotrain\trainers\common.py", line 117, in wrapper
    return func(*args, **kwargs)
  File "P:\.pytorchvenv\lib\site-packages\autotrain\trainers\sent_transformers\__main__.py", line 213, in train
    trainer.train()
  File "P:\.pytorchvenv\lib\site-packages\transformers\trainer.py", line 1938, in train
    return inner_training_loop(
  File "P:\.pytorchvenv\lib\site-packages\transformers\trainer.py", line 2279, in _inner_training_loop
    tr_loss_step = self.training_step(model, inputs)
  File "P:\.pytorchvenv\lib\site-packages\transformers\trainer.py", line 3318, in training_step
    loss = self.compute_loss(model, inputs)
  File "P:\.pytorchvenv\lib\site-packages\sentence_transformers\trainer.py", line 329, in compute_loss
    loss = loss_fn(features, labels)
  File "P:\.pytorchvenv\lib\site-packages\torch\nn\modules\module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "P:\.pytorchvenv\lib\site-packages\torch\nn\modules\module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "P:\.pytorchvenv\lib\site-packages\sentence_transformers\losses\CoSENTLoss.py", line 79, in forward
    embeddings = [self.model(sentence_feature)["sentence_embedding"] for sentence_feature in sentence_features]
  File "P:\.pytorchvenv\lib\site-packages\sentence_transformers\losses\CoSENTLoss.py", line 79, in <listcomp>
    embeddings = [self.model(sentence_feature)["sentence_embedding"] for sentence_feature in sentence_features]
  File "P:\.pytorchvenv\lib\site-packages\torch\nn\modules\module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "P:\.pytorchvenv\lib\site-packages\torch\nn\modules\module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "P:\.pytorchvenv\lib\site-packages\accelerate\utils\operations.py", line 819, in forward
    return model_forward(*args, **kwargs)
  File "P:\.pytorchvenv\lib\site-packages\accelerate\utils\operations.py", line 807, in __call__
    return convert_to_fp32(self.model_forward(*args, **kwargs))
  File "P:\.pytorchvenv\lib\site-packages\torch\amp\autocast_mode.py", line 44, in decorate_autocast
    return func(*args, **kwargs)
  File "P:\.pytorchvenv\lib\site-packages\torch\nn\modules\container.py", line 250, in forward
    input = module(input)
  File "P:\.pytorchvenv\lib\site-packages\torch\nn\modules\module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "P:\.pytorchvenv\lib\site-packages\torch\nn\modules\module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "P:\.pytorchvenv\lib\site-packages\sentence_transformers\models\Transformer.py", line 118, in forward
    output_states = self.auto_model(**trans_features, return_dict=False)
  File "P:\.pytorchvenv\lib\site-packages\torch\nn\modules\module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "P:\.pytorchvenv\lib\site-packages\torch\nn\modules\module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "P:\.pytorchvenv\lib\site-packages\transformers\models\gemma2\modeling_gemma2.py", line 803, in forward
    inputs_embeds = self.embed_tokens(input_ids)
  File "P:\.pytorchvenv\lib\site-packages\torch\nn\modules\module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "P:\.pytorchvenv\lib\site-packages\torch\nn\modules\module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "P:\.pytorchvenv\lib\site-packages\torch\nn\modules\sparse.py", line 190, in forward
    return F.embedding(
  File "P:\.pytorchvenv\lib\site-packages\torch\nn\functional.py", line 2551, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: CUDA error: operation not supported
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.


ERROR    | 2024-08-27 05:32:07 | autotrain.trainers.common:wrapper:121 - CUDA error: operation not supported
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

I think this is because I have xformers installed. Any recommendations?
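
Before blaming xformers, note the traceback dies in torch.embedding, so a minimal repro of that op alone (stock PyTorch, no xformers in the stack) would tell you whether the embedding kernel itself fails under ZLUDA:

```python
import torch
import torch.nn.functional as F

# Roughly the shapes Gemma2's embed_tokens would see; values are illustrative.
weight = torch.randn(32000, 2304, device="cuda", dtype=torch.float16)
ids = torch.randint(0, 32000, (2, 128), device="cuda")
emb = F.embedding(ids, weight)
torch.cuda.synchronize()  # surface any asynchronous CUDA error here
print("embedding ok:", tuple(emb.shape))
```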

@unclemusclez
Author

And apparently it does work: https://vladmandic.github.io/sd-extension-system-info/pages/benchmark.html

Search for the keyword [ZLUDA]:

torch: 2.3.0+cu118 autocast half xformers: diffusers: 0.29.2 transformers: 4.44.0 AMD Radeon RX 7900 XTX [ZLUDA] (2) (compute_37) (8, 8) cuda: 11.8 cudnn: 8700 driver: 24GB

@lshqqytiger
Owner

I don't think xformers can work on ZLUDA without rebuilding.
If you really need something like xformers on Windows, try this.

@unclemusclez
Author

unclemusclez commented Aug 27, 2024

I don't think xformers can work on ZLUDA without rebuilding. If you really need something like xformers on Windows, try this.

I think it actually does work fine on the cu118 wheel from PyTorch. I believe I was using versions all the way up to 2.5.0, but cuDNN was causing errors.
I got to cuDNN error: CUDNN_STATUS_INTERNAL_ERROR and couldn't figure out how to disable cuDNN with autotrain.
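
One workaround sketch (not verified with autotrain; it relies on CPython importing sitecustomize at interpreter startup when site is enabled) is to flip the flag globally before any training code runs:

```python
# sitecustomize.py -- drop into the venv's site-packages so it runs at interpreter startup
try:
    import torch
    # Skip cuDNN kernels entirely; PyTorch falls back to its native implementations.
    torch.backends.cudnn.enabled = False
except ImportError:
    pass  # environment without torch installed; do nothing
```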

I don't think xformers can work on ZLUDA without rebuilding. If you really need something like xformers on Windows, try this.

https://github.com/ROCm/flash-attention/tree/howiejay/navi_support is what was recommended to me.
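
For anyone trying that branch, the call looks roughly like this (assuming the fork keeps upstream flash-attn's Python API, which I have not verified):

```python
import torch
from flash_attn import flash_attn_func  # upstream flash-attn API; assumed unchanged in navi_support

# flash-attn expects (batch, seqlen, nheads, headdim) tensors in fp16/bf16 on the GPU.
q = torch.randn(1, 1024, 8, 64, device="cuda", dtype=torch.float16)
k, v = torch.randn_like(q), torch.randn_like(q)
out = flash_attn_func(q, k, v, causal=False)
print(tuple(out.shape))  # (1, 1024, 8, 64)
```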

@unclemusclez
Author

unclemusclez commented Aug 27, 2024

In fact, with xformers on ZLUDA, this was my error:

FATAL: kernel `fmha_cutlassF_f16_aligned_32x128_gmem_sm80` is for sm80-sm100, but was built for sm37

But I think it was otherwise working, just not for that type of workload (the benchmark line above reports compute_37, which matches the sm37 the kernel was built for).
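
A quick way to confirm what capability ZLUDA reports, and hence why the cutlass kernels refuse to run (a minimal sketch in plain PyTorch):

```python
import torch

# ZLUDA advertises compute capability 3.7 (compute_37), while xformers'
# cutlass FMHA kernels require sm80-sm100, hence the FATAL above.
major, minor = torch.cuda.get_device_capability(0)
print(f"reported capability: sm{major}{minor}")
if (major, minor) < (8, 0):
    print("cutlass memory-efficient attention unavailable; use PyTorch SDPA instead")
```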
