
NotImplementedError: aten::_log_softmax_backward_data with SparseCUDA backend #36674

Open · rangehow opened this issue Mar 12, 2025 · 4 comments
System Info

  • transformers version: 4.49.0
  • Platform: Linux-4.18.0-147.mt20200626.413.el8_1.x86_64-x86_64-with-glibc2.17
  • Python version: 3.12.3
  • Huggingface_hub version: 0.26.3
  • Safetensors version: 0.4.5
  • Accelerate version: 1.4.0
  • Accelerate config: not found
  • DeepSpeed version: 0.15.4
  • PyTorch version (GPU?): 2.5.1+cu121 (True)
  • Tensorflow version (GPU?): not installed (NA)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using distributed or parallel set-up in script?:
  • Using GPU in script?:
  • GPU type: NVIDIA Graphics Device

Who can help?

No response

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

labels is a sparse COO tensor

from torch.nn import CrossEntropyLoss
from transformers import Trainer


class NDPTrainer(Trainer):

    def compute_loss(self, model, inputs, return_outputs=False, num_items_in_batch=None):
        input_ids = inputs.pop("input_ids")
        attention_mask = inputs.pop("attention_mask")
        cnt_list = inputs.pop("cnt_list")

        labels = inputs.pop("label")  # sparse COO tensor

        result = model(
            input_ids=input_ids,
            attention_mask=attention_mask,
        )

        model_logits = result.logits  # bsz x seqlen x dim

        ce_loss = CrossEntropyLoss(ignore_index=-100)
        loss = ce_loss(model_logits, labels)

        if return_outputs:
            return loss, {"logits": model_logits}
        else:
            return loss
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/mnt/dolphinfs/hdd_pool/docker/user/hadoop-aipnlp/INS/ruanjunhao04/ruanjunhao/ndp/new_version/train.py", line 107, in <module>
    trainer.train()
  File "/mnt/dolphinfs/hdd_pool/docker/user/hadoop-aipnlp/INS/ruanjunhao04/env/rjh/lib/python3.12/site-packages/transformers/trainer.py", line 2241, in train
    return inner_training_loop(
           ^^^^^^^^^^^^^^^^^^^^
  File "/mnt/dolphinfs/hdd_pool/docker/user/hadoop-aipnlp/INS/ruanjunhao04/env/rjh/lib/python3.12/site-packages/transformers/trainer.py", line 2548, in _inner_training_loop
    tr_loss_step = self.training_step(model, inputs, num_items_in_batch)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/dolphinfs/hdd_pool/docker/user/hadoop-aipnlp/INS/ruanjunhao04/env/rjh/lib/python3.12/site-packages/transformers/trainer.py", line 3740, in training_step
    self.accelerator.backward(loss, **kwargs)
  File "/mnt/dolphinfs/hdd_pool/docker/user/hadoop-aipnlp/INS/ruanjunhao04/env/rjh/lib/python3.12/site-packages/accelerate/accelerator.py", line 2329, in backward
    loss.backward(**kwargs)
  File "/mnt/dolphinfs/hdd_pool/docker/user/hadoop-aipnlp/INS/ruanjunhao04/env/rjh/lib/python3.12/site-packages/torch/_tensor.py", line 581, in backward
    torch.autograd.backward(
  File "/mnt/dolphinfs/hdd_pool/docker/user/hadoop-aipnlp/INS/ruanjunhao04/env/rjh/lib/python3.12/site-packages/torch/autograd/__init__.py", line 347, in backward
    _engine_run_backward(
  File "/mnt/dolphinfs/hdd_pool/docker/user/hadoop-aipnlp/INS/ruanjunhao04/env/rjh/lib/python3.12/site-packages/torch/autograd/graph.py", line 825, in _engine_run_backward
    return Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
NotImplementedError: Could not run 'aten::_log_softmax_backward_data' with arguments from the 'SparseCUDA' backend. This could be because the operator doesn't exist for this backend, or was omitted during the selective/custom build process (if using custom build). If you are a Facebook employee using PyTorch on mobile, please visit https://fburl.com/ptmfixes for possible resolutions. 'aten::_log_softmax_backward_data' is only available for these backends: [CPU, CUDA, HIP, MPS, IPU, XPU, HPU, VE, MTIA, PrivateUse1, PrivateUse2, PrivateUse3, Meta, FPGA, MAIA, Vulkan, Metal, QuantizedCPU, QuantizedCUDA, QuantizedHIP, QuantizedMPS, QuantizedIPU, QuantizedXPU, QuantizedHPU, QuantizedVE, QuantizedMTIA, QuantizedPrivateUse1, QuantizedPrivateUse2, QuantizedPrivateUse3, QuantizedMeta, CustomRNGKeyId, MkldnnCPU, SparseCsrCPU, SparseCsrCUDA, SparseCsrHIP, SparseCsrMPS, SparseCsrIPU, SparseCsrXPU, SparseCsrHPU, SparseCsrVE, SparseCsrMTIA, SparseCsrPrivateUse1, SparseCsrPrivateUse2, SparseCsrPrivateUse3, SparseCsrMeta, BackendSelect, Python, FuncTorchDynamicLayerBackMode, Functionalize, Named, Conjugate, Negative, ZeroTensor, ADInplaceOrView, AutogradOther, AutogradCPU, AutogradCUDA, AutogradHIP, AutogradXLA, AutogradMPS, AutogradIPU, AutogradXPU, AutogradHPU, AutogradVE, AutogradLazy, AutogradMTIA, AutogradPrivateUse1, AutogradPrivateUse2, AutogradPrivateUse3, AutogradMeta, AutogradNestedTensor, Tracer, AutocastCPU, AutocastXPU, AutocastMPS, AutocastCUDA, FuncTorchBatched, BatchedNestedTensor, FuncTorchVmapMode, Batched, VmapMode, FuncTorchGradWrapper, PythonTLSSnapshot, FuncTorchDynamicLayerFrontMode, PreDispatch, PythonDispatcher].

However, PyTorch does seem to support backpropagation involving sparse tensors, for example:

import torch
from torch.nn import CrossEntropyLoss

indices = torch.tensor([[0, 1, 2], [0, 1, 2]])
values = torch.tensor([1.0, 2.0, 3.0])
size = torch.Size([3, 3])
sparse_tensor = torch.sparse_coo_tensor(indices, values, size)

target = torch.tensor([0, 1, 2])

logits = torch.randn((3, 3), requires_grad=True)

loss_func = CrossEntropyLoss(reduction='sum')
loss = loss_func(logits, target)
loss.backward()

print(loss)

Expected behavior

If I set a breakpoint with pdb inside the trainer and call loss.backward() there manually, it works fine. It seems that something in the trainer is causing the issue and breaking the backward pass.
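
A minimal sketch of a possible workaround, assuming the sparse COO labels can simply be densified before the loss (the class name and structure here are illustrative, not a confirmed fix):

from torch.nn import CrossEntropyLoss
from transformers import Trainer


class DenseLabelNDPTrainer(Trainer):
    """Sketch: densify the sparse COO labels before computing the loss."""

    def compute_loss(self, model, inputs, return_outputs=False, num_items_in_batch=None):
        input_ids = inputs.pop("input_ids")
        attention_mask = inputs.pop("attention_mask")
        inputs.pop("cnt_list", None)
        labels = inputs.pop("label")

        # Assumption: the sparse labels are only needed in dense form for the loss,
        # so converting them keeps the backward pass on well-supported dense kernels.
        if labels.is_sparse:
            labels = labels.to_dense()

        result = model(input_ids=input_ids, attention_mask=attention_mask)
        model_logits = result.logits  # bsz x seqlen x dim

        loss = CrossEntropyLoss(ignore_index=-100)(model_logits, labels)
        return (loss, {"logits": model_logits}) if return_outputs else loss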

@rangehow rangehow added the bug label Mar 12, 2025
@Rocketknight1
Member

cc @SunMarc @muellerzr

@SunMarc
Member

SunMarc commented Mar 13, 2025

Hey @rangehow, judging from the traceback, I think it's not supported for the log_softmax function.

@rangehow
Author

Hey @rangehow, judging from the traceback, I think it's not supported for the log_softmax function.

Thanks for your reply. I have provided a code snippet (the second one) that shows PyTorch supports backpropagation for a sparse tensor + CE loss.

@SunMarc
Member

SunMarc commented Mar 13, 2025

You are not using the sparse tensor at all in your second script when computing the loss. Also make sure to move the tensors to the right device.
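
For reference, a minimal standalone sketch along those lines, assuming the sparse COO tensor holds per-class target probabilities: move it to the device and densify it before the loss, so the backward pass stays on dense kernels:

import torch
from torch.nn import CrossEntropyLoss

device = "cuda" if torch.cuda.is_available() else "cpu"

# Sparse COO tensor of per-class target probabilities (one-hot here), 3 samples x 3 classes.
indices = torch.tensor([[0, 1, 2], [0, 1, 2]])
values = torch.tensor([1.0, 1.0, 1.0])
sparse_target = torch.sparse_coo_tensor(indices, values, (3, 3)).to(device)

logits = torch.randn(3, 3, requires_grad=True, device=device)

# Densify before the loss so the backward pass does not dispatch to the
# missing sparse aten::_log_softmax_backward_data kernel from the traceback.
dense_target = sparse_target.to_dense()

loss = CrossEntropyLoss(reduction="sum")(logits, dense_target)
loss.backward()
print(loss.item(), logits.grad.shape)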
