
NotImplementedError: aten::_log_softmax_backward_data with SparseCUDA backend #36674

Open · rangehow opened this issue Mar 12, 2025 · 4 comments
System Info

  • transformers version: 4.49.0
  • Platform: Linux-4.18.0-147.mt20200626.413.el8_1.x86_64-x86_64-with-glibc2.17
  • Python version: 3.12.3
  • Huggingface_hub version: 0.26.3
  • Safetensors version: 0.4.5
  • Accelerate version: 1.4.0
  • Accelerate config: not found
  • DeepSpeed version: 0.15.4
  • PyTorch version (GPU?): 2.5.1+cu121 (True)
  • Tensorflow version (GPU?): not installed (NA)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using distributed or parallel set-up in script?:
  • Using GPU in script?:
  • GPU type: NVIDIA Graphics Device

Who can help?

No response

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

labels is a sparse COO tensor

from torch.nn import CrossEntropyLoss
from transformers import Trainer


class NDPTrainer(Trainer):

    def compute_loss(self, model, inputs, return_outputs=False, num_items_in_batch=None):
        input_ids = inputs.pop("input_ids")
        attention_mask = inputs.pop("attention_mask")
        cnt_list = inputs.pop("cnt_list")

        labels = inputs.pop("label")  # sparse COO tensor

        result = model(
            input_ids=input_ids,
            attention_mask=attention_mask,
        )

        model_logits = result.logits  # bsz x seqlen x dim

        ce_loss = CrossEntropyLoss(ignore_index=-100)
        loss = ce_loss(model_logits, labels)

        if return_outputs:
            return loss, {"logits": model_logits}
        else:
            return loss
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/mnt/dolphinfs/hdd_pool/docker/user/hadoop-aipnlp/INS/ruanjunhao04/ruanjunhao/ndp/new_version/train.py", line 107, in <module>
    trainer.train()
  File "/mnt/dolphinfs/hdd_pool/docker/user/hadoop-aipnlp/INS/ruanjunhao04/env/rjh/lib/python3.12/site-packages/transformers/trainer.py", line 2241, in train
    return inner_training_loop(
           ^^^^^^^^^^^^^^^^^^^^
  File "/mnt/dolphinfs/hdd_pool/docker/user/hadoop-aipnlp/INS/ruanjunhao04/env/rjh/lib/python3.12/site-packages/transformers/trainer.py", line 2548, in _inner_training_loop
    tr_loss_step = self.training_step(model, inputs, num_items_in_batch)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/dolphinfs/hdd_pool/docker/user/hadoop-aipnlp/INS/ruanjunhao04/env/rjh/lib/python3.12/site-packages/transformers/trainer.py", line 3740, in training_step
    self.accelerator.backward(loss, **kwargs)
  File "/mnt/dolphinfs/hdd_pool/docker/user/hadoop-aipnlp/INS/ruanjunhao04/env/rjh/lib/python3.12/site-packages/accelerate/accelerator.py", line 2329, in backward
    loss.backward(**kwargs)
  File "/mnt/dolphinfs/hdd_pool/docker/user/hadoop-aipnlp/INS/ruanjunhao04/env/rjh/lib/python3.12/site-packages/torch/_tensor.py", line 581, in backward
    torch.autograd.backward(
  File "/mnt/dolphinfs/hdd_pool/docker/user/hadoop-aipnlp/INS/ruanjunhao04/env/rjh/lib/python3.12/site-packages/torch/autograd/__init__.py", line 347, in backward
    _engine_run_backward(
  File "/mnt/dolphinfs/hdd_pool/docker/user/hadoop-aipnlp/INS/ruanjunhao04/env/rjh/lib/python3.12/site-packages/torch/autograd/graph.py", line 825, in _engine_run_backward
    return Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
NotImplementedError: Could not run 'aten::_log_softmax_backward_data' with arguments from the 'SparseCUDA' backend. This could be because the operator doesn't exist for this backend, or was omitted during the selective/custom build process (if using custom build). If you are a Facebook employee using PyTorch on mobile, please visit https://fburl.com/ptmfixes for possible resolutions. 'aten::_log_softmax_backward_data' is only available for these backends: [CPU, CUDA, HIP, MPS, IPU, XPU, HPU, VE, MTIA, PrivateUse1, PrivateUse2, PrivateUse3, Meta, FPGA, MAIA, Vulkan, Metal, QuantizedCPU, QuantizedCUDA, QuantizedHIP, QuantizedMPS, QuantizedIPU, QuantizedXPU, QuantizedHPU, QuantizedVE, QuantizedMTIA, QuantizedPrivateUse1, QuantizedPrivateUse2, QuantizedPrivateUse3, QuantizedMeta, CustomRNGKeyId, MkldnnCPU, SparseCsrCPU, SparseCsrCUDA, SparseCsrHIP, SparseCsrMPS, SparseCsrIPU, SparseCsrXPU, SparseCsrHPU, SparseCsrVE, SparseCsrMTIA, SparseCsrPrivateUse1, SparseCsrPrivateUse2, SparseCsrPrivateUse3, SparseCsrMeta, BackendSelect, Python, FuncTorchDynamicLayerBackMode, Functionalize, Named, Conjugate, Negative, ZeroTensor, ADInplaceOrView, AutogradOther, AutogradCPU, AutogradCUDA, AutogradHIP, AutogradXLA, AutogradMPS, AutogradIPU, AutogradXPU, AutogradHPU, AutogradVE, AutogradLazy, AutogradMTIA, AutogradPrivateUse1, AutogradPrivateUse2, AutogradPrivateUse3, AutogradMeta, AutogradNestedTensor, Tracer, AutocastCPU, AutocastXPU, AutocastMPS, AutocastCUDA, FuncTorchBatched, BatchedNestedTensor, FuncTorchVmapMode, Batched, VmapMode, FuncTorchGradWrapper, PythonTLSSnapshot, FuncTorchDynamicLayerFrontMode, PreDispatch, PythonDispatcher].

However, PyTorch does seem to support backpropagation involving sparse tensors, for example:

import torch
from torch.nn import CrossEntropyLoss

indices = torch.tensor([[0, 1, 2], [0, 1, 2]])
values = torch.tensor([1.0, 2.0, 3.0])
size = torch.Size([3, 3])
sparse_tensor = torch.sparse_coo_tensor(indices, values, size)

target = torch.tensor([0, 1, 2])

logits = torch.randn((3, 3), requires_grad=True)

loss_func = CrossEntropyLoss(reduction='sum')
loss = loss_func(logits, target)
loss.backward()

print(loss)

Expected behavior

If I set a breakpoint with pdb inside the trainer and call loss.backward() there manually, it works fine. It seems that something in the trainer is causing the issue and breaking the backward pass.
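
A minimal sketch of a possible workaround, assuming the sparse COO labels can simply be densified before the loss (the class name and structure here are illustrative, not a confirmed fix):

from torch.nn import CrossEntropyLoss
from transformers import Trainer


class DenseLabelNDPTrainer(Trainer):
    """Sketch: densify the sparse COO labels before computing the loss."""

    def compute_loss(self, model, inputs, return_outputs=False, num_items_in_batch=None):
        input_ids = inputs.pop("input_ids")
        attention_mask = inputs.pop("attention_mask")
        inputs.pop("cnt_list", None)
        labels = inputs.pop("label")

        # Assumption: the sparse labels are only needed in dense form for the loss,
        # so converting them keeps the backward pass on well-supported dense kernels.
        if labels.is_sparse:
            labels = labels.to_dense()

        result = model(input_ids=input_ids, attention_mask=attention_mask)
        model_logits = result.logits  # bsz x seqlen x dim

        loss = CrossEntropyLoss(ignore_index=-100)(model_logits, labels)
        return (loss, {"logits": model_logits}) if return_outputs else loss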

@rangehow rangehow added the bug label Mar 12, 2025
@Rocketknight1
Member

cc @SunMarc @muellerzr

@SunMarc
Member

SunMarc commented Mar 13, 2025

Hey @rangehow, judging from the traceback, I think it's not supported for the log_softmax function.

@rangehow
Author

Hey @rangehow, judging from the traceback, I think it's not supported for the log_softmax function.

Thanks for your reply. I have provided a code snippet (the second one) that shows PyTorch supports backpropagation for a sparse tensor + CE loss.

@SunMarc
Member

SunMarc commented Mar 13, 2025

You are not using the sparse tensor at all in your second script when computing the loss. Also make sure to move the tensors to the right device.
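
For reference, a minimal standalone sketch along those lines, assuming the sparse COO tensor holds per-class target probabilities: move it to the device and densify it before the loss, so the backward pass stays on dense kernels:

import torch
from torch.nn import CrossEntropyLoss

device = "cuda" if torch.cuda.is_available() else "cpu"

# Sparse COO tensor of per-class target probabilities (one-hot here), 3 samples x 3 classes.
indices = torch.tensor([[0, 1, 2], [0, 1, 2]])
values = torch.tensor([1.0, 1.0, 1.0])
sparse_target = torch.sparse_coo_tensor(indices, values, (3, 3)).to(device)

logits = torch.randn(3, 3, requires_grad=True, device=device)

# Densify before the loss so the backward pass does not dispatch to the
# missing sparse aten::_log_softmax_backward_data kernel from the traceback.
dense_target = sparse_target.to_dense()

loss = CrossEntropyLoss(reduction="sum")(logits, dense_target)
loss.backward()
print(loss.item(), logits.grad.shape)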
