
"element 0 of tensors does not require grad and does not have a grad_fn" when using AdamW from Hugging Face #18254

Closed
0x404 opened this issue Aug 8, 2023 · 14 comments · Fixed by #18268
Labels: 3rd party · bug · optimization · ver: 2.0.x

@0x404
Contributor

0x404 commented Aug 8, 2023

Bug description

When attempting to migrate my current model to Lightning, I encountered an error while using the AdamW optimizer provided by Hugging Face's Transformers library during training: "RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn."

What version are you seeing the problem on?

v2.0

How to reproduce the bug

import torch
from torch.utils.data import DataLoader, Dataset
from transformers import AdamW
from lightning.pytorch import LightningModule, Trainer


class RandomDataset(Dataset):
    def __init__(self, length):
        self.len = length
        self.data = torch.randn(length, 31)
        self.label = torch.zeros(length, dtype=torch.long)

    def __getitem__(self, index):
        return self.data[index], self.label[index]

    def __len__(self):
        return self.len


class BoringModel(LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(31, 2)
        self.loss_fn = torch.nn.CrossEntropyLoss()

    def forward(self, x):
        return self.layer(x)

    def training_step(self, batch, batch_idx):
        input = batch[0]
        label = batch[1]
        out = self.layer(input)
        loss = self.loss_fn(out, label)
        return loss

    def configure_optimizers(self):
        return AdamW(self.layer.parameters(), lr=0.1)


def run():
    train_data = DataLoader(RandomDataset(32), batch_size=2)
    model = BoringModel()
    trainer = Trainer(max_epochs=1)
    trainer.fit(model, train_dataloaders=train_data)


if __name__ == "__main__":
    run()

Error messages and logs

RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn

Environment

lightning==2.0.6
transformers==4.31.0

More info

Upon further analysis of Lightning's source code, I found that the issue stemmed from the use of the @torch.no_grad() decorator in the step function of AdamW provided by Hugging Face's Transformers library.

Source code of AdamW provided by Hugging Face's Transformers library:

    @torch.no_grad()
    def step(self, closure: Callable = None):
        """
        Performs a single optimization step.

        Arguments:
            closure (`Callable`, *optional*): A closure that reevaluates the model and returns the loss.
        """
        loss = None
        if closure is not None:
            loss = closure()

        for group in self.param_groups:
            for p in group["params"]:
                if p.grad is None:
                    continue
                grad = p.grad
                if grad.is_sparse:
                    raise RuntimeError("Adam does not support sparse gradients, please consider SparseAdam instead")

                state = self.state[p]

...

Lightning wraps training_step in a closure, and that closure is only executed inside optimizer.step. With the Transformers AdamW, step runs under @torch.no_grad(), so model.training_step and the loss computation happen with gradient tracking disabled. As a result, loss.requires_grad is False and the subsequent backward fails.
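
For illustration, the same failure can be reproduced with plain PyTorch (no Lightning involved): computing the loss inside a no_grad context leaves it without a grad_fn, so a later backward() raises exactly this error.

import torch

layer = torch.nn.Linear(31, 2)
x = torch.randn(4, 31)
y = torch.zeros(4, dtype=torch.long)

# Same effect as the @torch.no_grad() decorator on AdamW.step()
with torch.no_grad():
    loss = torch.nn.functional.cross_entropy(layer(x), y)

print(loss.requires_grad)  # False: no grad_fn was recorded
loss.backward()  # RuntimeError: element 0 of tensors does not require grad ...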

To address this problem, I made the following temporary modifications to precision_plugin.py:

def _temp_fix(self, closure, model, optimizer):
    # Run the Lightning closure (training_step, zero_grad, backward) here,
    # outside the optimizer's no_grad context, and hand the optimizer a
    # trivial closure that just returns the precomputed result.
    closure_result = closure()
    self._after_closure(model, optimizer)

    def wrap_closure():
        return closure_result

    return wrap_closure

def optimizer_step(  # type: ignore[override]
    self,
    optimizer: Steppable,
    model: "pl.LightningModule",
    closure: Callable[[], Any],
    **kwargs: Any,
) -> Any:
    """Hook to run the optimizer step."""
    # closure = partial(self._wrap_closure, model, optimizer, closure)
    closure = self._temp_fix(closure, model, optimizer)
    return optimizer.step(closure=closure, **kwargs)

This modification ensures that the internal training_step and related functions are executed before passing the actual closure to the optimizer. The closure result is then wrapped in a simple callable for the optimizer, allowing the optimizer to access closure_result.

I'm not certain whether this is the correct fix. I am new to the Lightning community and find Lightning very convenient. I would like to contribute: would it be possible to receive some guidance and open a pull request to fix this bug?

Thanks in advance!

cc @Borda

0x404 added the bug and needs triage labels on Aug 8, 2023
@0x404
Contributor Author

0x404 commented Aug 8, 2023

Maybe related to #18222.

@awaelchli
Contributor

Thank you so much for the analysis @0x404, this is very helpful and saves us a lot of time.

Your workaround is also good and works in your case, but doesn't support optimizers that require closures. And since it is a feature in Lightning, we can't just remove it.

Here is another proposal that might work. We could explicitly enable grad when entering our closure, overriding the no_grad() context set by the optimizer (only while the closure runs training_step, of course), so the rest of the HF optimizer code still benefits from the no_grad context. This makes sense to me, but needs to be validated.
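
A rough sketch of that idea (hypothetical, not an actual patch): the precision plugin's closure wrapper would force grad on just for the closure body, while the rest of the optimizer's step() keeps its no_grad context.

import torch

def _wrap_closure_sketch(model, optimizer, closure):
    # Hypothetical sketch: run the Lightning closure (training_step,
    # zero_grad, backward) with gradients enabled, even though the
    # optimizer's step() was entered under @torch.no_grad().
    with torch.enable_grad():
        closure_result = closure()
    # hooks such as on_before_optimizer_step would run here
    return closure_result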

awaelchli added the optimization and 3rd party labels and removed the needs triage label on Aug 8, 2023
awaelchli self-assigned this on Aug 8, 2023
awaelchli added this to the 2.0.x milestone on Aug 8, 2023
@0x404
Contributor Author

0x404 commented Aug 8, 2023

Thank you @awaelchli, I validated this approach and made the following modification in precision_plugin.py:

def _wrap_closure(
    self,
    model: "pl.LightningModule",
    optimizer: Optimizer,
    closure: Callable[[], Any],
) -> Any:
    """This double-closure makes sure the ``closure`` is executed before the
    ``on_before_optimizer_step`` hook is called.

    The closure (generally) runs ``backward`` so this allows inspecting gradients in this hook. This structure is
    consistent with the ``PrecisionPlugin`` subclasses that cannot pass ``optimizer.step(closure)`` directly.
    """
    # Remember whether grad mode was enabled before the optimizer's step()
    # (e.g. HF's AdamW disables it via @torch.no_grad()), then force it on
    # while the closure runs training_step, zero_grad, and backward.
    _grad_was_enabled = torch.is_grad_enabled()
    torch.set_grad_enabled(True)

    closure_result = closure()
    self._after_closure(model, optimizer)

    if not _grad_was_enabled:
        torch.set_grad_enabled(False)
    return closure_result

def optimizer_step(  # type: ignore[override]
    self,
    optimizer: Steppable,
    model: "pl.LightningModule",
    closure: Callable[[], Any],
    **kwargs: Any,
) -> Any:
    """Hook to run the optimizer step."""
    closure = partial(self._wrap_closure, model, optimizer, closure)
    return optimizer.step(closure=closure, **kwargs)

This seems to work fine.

@0x404
Contributor Author

0x404 commented Aug 8, 2023

Explicitly enabling grad only for training_step would lead to the same error ("element 0 of tensors does not require grad and does not have a grad_fn"), since backward also needs grad. Therefore, I explicitly enable grad for the whole closure, which consists of training_step, backward, and optimizer_zero_grad.
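
For context, the closure Lightning hands to optimizer.step() roughly performs these three steps, all of which need gradient tracking enabled (a simplified sketch with hypothetical names, not Lightning's actual implementation):

def lightning_like_closure(model, optimizer, batch, batch_idx):
    # Simplified view of Lightning's automatic-optimization closure:
    # training_step, zero_grad, and backward all run inside it.
    loss = model.training_step(batch, batch_idx)
    optimizer.zero_grad()
    loss.backward()
    return loss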

@awaelchli
Contributor

Basically, at the beginning of this closure, set grad enabled. What do you think?

@0x404
Contributor Author

0x404 commented Aug 8, 2023

@awaelchli Exactly! Would it be possible for me to submit a PR to address this? I'm relatively new to Lightning, so it might take me a day or two to become acquainted with the Lightning workflow. I am very interested in Lightning, so I think this is a good first issue for me.

@awaelchli
Contributor

Definitely, please give it a try. This is much appreciated, and I'm happy to help or answer questions.
We still need to validate that such a change doesn't have any unintended side effects, so I suggest submitting a PR with the change; then we can let the test suite run.

@0x404
Contributor Author

0x404 commented Aug 9, 2023

Hi @awaelchli, I encountered a few issues while attempting this:

  1. I made modifications to the corresponding code in src/pytorch_lightning/, but this directory seems to be ignored by git. How can I commit my changes to the repository?
  2. After completing my code changes, how should I build Lightning for testing purposes? (Currently, I'm using make test, but the generated package doesn't seem to include my modifications.)

Could you please point me to any relevant documentation that could assist me? Thank you.

@awaelchli
Contributor

You need to make the modifications under src/lightning/pytorch. Ignore src/pytorch_lightning/; it is only there to generate the pytorch_lightning package.

> After completing my code changes, how should I build Lightning for testing purposes? (Currently, I'm using make test, but the generated package doesn't seem to include my modifications.)

You can, but there are many tests and you won't be able to run all of them (which is what make test attempts to do). For a simple change like yours, I suggest just running pip install -r requirements/pytorch/test.txt and then running individual test files like so:

py.test -v tests/tests_pytorch/.../path/to/test_file.py

But before doing that, if I were you I would just submit the PR first; then the CI can run through the test suite once and we can see the output. If some tests fail, that's the time to investigate. Let me know if that works.

@0x404
Contributor Author

0x404 commented Aug 10, 2023

Thanks @awaelchli, I have submitted a PR: #18268.

@carmocca
Contributor

Shouldn't this be fixed in the optimizer definition, since it forgot to take the closure into consideration?
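
For comparison, PyTorch's built-in optimizers also disable grad in step() but re-enable it around the closure call, so a fix on the Transformers side could follow the same pattern (a sketch of that pattern, not the exact torch.optim source):

import torch

@torch.no_grad()
def step(self, closure=None):
    loss = None
    if closure is not None:
        # Re-enable grad just for the closure, which may run the
        # forward/backward pass.
        with torch.enable_grad():
            loss = closure()
    # ... parameter updates continue under no_grad ...
    return loss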

@0x404
Contributor Author

0x404 commented Aug 10, 2023

Hi @carmocca. I feel the same way, but I think handling gradient mode for the closure in Lightning is also a viable approach. It makes Lightning more compatible with various third-party optimizers (including those that forgot to take the closure into consideration). What do you think?

@carmocca
Contributor

I agree. It's okay to add the explicit fix here too.
