"element 0 of tensors does not require grad and does not have a grad_fn" when using AdamW from Hugging Face #18254
Comments
Maybe related to #18222.
Thank you so much for the analysis @0x404, this is very helpful and saves us a lot of time. Your workaround is also good and works in your case, but it doesn't support optimizers that require closures, and since that is a feature in Lightning, we can't just remove it. Here is another proposal that might work: we could explicitly set grad enabled when entering our closure, bypassing the `no_grad` context set by the optimizer's `step`.
Thank you @awaelchli, I validated this method and made this modification in `_wrap_closure`:

```python
def _wrap_closure(
    self,
    model: "pl.LightningModule",
    optimizer: Optimizer,
    closure: Callable[[], Any],
) -> Any:
    """This double-closure makes sure the ``closure`` is executed before the
    ``on_before_optimizer_step`` hook is called.

    The closure (generally) runs ``backward`` so this allows inspecting gradients in this hook. This structure is
    consistent with the ``PrecisionPlugin`` subclasses that cannot pass ``optimizer.step(closure)`` directly.
    """

    def _torch_require_grad():
        # Probe whether grad mode is currently enabled (it is off inside the
        # optimizer's @torch.no_grad()-decorated step).
        _x = torch.tensor([0.0], requires_grad=True)
        _y = _x ** 2
        return _y.requires_grad

    _require_grad = _torch_require_grad()
    torch.set_grad_enabled(True)
    closure_result = closure()
    self._after_closure(model, optimizer)
    if not _require_grad:
        # Restore the previous grad mode if it was disabled before.
        torch.set_grad_enabled(False)
    return closure_result


def optimizer_step(  # type: ignore[override]
    self,
    optimizer: Steppable,
    model: "pl.LightningModule",
    closure: Callable[[], Any],
    **kwargs: Any,
) -> Any:
    """Hook to run the optimizer step."""
    closure = partial(self._wrap_closure, model, optimizer, closure)
    return optimizer.step(closure=closure, **kwargs)
```

This seems to work fine.
Regarding explicitly setting grad enabled for the closure: @0x404, I was thinking we should do it here, not in the precision plugin. Basically, at the beginning of this closure, set grad enabled. What do you think?
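A minimal sketch of that idea, using a hypothetical stand-in for the training loop's closure object (the class and attribute names here are illustrative, not Lightning's actual API):

```python
import torch


class TrainingClosure:
    """Hypothetical stand-in for the closure the loop passes to ``optimizer.step``."""

    def __init__(self, step_fn):
        self._step_fn = step_fn  # runs training_step, backward, zero_grad, ...

    @torch.enable_grad()  # re-enable grad even if the optimizer's step() turned it off
    def __call__(self):
        return self._step_fn()


# Usage sketch: even inside a no_grad() region, the closure produces a loss that requires grad.
with torch.no_grad():
    closure = TrainingClosure(lambda: (torch.ones(1, requires_grad=True) * 2).sum())
    assert closure().requires_grad
```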
@awaelchli Exactly! Would it be possible for me to submit a PR to address this? I'm relatively new to Lightning, so it might take me a day or two to become acquainted with the Lightning workflow. I'm very interested in Lightning, so I think this is a good first issue for me.
Definitely, please give it a try. This is much appreciated, and I'm happy to help or answer questions.
Hi @awaelchli, I encountered a few issues while attempting this:
Could you please point me to any relevant documentation that could assist me? Thank you.
You need to make the modifications under `src/lightning/pytorch`. Ignore the
You can, but there are many tests and you won't be able to run all of them (which is what `make test` will attempt to do). For a simple change like yours, I suggest to just
But before doing that, if I were you I would just submit the PR first, then the CI can run through the test suite once and we can see the output. If some tests fail, that's the time to go and investigate. Let me know if that works.
Thanks @awaelchli, I have already submitted PR #18268.
Shouldn't this be fixed in the optimizer definition, since it forgot to take the closure into consideration?
Hi @carmocca, I feel the same way. But I think explicitly enabling grad for the closure in Lightning is also a viable approach; it makes Lightning more compatible with various third-party libraries (including those that forgot to take the closure into consideration). What do you think?
I agree. It's okay to add the explicit fix here too.
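For reference, closure-aware optimizers (the built-in `torch.optim` implementations, for example) re-enable grad around the closure call inside their otherwise no-grad `step`; a rough sketch of that pattern:

```python
import torch
from typing import Callable, Optional


class ClosureAwareSGD(torch.optim.SGD):
    """Illustrative subclass; only the closure handling is shown, the update is inherited."""

    @torch.no_grad()
    def step(self, closure: Optional[Callable] = None):
        loss = None
        if closure is not None:
            # Re-enable grad just for the closure so its forward/backward pass works,
            # while the parameter update itself still runs without grad tracking.
            with torch.enable_grad():
                loss = closure()
        super().step()  # delegate the actual parameter update
        return loss
```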
Bug description
When attempting to migrate my current model to Lightning, I encountered an error while using the AdamW optimizer provided by Hugging Face's Transformers library during training: "RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn."
What version are you seeing the problem on?
v2.0
How to reproduce the bug
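The original reproduction script is not included above; a minimal setup along the following lines should hit the same code path (the toy model, data, and hyperparameters are illustrative):

```python
import torch
import lightning.pytorch as pl
from torch.utils.data import DataLoader, TensorDataset
from transformers import AdamW  # Hugging Face's AdamW, whose step() is wrapped in @torch.no_grad()


class ToyModel(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(4, 1)

    def training_step(self, batch, batch_idx):
        x, y = batch
        return torch.nn.functional.mse_loss(self.layer(x), y)

    def configure_optimizers(self):
        return AdamW(self.parameters(), lr=1e-3)


if __name__ == "__main__":
    dataset = TensorDataset(torch.randn(32, 4), torch.randn(32, 1))
    trainer = pl.Trainer(max_epochs=1, logger=False, enable_checkpointing=False)
    # Fails with: RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
    trainer.fit(ToyModel(), DataLoader(dataset, batch_size=8))
```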
Error messages and logs
Environment
lightning==2.0.6
transformers==4.31.0
More info
Upon further analysis of Lightning's source code, I found that the issue stemmed from the use of the `@torch.no_grad()` decorator in the `step` function of `AdamW` provided by Hugging Face's Transformers library.

Source code of `AdamW` provided by Hugging Face's Transformers library:
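The snippet referenced here is not reproduced; the relevant detail is that `step` is decorated with `@torch.no_grad()` and calls the closure inside that context, roughly like this (a condensed paraphrase, not the full Transformers implementation):

```python
import torch
from typing import Callable, Optional


class AdamW(torch.optim.Optimizer):
    # ... __init__ and the actual Adam update math are omitted ...

    @torch.no_grad()  # grad mode is disabled for everything executed inside step()
    def step(self, closure: Optional[Callable] = None):
        loss = None
        if closure is not None:
            # With Lightning, this closure runs training_step (and backward),
            # so the loss is computed while grad is globally disabled.
            loss = closure()
        # ... parameter updates would happen here ...
        return loss
```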
Lightning wraps the `training_step` within a closure, and the actual execution of `training_step` occurs during `optimizer.step`. As a result, the `step` function in the Transformers library's `AdamW` runs `model.training_step` and calculates the step loss, causing `loss.requires_grad` to be False.

To address this problem, I made the following temporary modification to `precision_plugin.py`:
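The modified code block did not survive, but based on the description that follows, the change was roughly this (a sketch of `optimizer_step` in the precision plugin, not the exact diff):

```python
def optimizer_step(  # type: ignore[override]
    self,
    optimizer: Steppable,
    model: "pl.LightningModule",
    closure: Callable[[], Any],
    **kwargs: Any,
) -> Any:
    """Hook to run the optimizer step."""
    wrapped = partial(self._wrap_closure, model, optimizer, closure)
    # Run training_step/backward eagerly, outside the optimizer's no_grad step()...
    closure_result = wrapped()
    # ...then hand the optimizer a trivial callable that just returns the result.
    return optimizer.step(closure=lambda: closure_result, **kwargs)
```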
This modification ensures that the internal `training_step` and related functions are executed before passing the actual closure to the optimizer. The closure result is then wrapped in a simple callable for the optimizer, allowing the optimizer to access `closure_result`.

I'm not certain whether this is the correct fix. I am new to the Lightning community and find Lightning very convenient. I would like to contribute to Lightning. Would it be possible for me to receive guidance and open a pull request to fix this bug?

Thanks in advance!
cc @Borda