Support optional flag to clamp gradient in 'backward' to prevent crash #45

Closed
daniel347x wants to merge 8 commits
Conversation

daniel347x (Contributor)

This commit addresses an intermittent but deadly crash bug that has been destroying my training runs: a very occasional infinite gradient in the 'backward' function.

In this commit, functionality remains unchanged by default.

However, an optional flag has been added that allows clamping the gradient in the 'backward' function. The flag takes the form of an int giving the max value, or a sequence giving the min/max values.

An optional third value in the passed sequence is interpreted as a Boolean that indicates whether to print a warning to the console whenever an infinite gradient is clamped. The default is False.

Support for PyTorch only.
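
For illustration, here is a minimal sketch of how such a flag could be parsed and applied to a gradient tensor. The helper names ('_parse_clamp_spec', 'clamp_gradient', 'clamp_spec') and the symmetric interpretation of a single int are assumptions for this sketch, not the PR's actual code:

```python
import torch

def _parse_clamp_spec(clamp_spec):
    """Accept an int/float (max magnitude) or a sequence (min, max[, warn])."""
    if isinstance(clamp_spec, (int, float)):
        # Assumption: a single number is treated as a symmetric magnitude bound.
        return -abs(clamp_spec), abs(clamp_spec), False
    lo, hi = clamp_spec[0], clamp_spec[1]
    # Optional third value: whether to print a warning when clamping occurs.
    warn = bool(clamp_spec[2]) if len(clamp_spec) > 2 else False
    return lo, hi, warn

def clamp_gradient(grad, clamp_spec):
    lo, hi, warn = _parse_clamp_spec(clamp_spec)
    if warn and not torch.isfinite(grad).all():
        print(f'Warning: non-finite gradient clamped to [{lo}, {hi}]')
    # Replace infinities/NaNs with the bounds, then clamp everything else.
    grad = torch.nan_to_num(grad, nan=0.0, posinf=hi, neginf=lo)
    return grad.clamp_(lo, hi)
```

Inside 'backward', each gradient tensor would be passed through such a helper when the flag is set; with the flag unset, behavior is unchanged.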

…tions

Also, add a dummy argument in 'backward' to match the new backward_clamp_gradient_mag argument
BachiLi (Owner) commented Oct 17, 2022

This looks great. Any chance you can help write a small test for this to make sure it is indeed clamping the gradients correctly?

daniel347x (Contributor, Author)

I'd be happy to if you can give me a hint or two about how and where it would go. It's a bit tricky for me to see how to extract the clamping logic from its actual runtime location in the 'backward' function.
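
For what it's worth, such a test might not need the full renderer at all; it could feed an artificial gradient containing infinities through the clamping logic and check the bounds. A minimal sketch, reusing the hypothetical clamp_gradient helper from the snippet above:

```python
import torch

def test_backward_clamps_infinite_gradient():
    # An artificial gradient with infinities, like the intermittent bug produces.
    grad = torch.tensor([0.5, float('inf'), float('-inf')])
    clamped = clamp_gradient(grad, (-10.0, 10.0))  # hypothetical helper above
    assert torch.isfinite(clamped).all()
    assert clamped.min() >= -10.0
    assert clamped.max() <= 10.0
```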

daniel347x (Contributor, Author)

See the new PR, which is identical to this one but uses a different branch from my fork:

#48

daniel347x closed this Nov 2, 2022