Fix global step increment on training_epoch_end #3673
Conversation
Looks like all tests are passing. Maybe add a test too, to make sure it's all good in the future 🙂
That's the plan :)
William will solve this in his branch.
The horovod test is failing randomly... I also verified this on my local machine. @tgaddair mind taking a look in a follow-up PR?
Hello @awaelchli! Thanks for updating this PR. There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻 Comment last updated at 2020-09-27 23:50:23 UTC
# Called every 3 steps, meaning for 1 epoch of 11 batches, it is called 3 times with gamma=0.1
assert pytest.approx(init_lr * 0.1) == adjusted_lr2
# @pytest.mark.skipif(platform.system() == "Windows", reason="Horovod is not supported on Windows")
@tgaddair weird error... happens only on some machines, sometimes haha
What does this PR do?
Currently, global_step gets an extra increment if training_epoch_end is implemented.
This increment is unnecessary and leads to a misalignment in the logs. This PR removes it.
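To illustrate the kind of misalignment meant here, a toy sketch in plain Python follows. It does not use or reproduce Lightning's trainer code: `run_epochs` and its flags are hypothetical names, and the assumption that the extra increment happens once per epoch, after the last batch, is made only for illustration.

```python
def run_epochs(num_epochs, batches_per_epoch, has_epoch_end_hook, extra_increment):
    """Toy training loop (hypothetical, not Lightning code).

    Returns the global_step value under which each batch would be logged.
    """
    global_step = 0
    logged_steps = []
    for _ in range(num_epochs):
        for _ in range(batches_per_epoch):
            logged_steps.append(global_step)  # metrics are logged under the current step
            global_step += 1                  # one step per batch
        if has_epoch_end_hook and extra_increment:
            global_step += 1                  # assumed extra increment at epoch end
    return logged_steps


with_extra = run_epochs(2, 3, has_epoch_end_hook=True, extra_increment=True)
without_extra = run_epochs(2, 3, has_epoch_end_hook=True, extra_increment=False)

print(with_extra)     # [0, 1, 2, 4, 5, 6]
print(without_extra)  # [0, 1, 2, 3, 4, 5]
```

In this sketch the extra increment leaves a gap after every epoch, so the logged step no longer matches the number of batches processed, and two otherwise identical runs that differ only in whether the hook is implemented would plot on different x-axes.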
PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in GitHub issues, there's a high chance it will not be merged.
Did you have fun?
Make sure you had fun coding 🙃