[Improvement] max_batches support to training log and tqdm progress bar. #1554
Merged: BloodAxe merged 3 commits into Deci-AI:master from hakuryuu96:feature/tqdm_batches_reset on Oct 23, 2023
hakuryuu96 requested review from shaydeci, ofrimasad, BloodAxe and Louis-Dupont as code owners, October 19, 2023 15:23
…er) of max_batches)
BloodAxe reviewed Oct 20, 2023
BloodAxe reviewed Oct 20, 2023
BloodAxe approved these changes Oct 21, 2023
LGTM
Louis-Dupont approved these changes Oct 23, 2023
LGTM
BloodAxe added a commit that referenced this pull request Oct 26, 2023
* [Improvement] max_batches support to training log and tqdm progress bar. (#1554)
  * Added max_batches support to training log and tqdm progress bar.
  * Added changing string in accordance which parameter is used (len(loader) of max_batches)
  * Replaced stopping condition for the epoch with a smaller one
  (cherry picked from commit 749a9c7)
* fix (#1558)
  Co-authored-by: Eugene Khvedchenya <ekhvedchenya@gmail.com>
  (cherry picked from commit 8a1d255)
* fix (#1564)
  (cherry picked from commit 24798b0)
* Bugfix of model.export() to work correct with bs>1 (#1551)
  (cherry picked from commit 0515496)
* Fixed incorrect automatic variable used (#1565)
  $@ is the name of the target being generated, and $^ are the dependencies
  Co-authored-by: Louis-Dupont <35190946+Louis-Dupont@users.noreply.github.com>
  (cherry picked from commit 43f8bea)
* fix typo in class documentation (#1548)
  Co-authored-by: Eugene Khvedchenya <ekhvedchenya@gmail.com>
  Co-authored-by: Louis-Dupont <35190946+Louis-Dupont@users.noreply.github.com>
  (cherry picked from commit ec21383)
* Feature/sg 1198 mixed precision automatically changed with warning (#1567)
  * fix
  * work with tmpdir
  * minor change of comment
  * improve device_config
  (cherry picked from commit 34fda6c)
* Fixed issue with torch 1.12 where _scale_fn_ref is missing in CyclicLR (#1575)
  (cherry picked from commit 23b4f7a)
* Fixed issue with torch 1.12 issue with arange not supporting fp16 for CPU device. (#1574)
  (cherry picked from commit 1f15c76)

---------

Co-authored-by: hakuryuu96 <marchenkophilip@gmail.com>
Co-authored-by: Louis-Dupont <35190946+Louis-Dupont@users.noreply.github.com>
Co-authored-by: Alessandro Ros <aler9.dev@gmail.com>
Issue description
As @BloodAxe described, if training_hyperparams.max_train/valid_batches are overridden on the CLI, tqdm and the training log do not take the change into account and continue to show the full length of the dataloader. E.g., when a user executes something like this
the resulting logs and progress bar are the following:
PR description
This PR addresses the issue above and proposes some improvements. Briefly:
1.5. Additionally, the logs warn the user that the max_batches parameter was set.
E.g. if the max_train/valid_batches parameter is specified:
If not, the logs behave similarly to previous versions.
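To make the intended behavior concrete, here is a minimal sketch of the idea, not the actual code merged in this PR. It assumes the epoch length shown to the user should be the smaller of len(loader) and max_batches, and that the epoch loop stops at that same number. The helper names (`effective_num_batches`, `describe_epoch_length`, `limited_batches`) and the message wording are hypothetical and are not part of the SuperGradients API.

```python
from typing import Iterable, Iterator, Optional, Tuple, TypeVar

T = TypeVar("T")


def effective_num_batches(loader_len: int, max_batches: Optional[int]) -> int:
    """Number of batches an epoch will actually run (hypothetical helper)."""
    if max_batches is None:
        return loader_len
    # Use the smaller of the two values as the stopping condition.
    return min(loader_len, max_batches)


def describe_epoch_length(loader_len: int, max_batches: Optional[int]) -> str:
    """Build a log string that says which parameter defines the epoch length."""
    n = effective_num_batches(loader_len, max_batches)
    if max_batches is not None and max_batches < loader_len:
        return f"{n} batches (limited by max_batches; the loader has {loader_len})"
    return f"{n} batches (len(loader))"


def limited_batches(
    loader: Iterable[T], max_batches: Optional[int]
) -> Iterator[Tuple[int, T]]:
    """Yield (index, batch) pairs, stopping early once max_batches is reached."""
    for idx, batch in enumerate(loader):
        if max_batches is not None and idx >= max_batches:
            break
        yield idx, batch


if __name__ == "__main__":
    fake_loader = list(range(100))  # stand-in for a real dataloader
    print(describe_epoch_length(len(fake_loader), max_batches=10))
    for idx, batch in limited_batches(fake_loader, max_batches=10):
        pass  # a training step would go here
```

The value returned by `effective_num_batches` could also be passed as the `total` argument of a tqdm progress bar, so that the bar and the log report the same number.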
Some ideas
IMO it would be cool to consider logging the whole set of training parameters before the run. For me as a user, it would be nice to double-check all the settings I've made somewhere in the project (e.g. if I use hydra and bring the SG Trainer class into my pipeline) and to be sure things go smoothly :)
For example, the user should see the following info: