-
-
Notifications
You must be signed in to change notification settings - Fork 16.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improves docs and handling of entities and resuming by WandbLogger #3264
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👋 Hello @charlesfrye, thank you for submitting a 🚀 PR! To allow your work to be integrated as seamlessly as possible, we advise you to:
- ✅ Verify your PR is up-to-date with origin/master. If your PR is behind origin/master an automatic GitHub actions rebase may be attempted by including the /rebase command in a comment body, or by running the following code, replacing 'feature' with the name of your local branch:
git remote add upstream https://github.com/ultralytics/yolov5.git
git fetch upstream
git checkout feature # <----- replace 'feature' with local branch name
git rebase upstream/master
git push -u origin -f
- ✅ Verify all Continuous Integration (CI) checks are passing.
- ✅ Reduce changes to the absolute minimum required for your bug fix or feature addition. "It is not daily increase but daily decrease, hack away the unessential. The closer to the source, the less wastage there is." -Bruce Lee
UpdateCourtesy of @AyushExel, who tested more of the features implemented here, 4d79628 improves the error message on resumption failure. Non-parallel resumption has been tested. DDP resumption has not been tested due to issues getting an environment set up (which are the same in Deep Dive on UpdateThe previous check looked for whether metadata But for finished runs, these variables are actually |
@charlesfrye @AyushExel great, thanks for your contributions guys! Is there a way to consolidate the last/latest checkpoint names into a single one? Maybe renaming |
I included both aliases to preserve backwards compatibility, but given that this feature is relatively fresh, I think it's right to prefer simplicity and we can just use |
@charlesfrye great, PR is merged! :) |
…ltralytics#3264) * adds latest tag to match wandb defaults * adds entity handling, 'last' tag * fixes bug causing finished runs to resume * removes redundant "last" tag for wandb artifact (cherry picked from commit 19100ba)
…ltralytics#3264) * adds latest tag to match wandb defaults * adds entity handling, 'last' tag * fixes bug causing finished runs to resume * removes redundant "last" tag for wandb artifact
What does this fix?
The
--entity
argument for configuring theWandbLogger
is currently unused. This PR fixes that, including for resumed runs, fixes a bug for run resuming, and adds a docstring to theWandbLogger
class.How has this been tested?
The core fix, to entity setting for non-resumed runs, is tested in this repro colab, and the new tagging is used in this artifact produced by that colab.
The other fixes, including resumption based on the new tags, have not been directly tested, since I don't have a good setup for testing resumption and DDP. @AyushExel could you check these?
Deep dive
Projects in Weights & Biases are organized under entities, which may be organizations, teams, or individual users. A user may sometimes wish to log to their personal entity (e.g.
glenn-jocher
) or to their team entity (e.g.ultralytics
).The
--entity
CLI argument totrain.py
is intended to allow users to set the entity to which they are logging. It does not do so currently, and this PR fixes that.In addition, the handling of entities in resumed runs (and in DDP sub-processes during resumed runs) was implicit -- it was assumed that the entity of the provided run was the same as the user. This has been corrected in the PR, and resumed runs are now logged to the original entity, not the user's entity.
Separately, this surfaced an issue with the resumption of runs -- in
train.py
, models are given the aliaslast
, while in the code for resumption it is assumed that they have aliaslatest
. The former is a YOLOv5 convention and the latter is a wandb convention. For backwards compatibility and completeness, this PR applies both aliases, but could be amended to use just one or the other.Finally, this commit adds extra docstrings to
wandb_utils
: at the module level and toWandbLogger
.🛠️ PR Summary
Made with ❤️ by Ultralytics Actions
🌟 Summary
Enhancements to Weights & Biases integration in YOLOv5 training script.
📊 Key Changes
train.py
.wandb_utils.py
with a docstring introduction and improved code comments.entity
in Weights & Biases artifact paths, allowing logging across different W&B entities.download_model_artifact
to only allow resuming incomplete runs.🎯 Purpose & Impact
entity
allows training across various Weights & Biases teams/entities, providing better organization and access control.