Adaptive Learning Rate got reset when I restart the training? #2866
Comments
@GiorgioSgl Resuming an interrupted training run is simple. There are two options:

python train.py --resume                    # resume latest training
python train.py --resume path/to/last.pt    # specify resume checkpoint

If you started the training with a multi-GPU command then you must resume it with the same exact configuration (and vice versa). Multi-GPU resume commands are here, assuming you are using 8 GPUs:

python -m torch.distributed.launch --nproc_per_node 8 train.py --resume                    # resume latest training
python -m torch.distributed.launch --nproc_per_node 8 train.py --resume path/to/last.pt    # specify resume checkpoint

Note that you may not change any settings when resuming; you must resume with the same exact settings.
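To illustrate why the learning rate is not reset on resume: schedules of this kind compute the LR as a function of the epoch counter, so restoring the last epoch from the checkpoint puts the schedule exactly where it left off. Below is a minimal, self-contained sketch of a cosine one-cycle-style schedule; the function name, default values, and checkpoint contents are illustrative assumptions, not YOLOv5's actual code.

```python
import math

def one_cycle_lr(epoch, epochs=300, lr0=0.01, lrf=0.2):
    # Cosine decay from lr0 toward lr0 * lrf over `epochs` epochs,
    # similar in spirit to a one-cycle schedule (values are illustrative).
    return lr0 * (((1 - math.cos(epoch * math.pi / epochs)) / 2) * (lrf - 1) + 1)

# Fresh run: the epoch counter starts at 0, so the LR starts at lr0.
print(one_cycle_lr(0))            # 0.01

# Resumed run: the checkpoint stores the last completed epoch (hypothetical
# dict here), so the schedule continues from that point instead of resetting.
ckpt = {"epoch": 149}
start_epoch = ckpt["epoch"] + 1
resumed_lr = one_cycle_lr(start_epoch)
assert resumed_lr < one_cycle_lr(0)  # LR has already decayed partway
```

The key design point is that the LR depends only on the stored epoch, so as long as the checkpoint carries the epoch counter, resuming reproduces the same LR trajectory a single uninterrupted run would have followed.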
@glenn-jocher "github: up to date with https://github.com/ultralytics/yolov5 ✅" The result is no different with python train.py --resume or python train.py --resume path/to/last.pt. Thanks.
@suyong2 thanks for the message! I think this may be related to a recent change in save_dir handling in train.py. We switched to Path() directories for this variable, which may not be playing well with yaml save and resume; that would be my fault if true. I'll check it out.
@suyong2 good news 😃! Your original issue has now been fixed ✅ in PR #2876. To receive this update you can pull the latest code from the repository.
Thank you for spotting this issue and informing us of the problem. Please let us know if this update resolves the issue for you, and feel free to inform us of any other issues you discover or feature requests that come to mind. Happy trainings with YOLOv5 🚀!
@glenn-jocher Fortunately, it works well with the updated YOLOv5 source.
❔Question
Hi,
My question is simple: I'm training YOLOv5 on Google Colab, and Colab generally kills my session after about 5 hours of GPU use. So I can't complete the training in one shot; I have to restart it repeatedly from the last weight checkpoint (last.pt).
As you know, the learning rate changes over epochs, and I want to ask whether it gets reset when I restart the training or not.
Thank you in advance,
Giorgio
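For intuition on how resuming preserves the LR schedule: PyTorch-style schedulers expose state_dict()/load_state_dict(), and saving that state in the checkpoint lets a restarted run continue from the same point. Below is a minimal pure-Python mimic of that interface (the class, its parameters, and the values are illustrative assumptions, not YOLOv5's actual implementation).

```python
class StepDecay:
    """Toy LR scheduler mimicking the PyTorch scheduler interface."""

    def __init__(self, base_lr=0.01, gamma=0.5, step_size=10):
        self.base_lr, self.gamma, self.step_size = base_lr, gamma, step_size
        self.last_epoch = 0

    def step(self):
        # Called once per epoch to advance the schedule.
        self.last_epoch += 1

    def get_lr(self):
        # Decay by `gamma` every `step_size` epochs.
        return self.base_lr * self.gamma ** (self.last_epoch // self.step_size)

    def state_dict(self):
        return {"last_epoch": self.last_epoch}

    def load_state_dict(self, state):
        self.last_epoch = state["last_epoch"]

# Train for 25 "epochs", then checkpoint the scheduler state.
sched = StepDecay()
for _ in range(25):
    sched.step()
ckpt = sched.state_dict()

# Simulate a Colab disconnect: a brand-new scheduler resets the LR...
fresh = StepDecay()
assert fresh.get_lr() == 0.01

# ...but restoring the saved state resumes the decayed LR.
fresh.load_state_dict(ckpt)
assert fresh.get_lr() == 0.01 * 0.5 ** 2  # 25 // 10 == 2 decay steps
```

So the practical answer for interrupted Colab sessions is that the LR does not reset, provided you resume from a checkpoint that carries the scheduler/epoch state rather than starting a new training run on last.pt as initial weights.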