Add patience argument to Trainer #4186
Conversation
This supersedes #2840, where I added patience to the outdated …
Looking good! Can you add a reference to your original post that this closes #4894? Thanks
Looks good! Small suggestions below
best_eval_loss = None
evals_without_improvement = 0
nit: prefix those with patience_ as they're specific to this
Suggested change:
- best_eval_loss = None
- evals_without_improvement = 0
+ patience_best_eval_loss = None
+ patience_evals_without_improvement = 0
+ patience_should_stop = False
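For context, a rough self-contained sketch of how these renamed trackers could be updated after each evaluation. This is an illustration only, not the exact code in this PR; the patience value and the eval losses below are made up.

# Standalone sketch of the patience bookkeeping (illustrative values only).
patience = 3
patience_best_eval_loss = None
patience_evals_without_improvement = 0
patience_should_stop = False

for eval_loss in [1.2, 0.8, 0.85, 0.83, 0.81]:  # pretend per-evaluation losses
    if patience_best_eval_loss is None or eval_loss < patience_best_eval_loss:
        patience_best_eval_loss = eval_loss
        patience_evals_without_improvement = 0
    else:
        patience_evals_without_improvement += 1
        if patience_evals_without_improvement >= patience:
            patience_should_stop = True
            break

print(patience_should_stop)  # True: three evaluations in a row without beating 0.8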
logger.info(
    f"Patience threshold ({self.args.patience}) exceeded, stopping training"
)
Suggested change:
+ patience_should_stop = True
  logger.info(
      f"Patience threshold ({self.args.patience}) exceeded, stopping training"
  )
if ((self.args.max_steps > 0 and global_step > self.args.max_steps) or
        (self.args.patience > 0 and evals_without_improvement >= self.args.patience)):
Suggested change:
- if ((self.args.max_steps > 0 and global_step > self.args.max_steps) or
-         (self.args.patience > 0 and evals_without_improvement >= self.args.patience)):
+ if ((self.args.max_steps > 0 and global_step > self.args.max_steps) or
+         patience_should_stop):
  epoch_iterator.close()
  break
- if self.args.max_steps > 0 and global_step > self.args.max_steps:
+ if ((self.args.max_steps > 0 and global_step > self.args.max_steps) or
+         (self.args.patience > 0 and evals_without_improvement >= self.args.patience)):
same here
Hello, when will this feature be merged? I would like to use it. Thank you.
There are some changes requested that @thesamuel should fix before this can be merged.
Bump. Early stopping is critical for an automated Trainer that reliably gives us the best model. The current way of choosing the training stopping point seems to be specifying a static train_epochs, but the training duration a model needs depends on far too many factors (learning rate, data complexity, model, model size, optimizer, and so on) for it to be reasonable to ask the user to specify the number of epochs in advance.
I would like to use this early stopping on downstream training. I also would like to add a feature that stores the model each time the monitored metric improves and then optionally loads that model after training. Later evaluation can then be done on this "best" model. @thesamuel @julien-c @kevin-yauris what do you think?
I plan to work on this once I'm finished with the Funnel Transformer model @PhilipMay (so end of this week, beginning of the next).
@sgugger That would be awesome. Maybe you want to get some inspiration from the FARM training loop, which is pretty nice IMO: https://github.com/deepset-ai/FARM/blob/master/farm/train.py#L262-L370
I just found this PR that was already merged: #7431
Not quite, but it makes implementing it easier.
Yes - you are right. The patience part is still missing.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
@sgugger Should we keep this open? You wrote in this thread you will work on this if you find the time, but I am not sure if you plan to use another PR for that.
There has been a PR merged adding the …
Thanks @cbrochtrup @sgugger! Sorry I didn't get around to this...
You're welcome, happy to help!
This closes #4894.
Summary
Often, we want to stop training if loss does not improve for a number of epochs. This PR adds a "patience" argument, which is a limit on the number of times we can get a non-improving eval loss before stopping training early.
It is implemented by other NLP frameworks, such as AllenNLP (see trainer.py and metric_tracker.py).
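For illustration, a minimal standalone sketch of the patience logic described above; should_stop_early and its arguments are invented for this example and are not the actual Trainer code.

def should_stop_early(eval_losses, patience):
    # Return True once `patience` consecutive evaluations fail to improve
    # on the best eval loss seen so far.
    best_loss = None
    evals_without_improvement = 0
    for loss in eval_losses:
        if best_loss is None or loss < best_loss:
            best_loss = loss
            evals_without_improvement = 0
        else:
            evals_without_improvement += 1
            if evals_without_improvement >= patience:
                return True
    return False

# With patience=2, the two evaluations after the best loss of 0.9 fail to
# improve, so training would stop at that point:
print(should_stop_early([1.0, 0.9, 0.95, 0.93], patience=2))  # True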
Motivation
This feature allows faster fine-tuning by breaking the training loop early and saves users the toil of checking metrics on TensorBoard.
Caveats
Often, models are evaluated once per epoch, but run_lm_finetuning.py has an option to evaluate after a set number of model update steps (dictated by --logging_steps if --evaluate_during_training is true). Because of this, I've elected to tie patience to the number of evaluations without improvement in loss.
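As a rough worked example of what tying patience to evaluations means in practice (the numbers below are invented, not defaults of the script):

# Hypothetical settings, for illustration only.
logging_steps = 500  # evaluate every 500 update steps when --evaluate_during_training is set
patience = 3         # tolerate 3 consecutive evaluations without improvement

# In the worst case, training continues for roughly this many update steps
# after the last improvement before the patience check stops it:
steps_before_stop = patience * logging_steps  # 1500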