
🚀 Add early stopping to the trainer #4894

Closed
BramVanroy opened this issue Jun 10, 2020 · 8 comments
Comments

@BramVanroy
Collaborator

BramVanroy commented Jun 10, 2020

🚀 Feature request

The trainer (pt, tf) is an easy access point for users who would rather not spend too much time building their own trainer class but prefer an out-of-the-box solution. Even though transformers was never meant to be a fully fledged training library, it might please users to add an additional feature: early stopping.

Motivation

Early stopping ensures that the trainer does not needlessly keep training when the loss does not improve. This saves time, money, and let's not forget the trees. 😉 Performance-wise this should not lead to different results.

Your contribution

At the moment I cannot work on this, but here are my thoughts:

  • a training argument should be added (pt, tf). This would only work when evaluate_during_training is enabled.
  • for PyTorch: at every evaluation step, an early stopper (it can even be a separate class) checks whether the loss has improved in the last n steps, potentially with a minimal threshold by which the loss must have improved. If not, the trainer should stop (see the sketch after this list).
  • for TensorFlow: I don't have experience with TF myself, but I assume one could use tf.keras.callbacks.EarlyStopping.
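For concreteness, here is a minimal sketch of what such an early stopper could look like (the names `EarlyStopper`, `patience`, and `min_delta` are illustrative, not an existing transformers API):

```python
class EarlyStopper:
    """Tracks the best evaluation loss seen so far and signals when training should stop."""

    def __init__(self, patience: int = 3, min_delta: float = 0.0):
        self.patience = patience    # number of evaluations to wait for an improvement
        self.min_delta = min_delta  # minimum decrease in loss that counts as an improvement
        self.best_loss = float("inf")
        self.bad_evals = 0

    def should_stop(self, eval_loss: float) -> bool:
        if eval_loss < self.best_loss - self.min_delta:
            # loss improved enough: reset the counter
            self.best_loss = eval_loss
            self.bad_evals = 0
        else:
            self.bad_evals += 1
        return self.bad_evals >= self.patience
```

The trainer would call `should_stop(eval_loss)` after every evaluation and break out of the training loop once it returns `True`. On the TF side, `tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=3)` plays the equivalent role when training goes through `model.fit`.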
@stale

stale bot commented Aug 9, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the wontfix label Aug 9, 2020
@BramVanroy
Collaborator Author

Looking at the interest this topic has, I am bumping it to re-open it.

@stale stale bot removed the wontfix label Aug 9, 2020
@san7988

san7988 commented Aug 17, 2020

Hi,

So when #4186 is closed, this will close as well? Or are there any more changes expected on this issue, apart from what #4186 adds?

Thanks

@KMFODA
Contributor

KMFODA commented Aug 18, 2020

If I've understood things correctly, I think #4186 only addresses the PyTorch implementation of the trainer. @BramVanroy if that's the case I'm happy to work on implementing this feature in TensorFlow (trainer_tf.py).

@BramVanroy
Collaborator Author

@san7988 @KMFODA This issue should not directly be closed when that PR is merged because, as @KMFODA mentions, it only seems to address PyTorch. A PR for TensorFlow is also welcome!

@KMFODA
Contributor

KMFODA commented Oct 2, 2020

Thanks for clarifying @BramVanroy. Apologies, I was out for the past month due to a personal issue. I'll submit a PR for TensorFlow early stopping now.

@BramVanroy
Collaborator Author

An early stopping callback has now been introduced in the PyTorch trainer by @cbrochtrup! 👏

AFAIK the implementation for the TF Trainer is still under way (#7533), so I'll keep this topic open for now.
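For anyone landing here, a usage sketch of that callback with the PyTorch Trainer (the model and dataset variables are placeholders, and the argument values are only examples):

```python
from transformers import EarlyStoppingCallback, Trainer, TrainingArguments

args = TrainingArguments(
    output_dir="output",
    evaluation_strategy="steps",   # evaluate periodically so the callback has a metric to watch
    eval_steps=500,
    load_best_model_at_end=True,   # required by EarlyStoppingCallback
    metric_for_best_model="loss",
)

trainer = Trainer(
    model=model,                   # placeholder: your model
    args=args,
    train_dataset=train_dataset,   # placeholder datasets
    eval_dataset=eval_dataset,
    callbacks=[EarlyStoppingCallback(early_stopping_patience=3)],
)
trainer.train()
```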

@djrochford

I gather from the conversation on #7533 that this issue should now be closed; is that correct, @BramVanroy ?
