Support early stopping during training inside of the early stopping callback #7033

Closed
ananthsub opened this issue Apr 15, 2021 · 0 comments · Fixed by #6944 or #7069
Labels: feature (Is an improvement or enhancement), help wanted (Open to be worked on)

ananthsub (Contributor) commented Apr 15, 2021

🚀 Feature

The ability to early stop during training, as controlled by the callback instead of the training loop.

Motivation

This removes the need for the training loop to manually run these callback hooks when the validation epoch won't be run: https://github.com/PyTorchLightning/pytorch-lightning/blob/5bd3cd5f712b65d38812b27cf957261bb06b83c5/pytorch_lightning/trainer/training_loop.py#L152-L159

Similar to the flexibility offered by checkpointing callbacks, this could eventually let users specify separate early-stop criteria for training and validation.

Pitch

Add a check_on_train_epoch_end flag to the callback constructor. See #6944 for a sketch.

This flag controls whether the check runs during training or validation. Because the monitored metric may be logged during training but not validation, or vice versa, the flag makes the checks in these two hooks mutually exclusive.

    def on_train_epoch_end(self, trainer, pl_module, outputs) -> None:
        # Only run the check here when the flag is set
        if not self._check_on_train_epoch_end or self._should_skip_check(trainer):
            return
        self._run_early_stopping_check(trainer)

    def on_validation_end(self, trainer, pl_module):
        # Only run the check here when the flag is not set
        if self._check_on_train_epoch_end or self._should_skip_check(trainer):
            return
        self._run_early_stopping_check(trainer)
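To make the gating logic above concrete, here is a minimal standalone sketch of a callback with the proposed flag. The class name, the metric-dict hooks, and the internal counters are illustrative assumptions, not the actual Lightning implementation:

```python
class EarlyStoppingSketch:
    """Illustrative sketch of mutually exclusive early-stop checks.

    Assumptions: this is NOT the Lightning API; the hooks here take a
    plain metrics dict instead of a trainer, to keep the sketch runnable.
    """

    def __init__(self, monitor, min_delta=0.0, patience=3,
                 check_on_train_epoch_end=False):
        self.monitor = monitor
        self.min_delta = min_delta
        self.patience = patience
        self._check_on_train_epoch_end = check_on_train_epoch_end
        self.best = float("inf")  # assumes a metric we want to minimize
        self.wait = 0
        self.stopped = False

    def _run_early_stopping_check(self, metrics):
        current = metrics.get(self.monitor)
        if current is None:
            return  # monitored metric not logged in this phase
        if current < self.best - self.min_delta:
            self.best = current
            self.wait = 0
        else:
            self.wait += 1
            if self.wait > self.patience:
                self.stopped = True

    def on_train_epoch_end(self, metrics):
        # Checks only when the flag is set
        if self._check_on_train_epoch_end:
            self._run_early_stopping_check(metrics)

    def on_validation_end(self, metrics):
        # Checks only when the flag is not set
        if not self._check_on_train_epoch_end:
            self._run_early_stopping_check(metrics)
```

With `check_on_train_epoch_end=True`, calls to `on_validation_end` become no-ops and stopping is driven entirely by training-epoch metrics, which is the mutual exclusivity the pitch describes.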

For parity with existing behavior, this flag will default to False. With the feature enabled, users can specify a callback for early stopping during training like so:

stop = EarlyStopping(monitor='abc', min_delta=0.1, patience=0, check_on_train_epoch_end=True)

Users could then create multiple such callbacks:

train_stop = EarlyStopping(monitor='abc', min_delta=0.1, patience=0, check_on_train_epoch_end=True)
val_stop = EarlyStopping(monitor='val_loss', min_delta=0.5, patience=3)
trainer = Trainer(..., callbacks=[train_stop, val_stop], ...)
trainer.fit(...)

Alternatives

Keep as is

Additional context
