
run loop without validation. #536

Closed
DrClick opened this issue Nov 21, 2019 · 8 comments
Labels
feature (Is an improvement or enhancement), help wanted (Open to be worked on)

Comments

@DrClick

DrClick commented Nov 21, 2019

Is your feature request related to a problem? Please describe.
We would like to be able to tell the trainer to not run the validation loop at all even though we have defined it. We have many models that inherit from our base module and our team finds this useful often.

Describe the solution you'd like
A trainer bool flag for no_validation.

Describe alternatives you've considered
Setting validation percentage to zero. This seems less clean

Additional context
I have already implemented this and would like to submit a PR

@DrClick added the feature and help wanted labels on Nov 21, 2019
@DrClick
Author

DrClick commented Nov 21, 2019

Additionally, the commit by @neggert seems to have an unintended consequence:


```python
# from the commit, in the Trainer init
...
early_stop_callback=True,
...

# from line 185
# creates a default one if none passed in
if early_stop_callback is True:
    self.early_stop_callback = EarlyStopping(
        monitor='val_loss',
        patience=3,
        verbose=True,
        mode='min'
    )
    self.enable_early_stop = True
elif not early_stop_callback:
    self.early_stop_callback = None
    self.enable_early_stop = False
else:
    self.early_stop_callback = early_stop_callback
    self.enable_early_stop = True
```

If you do not pass an argument for early_stop_callback, you would assume you don't want early stopping. Here, the default value True sets up a default EarlyStopping callback. This directly affects this issue: because validation is not being called, there is no val_loss metric, and the training loop exits. I also don't think the elif branch can be reached here (assuming you don't pass early_stop_callback=None to the trainer), so I don't think this is consistent.

Is the intent of PL to provide out-of-the-box early stopping? If so, is a patience of 3 the right default?

The following is my current workaround:

```python
trainer = Trainer(
    gpus=hparams.gpus,
    distributed_backend=hparams.distributed_backend,
    use_amp=hparams.use_16bit,
    check_val_every_n_epoch=1,
    early_stop_callback=None,
    no_validation=True
)
```
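
For reference, a minimal sketch of how the quoted constructor logic maps each early_stop_callback value to behavior. This is only an illustration based on the snippet above; the import paths are assumptions for the pytorch-lightning version discussed here, not the library's documented contract.

```python
from pytorch_lightning import Trainer
from pytorch_lightning.callbacks import EarlyStopping

# Default (True): a default EarlyStopping(monitor='val_loss', patience=3) is created,
# which breaks once validation is skipped and 'val_loss' is never produced.
t_default = Trainer()

# Any falsy value (False or None) hits the elif branch and disables early stopping.
t_disabled = Trainer(early_stop_callback=False)

# Anything else is treated as the callback itself and used as-is.
t_custom = Trainer(early_stop_callback=EarlyStopping(monitor='train_loss', patience=10))
```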

@neggert
Contributor

neggert commented Nov 21, 2019

That's a question for @williamFalcon. IIRC, this commit was actually a cleanup of what was already there. I tend to agree with you that we shouldn't enable early stopping by default, but I understand the argument for it.

@DrClick
Author

DrClick commented Nov 21, 2019

If the no_validation flag is something we want to add to the repo, we should also discuss whether to address this default early stopping: as it stands, setting no_validation will not work without also setting early_stop_callback=None. This could be OK as long as it's documented, but it feels hacky.

@williamFalcon
Contributor

williamFalcon commented Nov 25, 2019

A large portion of users don't need to deal with the complexity of setting up early stopping. A patience of 3 is pretty standard in research workflows. I think 90% of use cases want auto early-stopping with a sensible default. I'd say anything other than that is definitely an edge case where setting up your own schedule makes the most sense.

In terms of turning off validation, I'm just wondering why we need another flag and can't set

val_percent_check = 0
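
For example, a minimal sketch of that suggestion, assuming the Trainer API from this era (per the later comments, this argument no longer exists in newer versions):

```python
from pytorch_lightning import Trainer

# Sketch: run 0% of the validation set, i.e. skip the validation loop.
# The default EarlyStopping monitors 'val_loss', so it has to be disabled too,
# otherwise training exits once 'val_loss' is missing (as described above).
trainer = Trainer(
    val_percent_check=0,
    early_stop_callback=False,
)
```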

@DrClick
Author

DrClick commented Nov 25, 2019

Good morning @williamFalcon. This was my original thought, but at first glance it seemed it would not work. Our workflow involves creating a base PL module for a project that handles the basic methods (for instance dataloader setup, metrics gathering, etc.) and then iterating with different architectures and techniques. In this scenario we have defined a validation dataloader in the base class, but our users specifically want to be able to run without validation (which is time consuming) for a while and see what happens, so they can iterate quickly. I thought this was more cleanly accomplished by setting a flag so that the validation loop is never called. (For instance, the sanity check in our example could break if we send all the data via hyperparameters to the training dataloader while our base class has defined a validation dataloader.) So from the principle of separation of concerns, it seems a bit "messy" to set the validation percentage to zero.

It seems possible the larger problem is dealing cleanly with the already large parameter list. On one hand, although it's "ugly", it is actually a super convenient way to set things up from an end-user point of view. Using kwargs is certainly a mixed bag; it might be the right way to go with a lot of documentation. A config file or dictionary could certainly work as well. Maybe there could be two trainers: a BasicTrainer with a much simplified param list, and a Trainer that takes a config, a dictionary, or kwargs.

Anyhow, I'm on vacation for the week and I will be single shortly if I continue to be on GitHub. Thank you for this incredible project. I hope I can help contribute.

@Borda
Member

Borda commented Nov 25, 2019

There is a discussion about parameters in #541.

@LanceKnight

> val_percent_check = 0

Couldn't find this flag. Was it removed in newer versions?

@bparaj

bparaj commented Nov 16, 2022

Looks like we need to set limit_val_batches to 0.0.
https://github.com/Lightning-AI/lightning/blob/e93c64964e416bde5dbe1893881180d4371883c3/src/pytorch_lightning/trainer/trainer.py#L1536
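
A minimal sketch for recent versions (assuming the current pytorch_lightning package name and Trainer API):

```python
from pytorch_lightning import Trainer

# Sketch: limiting validation to 0 batches skips the validation loop entirely,
# which replaces the old val_percent_check=0 approach; no extra flag is needed.
trainer = Trainer(limit_val_batches=0.0)
```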
