fast_dev_run should set max_steps #5136

Closed
indigoviolet opened this issue Dec 14, 2020 · 9 comments · Fixed by #5277
Labels
bug Something isn't working help wanted Open to be worked on

Comments

@indigoviolet

🐛 Bug

Trainer(fast_dev_run=True) should set max_steps accordingly, but max_steps is None

Please reproduce using the BoringModel and post here

https://colab.research.google.com/drive/1PFoozULca2zFtw0Ljor5V8AAGsGqe1Y4#scrollTo=quj4LUDgmFvj

To Reproduce

Expected behavior

max_steps should be set to 1 if fast_dev_run is True else fast_dev_run
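One way to read the expected behavior above is as a small resolution rule. The sketch below is only an illustration: `expected_max_steps` is a hypothetical helper, not part of the Lightning API, and it assumes `fast_dev_run` is either a bool or a positive int.

```python
from typing import Optional, Union


def expected_max_steps(
    fast_dev_run: Union[bool, int], max_steps: Optional[int]
) -> Optional[int]:
    """Hypothetical helper: the max_steps value this issue expects."""
    # Check `is True` first, because bool is a subclass of int in Python.
    if fast_dev_run is True:
        return 1
    # A positive integer means "run this many batches".
    if fast_dev_run:
        return int(fast_dev_run)
    # fast_dev_run is off: keep whatever the user passed.
    return max_steps
```

With this rule, `fast_dev_run=True` resolves to 1 step, `fast_dev_run=5` resolves to 5 steps, and `fast_dev_run=False` leaves `max_steps` untouched.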

Environment

Note: Bugs with code are solved faster! The Colab notebook should be made public!

You can get the script and run it with:

wget https://raw.githubusercontent.com/PyTorchLightning/pytorch-lightning/master/tests/collect_env_details.py
# For security purposes, please check the contents of collect_env_details.py before running it.
python collect_env_details.py
  • CUDA:
    • GPU:
      • Tesla T4
    • available: True
    • version: 10.1
  • Packages:
    • numpy: 1.18.5
    • pyTorch_debug: True
    • pyTorch_version: 1.7.0+cu101
    • pytorch-lightning: 1.1.0
    • tqdm: 4.41.1
  • System:
    • OS: Linux
    • architecture:
      • 64bit
    • processor: x86_64
    • python: 3.6.9
    • version: #1 SMP Thu Jul 23 08:00:38 PDT 2020

Additional context

@indigoviolet indigoviolet added bug Something isn't working help wanted Open to be worked on labels Dec 14, 2020
@github-actions
Contributor

Hi! Thanks for your contribution! Great first issue!

@awaelchli awaelchli self-assigned this Dec 15, 2020
@rohitgr7
Contributor

rohitgr7 commented Dec 15, 2020

max_steps should be set to 1 if fast_dev_run is True else fast_dev_run

Do you mean the following?
max_steps should be set to 1 if fast_dev_run is True else max_steps

Should val_check_interval and check_val_every_n_epoch also be set to 1.0 and 1, respectively?

The video here says that fast_dev_run won't create any logs or save checkpoints, but unfortunately it does both. I'm not sure what the intended behavior is.

@awaelchli
Contributor

fast_dev_run won't create any logs or save checkpoints but unfortunately it does both

I noticed it too; it's a bug (I tried to explain here #4629). fast_dev_run should really just be a test that the training loop runs. Logging is still executed to some extent, but nothing should be sent to the actual logger object and no files should be saved. The motivation is to be able to debug the script without polluting the working directory with tons of files :)

@rohitgr7
Contributor

Yeah, I believe a simple fix would be to disable the loggers, the checkpoint callback, early stopping, and maybe more in the init itself.
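A rough sketch of that idea, written against a plain dict of Trainer arguments so that it runs without Lightning installed. `apply_fast_dev_run_overrides` is a hypothetical name, and this is not necessarily how the fix was implemented; it assumes the `logger` and `checkpoint_callback` Trainer arguments accept `False` to turn those features off.

```python
from typing import Any, Dict


def apply_fast_dev_run_overrides(trainer_kwargs: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: switch off file-producing features in fast_dev_run."""
    # Work on a copy so the caller's dict is left unchanged.
    overrides = dict(trainer_kwargs)
    if overrides.get("fast_dev_run"):
        overrides["logger"] = False               # assumption: False disables logging
        overrides["checkpoint_callback"] = False  # assumption: False disables checkpoints
    return overrides
```

Early stopping and other callbacks would need similar handling, which is left out here.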

@indigoviolet
Author

Will that fix the original bug filed here, i.e. max_steps not being set, which can break the optimizers/schedulers that refer to it?

@Borda
Member

Borda commented Dec 17, 2020

@tchaton ^^

@rohitgr7
Contributor

which can break the optimizers/schedulers that refer to it?

@indigoviolet mind explaining this a bit more? What do you mean here?

@indigoviolet
Author

I meant that the lightning module might need to refer to self.trainer.max_steps to configure its optimizers/schedulers, and this breaks in fast_dev_run.
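To make the failure concrete, here is a minimal pure-Python sketch of the kind of schedule arithmetic a module might do with `self.trainer.max_steps`. The function names are hypothetical, and a plain function stands in for a real scheduler so the example runs without torch.

```python
from typing import Optional


def linear_decay_factor(step: int, total_steps: Optional[int]) -> float:
    """Learning-rate multiplier that decays linearly from 1.0 to 0.0."""
    # Dividing by None raises TypeError, which is the breakage described above.
    return max(0.0, 1.0 - step / total_steps)


def resolve_total_steps(max_steps: Optional[int], fallback: int) -> int:
    """Hypothetical guard: use a fallback when max_steps was never set."""
    return max_steps if max_steps is not None else fallback


# With max_steps left as None (as under fast_dev_run), the schedule fails.
try:
    linear_decay_factor(0, None)
    failed = False
except TypeError:
    failed = True
```

A guard like `resolve_total_steps` only works around the problem in user code; the request in this issue is for the trainer to set `max_steps` itself.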

@rohitgr7
Contributor

Yeah, then max_steps should be set correctly in that case too. But fast_dev_run is meant for debugging, so I don't think anyone should set it to a value greater than max_steps (even if someone does, training will still stop at max_steps). The scheduler might then be configured for more iterations than fast_dev_run actually runs, but it won't break in that case either.

5 participants