
num_training_batches rounds down, causing 0 batches count #631

Closed
filipeabperes opened this issue Dec 17, 2019 · 2 comments · Fixed by #649 or #653
Labels
bug Something isn't working

Comments


filipeabperes commented Dec 17, 2019

🐛 Bug

self.num_training_batches is defined using int() in the trainer, which rounds it down to 0 when a small training_percent_check or overfit_pct is used, even though at least 1 batch is still processed.

This does not cause any errors in "vanilla" lightning, but crashes any user code that uses the number of batches in a division (for example to get an average of some quantity over batches).
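For illustration, a minimal standalone sketch of the behaviour (the variable names are chosen for the example and are not the trainer's internals):

```python
# Not the actual Lightning code: a minimal reproduction of the rounding problem.
total_batches = 10             # e.g. 1000 samples with batch_size=100
training_percent_check = 0.05  # small fraction of the data

num_training_batches = int(training_percent_check * total_batches)
print(num_training_batches)    # 0, even though at least one batch is still run

# User code that averages a metric over batches then crashes:
epoch_loss_sum = 1.234
try:
    avg_loss = epoch_loss_sum / num_training_batches
except ZeroDivisionError:
    print("averaging over batches fails when num_training_batches == 0")
```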

To Reproduce

Steps to reproduce the behavior:

Set the training percentage to a small enough value that the number of selected examples is smaller than the batch size for the given dataset.

This would require a very simple fix, either using math.ceil() or max(1, self.num_training_batches), depending on how the quantity is expected to behave in the rest of the code.
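A standalone sketch of the two suggested options (illustrative only, not a patch against the trainer code):

```python
import math

total_batches = 10
training_percent_check = 0.05

# Option 1: round up, so any non-zero fraction yields at least one batch
num_training_batches = math.ceil(training_percent_check * total_batches)    # -> 1

# Option 2: keep the truncation but never let the count drop below 1
num_training_batches = max(1, int(training_percent_check * total_batches))  # -> 1
```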

filipeabperes added the bug label on Dec 17, 2019
awaelchli (Contributor) commented Dec 18, 2019

I think max(1, self.num_training_batches) would be best because it doesn't change the behaviour for existing code that has num_training_batches >= 1. I'm sure the PL team would appreciate it if you made a PR. Is the bug also present for validation and test batches?
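A quick illustration of why max(1, ...) is the less invasive of the two options (again just a sketch, not the trainer code):

```python
import math

total_batches = 10
training_percent_check = 0.17  # corresponds to 1.7 batches' worth of data

print(int(training_percent_check * total_batches))          # current behaviour: 1
print(max(1, int(training_percent_check * total_batches)))  # unchanged: still 1
print(math.ceil(training_percent_check * total_batches))    # would change it: 2
```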

kuynzereb (Contributor) commented

Well, actually it is not a rounding issue. The problem is that we currently process num_training_batches + 1 batches instead of num_training_batches whenever num_training_batches < len(train_dataloader).
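One way such an off-by-one can arise (a hypothetical sketch, not the actual Lightning training loop):

```python
# Hypothetical sketch: if the limit is only checked *after* a batch has been
# processed, one extra batch runs.
num_training_batches = 3
train_dataloader = range(10)  # stand-in for a dataloader yielding 10 batches

processed = 0
for batch_idx, batch in enumerate(train_dataloader):
    processed += 1                         # training step runs first ...
    if batch_idx >= num_training_batches:  # ... then the limit is checked
        break

print(processed)  # 4 == num_training_batches + 1
```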
