Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dev shuffle data option #44

Merged
merged 5 commits into from
Sep 16, 2021
Merged

Dev shuffle data option #44

merged 5 commits into from
Sep 16, 2021

Conversation

powl7
Copy link
Contributor

@powl7 powl7 commented Sep 15, 2021

Description

  • added shuffle and drop_last_batch config options to data module
  • added warning when num_samples not dividable by batch_size

How to Test/Run?

python run.py

Something missing?

I don't think so

@powl7 powl7 linked an issue Sep 15, 2021 that may be closed by this pull request
3 tasks
@powl7 powl7 requested a review from lvoegtlin September 15, 2021 14:33
@powl7 powl7 assigned lvoegtlin and powl7 and unassigned lvoegtlin Sep 15, 2021
Comment on lines 55 to 73
if self.trainer.distributed_backend == 'ddp':
batch_size = self.trainer.datamodule.batch_size
if stage == 'fit':
num_samples = self.trainer.datamodule.his_train.num_samples
datasplit_name = 'train'
elif stage == 'test':
num_samples = self.trainer.datamodule.his_test.num_samples
datasplit_name = 'test'
else:
# unknown stage
log.warn(f'Unknown stage ({stage}) during setup!')
num_samples = -1
datasplit_name = None

if num_samples % self.trainer.datamodule.batch_size != 0:
log.warn(
f'Number of sample ({num_samples}) in {datasplit_name} not dividable by batch size ({batch_size}).')
log.warn(f'Last batch will be incomplete. Behavior depends on datamodule.batch_drop_last setting.')

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could move it into the base_task but if we want to do that we have to name the datasets in a common fashion

… dataset inside the datamodule to test, train, and val
@lvoegtlin lvoegtlin self-requested a review September 16, 2021 08:36
Copy link
Contributor

@lvoegtlin lvoegtlin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Works as it should! gj

num_samples = self.trainer.datamodule.his_train.num_samples
datasplit_name = 'train'
elif stage == 'test':
num_samples = self.trainer.datamodule.his_test.num_samples
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use len(self.trainer.datamodule.his_train) instead of the num_samples field. Makes it more generic

@lvoegtlin lvoegtlin merged commit 6068e13 into dev Sep 16, 2021
@lvoegtlin lvoegtlin deleted the dev_shuffle_data_option branch September 16, 2021 09:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

DDP metric bias
2 participants