Raise an error when lightning replaces an existing sampler #2020
Conversation
Currently, Trainer replaces the existing sampler with DistributedSampler when running distributed training and `replace_sampler_ddp=True` (the default behaviour). If a user has configured their own sampler, this can lead to wildly different results between distributed and non-distributed training. This PR fixes this by raising an error if the user has configured a sampler and uses `replace_sampler_ddp=True`. The recommended behaviour from now on is to either remove the sampler or set `replace_sampler_ddp=False`.
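For illustration, a minimal sketch of the two recommended options, assuming a Trainer API of roughly this era (`gpus`, `distributed_backend`, `replace_sampler_ddp`); the toy dataset and sampler are made up and not from this PR:

```python
import torch
from torch.utils.data import DataLoader, SubsetRandomSampler, TensorDataset
import pytorch_lightning as pl

# Toy data; any user-configured sampler triggers the same situation.
dataset = TensorDataset(torch.randn(100, 16), torch.randint(0, 2, (100,)))
sampler = SubsetRandomSampler(list(range(80)))  # user-configured sampler
train_loader = DataLoader(dataset, batch_size=32, sampler=sampler)

# Option 1: remove `sampler=...` above and let Lightning insert DistributedSampler.
# Option 2 (shown): keep the sampler and opt out of automatic replacement.
trainer = pl.Trainer(gpus=2, distributed_backend="ddp", replace_sampler_ddp=False)
# trainer.fit(model, train_loader)  # `model` would be the user's LightningModule
```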
@Borda I didn't realize that
Hello @devashishshankar! Thanks for updating this PR. There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻 Comment last updated at 2020-06-01 20:47:58 UTC
Codecov Report

| Coverage Diff | master | #2020 | +/- |
|---|---|---|---|
| Coverage | 88% | 88% | |
| Files | 74 | 74 | |
| Lines | 4654 | 4658 | +4 |
| Hits | 4076 | 4080 | +4 |
| Misses | 578 | 578 | |
```diff
@@ -113,39 +113,48 @@ def auto_add_sampler(self, dataloader: DataLoader, train: bool) -> DataLoader:
    need_dist_sampler = (self.use_ddp or self.use_ddp2 or self.use_horovod or self.use_tpu)

    if self.replace_sampler_ddp and need_dist_sampler:
        if not isinstance(dataloader.sampler, (SequentialSampler, RandomSampler)):
```
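For context, a minimal sketch of the kind of check this hunk introduces, under the assumption that only the DataLoader's default `SequentialSampler`/`RandomSampler` are safe to swap out; the helper name `_check_sampler` and the exact message are hypothetical, not the PR's final implementation:

```python
from torch.utils.data import DataLoader, RandomSampler, SequentialSampler
from pytorch_lightning.utilities.exceptions import MisconfigurationException


def _check_sampler(dataloader: DataLoader) -> None:
    """Refuse to silently replace a sampler the user configured themselves."""
    # SequentialSampler / RandomSampler are what DataLoader installs by default,
    # so anything else was set deliberately by the user.
    if not isinstance(dataloader.sampler, (SequentialSampler, RandomSampler)):
        raise MisconfigurationException(
            "You seem to have configured a sampler in your DataLoader. This will be "
            "replaced by DistributedSampler since replace_sampler_ddp is True. Either "
            "remove the sampler from your DataLoader or set replace_sampler_ddp=False."
        )
```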
This way you also have to include `SubsetRandomSampler` and `WeightedRandomSampler`, I guess.
Hmm, shouldn't a warning really be raised if one of these is getting replaced by DistributedSampler? The original motivation of this issue was that I had a WeightedRandomSampler, and Lightning replaced it, leading to a drop in validation accuracy. It took me a long time to debug this, hence the idea that it may be a good idea to raise an error when an existing sampler is replaced.
Also, if someone uses one of these samplers, won't they get different results when using distributed vs non-distributed training?
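For illustration, a small sketch of the kind of setup described above (toy imbalanced data, sample weights from inverse class frequency); if Lightning silently swaps this sampler for DistributedSampler, the class balancing disappears:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset, WeightedRandomSampler

# Toy imbalanced dataset: 90% class 0, 10% class 1.
targets = torch.cat([torch.zeros(90, dtype=torch.long), torch.ones(10, dtype=torch.long)])
dataset = TensorDataset(torch.randn(100, 16), targets)

# Weight each sample by the inverse frequency of its class so batches are balanced.
class_counts = torch.bincount(targets).float()
sample_weights = (1.0 / class_counts)[targets]
sampler = WeightedRandomSampler(sample_weights, num_samples=len(dataset), replacement=True)

loader = DataLoader(dataset, batch_size=16, sampler=sampler)
# Replacing `sampler` with DistributedSampler drops the weighting entirely,
# which is the silent behaviour change the comment above describes.
```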
Maybe you're right about that :)
LGTM 🚀
* Raise an error when lightning replaces an existing sampler

  Currently, Trainer replaces the existing sampler with DistributedSampler if running distributed training and `replace_sampler_ddp=True` (default behaviour). If a user has configured an existing sampler, this would lead to wildly different results if running distributed vs non-distributed training. This PR fixes this by raising an Error if the user has configured a sampler and uses `replace_sampler_ddp=True`. The recommended behavior from now on is to either remove the sampler or set `replace_sampler_ddp=False`.
* Fix tests
* Simpler fix
* Fix tests
* Make inner method protected
* Apply suggestions from code review

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Before submitting
What does this PR do?
Fixes #1995
Currently, Trainer replaces the existing sampler with DistributedSampler if running distributed training and `replace_sampler_ddp=True` (default behaviour). If a user has configured an existing sampler, this would lead to wildly different results when running distributed vs non-distributed training.
This PR fixes this by raising an error if the user has configured a sampler and uses `replace_sampler_ddp=True`. The recommended behaviour from now on is to either remove the sampler or set `replace_sampler_ddp=False`.
PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in GitHub issues, there's a high chance it will not be merged.
Did you have fun?
Make sure you had fun coding 🙃