Replace ValueError with warning in has_len()
#9785
Comments
Here are my suggestions: raise a MisconfigurationException if the total length across all ranks is 0, and raise a warning if the total length isn't 0 but it is 0 on some process. However, that would very likely result in hanging during the backward pass on the all_reduce operation, as Lightning doesn't support uneven batches yet. I still believe this is something for advanced users. One extra option would be the following:

```python
total_length = ...  # dataloader length summed across all ranks
if total_length == 0:
    raise MisconfigurationException(...)
if total_length > 0 and local_length == 0:
    if self.datamodule.enable_zero_length_dataloader_with_ddp:
        rank_zero_warn(...)
    else:
        raise MisconfigurationException(...)
```
@ninginthecloud to clarify, is this only for evaluation, or does this check apply to training too? Uneven batches in validation also mean the user needs to take extra precautions around syncing state, such as metrics. I think we ought to include that in the warning, as D2go handles this but others might not.
@ananthsub the check applies to training too. I think we can do what @tchaton suggested and compute the length over all ranks.
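For illustration, a minimal sketch of what computing the length over all ranks could look like, assuming a CPU/gloo process group is already initialized (the helper name `total_dataloader_length` is hypothetical, not Lightning API):

```python
import torch
import torch.distributed as dist

def total_dataloader_length(dataloader) -> int:
    """Hypothetical helper: sum the local dataloader length across all ranks."""
    local_length = len(dataloader)
    if dist.is_available() and dist.is_initialized():
        # with the NCCL backend this tensor would need to live on the current CUDA device
        length = torch.tensor(local_length, dtype=torch.long)
        dist.all_reduce(length, op=dist.ReduceOp.SUM)  # sum the lengths over every process
        return int(length.item())
    return local_length
```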
Hey folks, what's the status here? It seems we're still having to apply an internal patch to replace this error with a warning.
The PR is ready and can be merged. A small comment is left to be addressed by @ninginthecloud :)
🚀 Feature
Motivation
has_len() in data.py checks whether a dataloader has the __len__ method implemented. When the dataloader has zero length, it raises a ValueError (see code). Recently, users brought up that when a dataset has fewer samples than the number of ranks, the sampler splits the dataset unevenly, and getting 0 batches on certain ranks is sometimes inevitable. This ValueError blocks them from having a successful PL run. Fixing it also bridges the gap for Detectron2Go to better use Lightning: https://github.com/facebookresearch/d2go
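For context, the check in question is roughly the following (a paraphrased sketch, not the verbatim Lightning source):

```python
from torch.utils.data import DataLoader

def has_len(dataloader: DataLoader) -> bool:
    """Check whether a dataloader implements __len__ (paraphrased sketch)."""
    try:
        if len(dataloader) == 0:
            # the ValueError this issue proposes to turn into a warning
            raise ValueError(
                "`Dataloader` returned 0 length. Please make sure that your Dataloader at least returns 1 batch"
            )
        return True
    except TypeError:
        # iterable-style dataloaders don't implement __len__
        return False
```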
Pitch
Replace the ValueError with a warning (see the sketch after this list)
Sync the length of the dataloader across ranks instead of checking only the local dataset. By doing so, we can catch the situation where the user unintentionally fits an empty dataloader
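A minimal sketch of the first item, warning instead of raising; the stdlib warnings module is used here for simplicity, and the message text is illustrative:

```python
import warnings

def has_len(dataloader) -> bool:
    """Variant of the check that warns instead of raising on a 0-length dataloader."""
    try:
        if len(dataloader) == 0:
            # warn so that ranks holding an empty shard can still proceed
            warnings.warn("`Dataloader` returned 0 length. Note that this rank will receive no batches.")
        return True
    except TypeError:
        # iterable-style dataloaders don't implement __len__
        return False
```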
Alternatives
Additional context