Error when training on youtubemix #37

Closed
andrewliu2001 opened this issue May 31, 2022 · 1 comment

Comments

@andrewliu2001

When running CUDA_VISIBLE_DEVICES=1,2,3,4,7 python -m train wandb=null experiment=sashimi-youtubemix dataset=youtubemix, I get the following error:

Traceback (most recent call last):
  File "/data/al451/state-spaces/train.py", line 553, in main
    train(config)
  File "/data/al451/state-spaces/train.py", line 498, in train
    trainer.fit(model)
  File "/home/al451/.local/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 740, in fit
    self._call_and_handle_interrupt(
  File "/home/al451/.local/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 685, in _call_and_handle_interrupt
    return trainer_fn(*args, **kwargs)
  File "/home/al451/.local/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 777, in _fit_impl
    self._run(model, ckpt_path=ckpt_path)
  File "/home/al451/.local/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1199, in _run
    self._dispatch()
  File "/home/al451/.local/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1279, in _dispatch
    self.training_type_plugin.start_training(self)
  File "/home/al451/.local/lib/python3.8/site-packages/pytorch_lightning/plugins/training_type/training_type_plugin.py", line 202, in start_training
    self._results = trainer.run_stage()
  File "/home/al451/.local/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1289, in run_stage
    return self._run_train()
  File "/home/al451/.local/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1311, in _run_train
    self._run_sanity_check(self.lightning_module)
  File "/home/al451/.local/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1371, in _run_sanity_check
    self._evaluation_loop._reload_evaluation_dataloaders()
  File "/home/al451/.local/lib/python3.8/site-packages/pytorch_lightning/loops/dataloader/evaluation_loop.py", line 170, in _reload_evaluation_dataloaders
    self.trainer.reset_val_dataloader()
  File "/home/al451/.local/lib/python3.8/site-packages/pytorch_lightning/trainer/data_loading.py", line 551, in reset_val_dataloader
    self.num_val_batches, self.val_dataloaders = self._reset_eval_dataloader(
  File "/home/al451/.local/lib/python3.8/site-packages/pytorch_lightning/trainer/data_loading.py", line 508, in _reset_eval_dataloader
    if has_len_all_ranks(dataloader, self.training_type_plugin, module)
  File "/home/al451/.local/lib/python3.8/site-packages/pytorch_lightning/utilities/data.py", line 118, in has_len_all_ranks
    raise MisconfigurationException(
pytorch_lightning.utilities.exceptions.MisconfigurationException: Total length of Dataloader across ranks is zero. Please make sure that it returns at least 1 batch.

Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.

@krandiash
Contributor

Hi, can you confirm if single GPU training works for you?
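
For context on why the single-GPU check is informative, here is a minimal sketch of how a validation dataloader can end up with zero batches on every rank. It is not taken from the repository: the helper name `batches_per_rank` and all numbers are hypothetical, and it assumes a `DistributedSampler`-style split that pads so each rank gets `ceil(num_examples / world_size)` examples before batching. The actual cause in this issue is not confirmed; an empty validation split (for example, data not found at the expected path) would fail the same way.

```python
import math


def batches_per_rank(num_examples, world_size, batch_size, drop_last=False):
    """Estimate how many batches each rank sees under distributed sampling.

    Assumption (hypothetical helper, not part of the repo): the sampler pads
    so every rank gets ceil(num_examples / world_size) examples, and the
    dataloader then batches those examples in the usual way.
    """
    per_rank = math.ceil(num_examples / world_size)
    if drop_last:
        # The incomplete final batch is discarded.
        return per_rank // batch_size
    return math.ceil(per_rank / batch_size)


# Hypothetical numbers, not taken from the youtubemix config.
print(batches_per_rank(0, world_size=5, batch_size=1))                  # empty split
print(batches_per_rank(8, world_size=5, batch_size=4, drop_last=True))  # too few examples per rank
print(batches_per_rank(8, world_size=1, batch_size=4, drop_last=True))  # single GPU
```

Under these assumptions, the first two calls give zero batches per rank, which matches the condition the exception describes, while the single-GPU call still yields batches. If single-GPU training also fails, the validation split itself is more likely to be empty.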
