The Lightning 2.0 Trainer issues length-zero DataLoader / CombinedLoader warning when num_sanity_val_steps=0 #17193
I don't see the warning when running the following script:

```python
import os

import torch
from torch.utils.data import DataLoader, Dataset

from lightning.pytorch import LightningModule, Trainer


class RandomDataset(Dataset):
    def __init__(self, size, length):
        self.len = length
        self.data = torch.randn(length, size)

    def __getitem__(self, index):
        return self.data[index]

    def __len__(self):
        return self.len


class BoringModel(LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(32, 2)

    def forward(self, x):
        return self.layer(x)

    def training_step(self, batch, batch_idx):
        loss = self(batch).sum()
        self.log("train_loss", loss)
        return {"loss": loss}

    def validation_step(self, batch, batch_idx):
        loss = self(batch).sum()
        self.log("valid_loss", loss)

    def configure_optimizers(self):
        return torch.optim.SGD(self.layer.parameters(), lr=0.1)


def run():
    train_data = DataLoader(RandomDataset(32, 64), batch_size=2)
    val_data = DataLoader(RandomDataset(32, 64), batch_size=2)

    model = BoringModel()
    trainer = Trainer(
        default_root_dir=os.getcwd(),
        num_sanity_val_steps=0,
        max_epochs=1,
        enable_model_summary=False,
    )
    trainer.fit(model, train_dataloaders=train_data, val_dataloaders=val_data)


if __name__ == "__main__":
    run()
```
Thanks @awaelchli. I'll try to come up with a minimal example built on top of the code you shared above. It won't be easy, as the dataset code is rather long and coupled to some other stuff. Just to note: my dataloader is built around an `IterableDataset`; maybe that's causing the issue (i.e. the Lightning code can't figure out the total length of the dataloader?).
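That hypothesis is easy to confirm in isolation: an `IterableDataset`-style object is iterable but defines no `__len__`, so any framework code probing a dataloader's length has to handle the resulting `TypeError`. A minimal pure-Python sketch (not Lightning's actual check):

```python
class Stream:
    """Stands in for an IterableDataset: iterable, but no __len__."""
    def __iter__(self):
        yield from range(4)


def sized_len(obj):
    """Return len(obj) if the object defines one, else None -- roughly
    what a framework must do before trusting a dataloader's length."""
    try:
        return len(obj)
    except TypeError:
        return None


print(sized_len([1, 2, 3]))  # 3
print(sized_len(Stream()))   # None
```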
I'm getting a similar warning when using a custom `IterableDataset`.
Yes, it is because of the `IterableDataset`. With this hint, here is a reproducible example:

```python
import os

import torch
from torch.utils.data import DataLoader, IterableDataset

from lightning.pytorch import LightningModule, Trainer


class RandomDataset(IterableDataset):
    def __init__(self, size, length):
        self.len = length
        self.data = torch.randn(length, size)

    def __iter__(self):
        for i in range(self.len):
            yield self.data[i]


class BoringModel(LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(32, 2)

    def forward(self, x):
        return self.layer(x)

    def training_step(self, batch, batch_idx):
        loss = self(batch).sum()
        self.log("train_loss", loss)
        return {"loss": loss}

    def configure_optimizers(self):
        return torch.optim.SGD(self.layer.parameters(), lr=0.1)


def run():
    train_data = DataLoader(RandomDataset(32, 64), batch_size=2)

    model = BoringModel()
    trainer = Trainer(
        default_root_dir=os.getcwd(),
        limit_train_batches=1,
        limit_val_batches=1,
        limit_test_batches=1,
        num_sanity_val_steps=0,
        max_epochs=1,
        enable_model_summary=False,
    )
    trainer.fit(model, train_dataloaders=train_data)


if __name__ == "__main__":
    run()
```
The relevant logic to address is in this function: https://github.com/Lightning-AI/lightning/blob/4f82068bcf21f5008aecd46426c806514209112c/src/lightning/pytorch/utilities/data.py#L91-L133. I opened #17218, changing it to skip all those checks in that case.
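The shape of that fix can be sketched as a guard that only emits the zero-length warning when the loader actually reports a length. This is a hypothetical simplification for illustration, not the code in #17218:

```python
import warnings


def maybe_warn_zero_length(loader):
    """Warn about a zero-length loader, but skip the check entirely when
    the loader has no usable len() (e.g. it wraps an IterableDataset)."""
    try:
        length = len(loader)
    except TypeError:
        return False  # length unknown -> nothing to warn about
    if length == 0:
        warnings.warn("Zero-length dataloader detected.")
        return True
    return False


class Unsized:
    """Mimics a DataLoader over an IterableDataset without __len__."""
    def __iter__(self):
        return iter(())


maybe_warn_zero_length(Unsized())  # no warning: length is unknowable
maybe_warn_zero_length([])         # warns: length is genuinely zero
```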
Bug description

Hello, after upgrading to PyTorch Lightning 2.0, my `trainer.fit` started issuing the length-zero `DataLoader` / `CombinedLoader` warning.

This is a single-GPU run with `strategy="auto"`, `devices=1`, `accelerator="gpu"`, and `num_sanity_val_steps=0`. I suspect the removal of the validation sanity check may be leading to this warning? My val_dataloader definitely has length >> 1 (I am waiting for the validation epoch to run through at this very moment...).

How to reproduce the bug
No response
Error messages and logs
Environment
Current environment
More info
No response
cc @justusschock @awaelchli