Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

#7858 breaks IterableDataset with __len__ in Trainer #8087

Closed
cccntu opened this issue Oct 27, 2020 · 3 comments · Fixed by #8095
Closed

#7858 breaks IterableDataset with __len__ in Trainer #8087

cccntu opened this issue Oct 27, 2020 · 3 comments · Fixed by #8095

Comments

@cccntu
Copy link
Contributor

cccntu commented Oct 27, 2020

if not isinstance(self.train_dataset, collections.abc.Sized):
return None

This used to be (before #7858)

if isinstance(self.train_dataset, torch.utils.data.IterableDataset):

I am using IterableDataset with len in Trainer. This change makes it return a sampler and results in an error later. ValueError: DataLoader with IterableDataset: expected unspecified sampler option, but got sampler=<torch.utils.data.sampler.RandomSampler object at 0x7fa32c57b340>

Maybe change to this?

if (isinstance(self.train_dataset, torch.utils.data.IterableDataset) or
        not isinstance(self.train_dataset, collections.abc.Sized)):

@j-rossi-nl @sgugger

@sgugger
Copy link
Collaborator

sgugger commented Oct 27, 2020

I'm sorry, but you could explain to me why an IterableDataset with a __len__ is not a regular Dataset?

@cccntu
Copy link
Contributor Author

cccntu commented Oct 27, 2020

In my case, I wrap a Dataset using a class that inherits IterableDataset, and defines a __len__().
The purpose is to implement smart batching[1]. I use IterableDataset so I can control how to iterate the data.

I don't know if it's possible/easier to Dataset+Sampler, if so please let me know.
Also note that (after the change) if I drop __len__() to suppress the bug, I would then need to specify max_iter (or something like that), which is inconvenient.
[1] (https://wandb.ai/pommedeterresautee/speed_training/reports/Train-HuggingFace-Models-Twice-As-Fast--VmlldzoxMDgzOTI)

@sgugger
Copy link
Collaborator

sgugger commented Oct 27, 2020

It does seem a bit hacky but I guess we can add that test. Do you want to suggest a PR with the change?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants