add test if dataset samples can be collated #5233
base: main
Conversation
💊 CI failures summary and remediations

As of commit 18799db (more details on the Dr. CI page):

🕵️ 9 new failures recognized by patterns. The following CI failures do not appear to be due to upstream breakages:

| Job | Step | Action |
|---|---|---|
| unittest_windows_cpu_py3.8 (1/9) | Run tests | 🔁 rerun |
| unittest_macos_cpu_py3.8 | Run tests | 🔁 rerun |

🚧 1 ongoing upstream failure, probably caused by an upstream breakage that is not fixed yet:

- unittest_prototype since Jan 14 (adf8466)

This comment was automatically generated by Dr. CI. Please report bugs/suggestions to the (internal) Dr. CI Users group.
Summing up what we discussed with Philip offline: The issue here is that some of our datasets (those above) can have `None` targets. The default collate function of `DataLoader` cannot handle them, so something like the following fails:

```python
dataset = datasets.CLEVRClassification("~/datasets", split="test", transform=transforms.PILToTensor())
data_loader = DataLoader(dataset, batch_size=3)
```

A decent workaround for these right now is to write a custom collate function that doesn't try to wrap the `None` targets:

```python
def collate_fn(batch):
    imgs = [img for (img, _) in batch]
    return torch.stack(imgs), [None] * len(batch)
```

We'll reach out to the torchdata team to see what they think of this. Perhaps the default collate function could start accepting `None`s.
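For illustration, here is a minimal sketch of what a `None`-tolerant collate function could look like (the wrapper name and behaviour are hypothetical, not an existing torchvision or torchdata API):

```python
from torch.utils.data.dataloader import default_collate


def collate_allow_none(batch):
    # Hypothetical sketch: transpose the batch of (img, target) samples into
    # per-field tuples, collate every field with default_collate, but pass
    # all-None fields through untouched instead of trying to stack them.
    fields = tuple(zip(*batch))
    return tuple(
        list(field) if all(item is None for item in field) else default_collate(list(field))
        for field in fields
    )
```

With something like this, `DataLoader(dataset, batch_size=3, collate_fn=collate_allow_none)` would batch the images and return the targets as a list of `None`s.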
QQ: If there is no target in the test set, why not omit the target and only keep the image?
It's because we want to keep the returned types / length consistent across splits. Otherwise users need to unpack the returned values and check whether there is a target or not, which can be cumbersome.
Makes sense. And what would we expect in the batch before collate if the target is `None`? Either way, we need a custom map (collate) function for either case.

Case 1:

```python
def collate_fn(batch):
    imgs, targets = tuple(zip(*batch))
    return default_collate(imgs), None
```

Case 2:

```python
def collate_fn(batch):
    return default_collate(batch), None
```

I'm using …
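Either variant is wired into the loader the same way; a minimal usage sketch, reusing the `dataset` and `collate_fn` defined earlier in this thread (so those names are assumptions from context, not new API):

```python
from torch.utils.data import DataLoader
from torch.utils.data.dataloader import default_collate  # used inside the collate_fn variants above

# `dataset` and `collate_fn` refer to the objects defined earlier in this thread.
data_loader = DataLoader(dataset, batch_size=3, collate_fn=collate_fn)
imgs, targets = next(iter(data_loader))  # imgs is a batched tensor, targets is None
```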
That is the root of the problem: we cannot say past …
I've summarized my proposal here. Please have a look @NicolasHug, @ejguan, @NivekT, @VitalyFedyunin.
The following datasets currently fail the test:
- `WIDERFace`
- `Kitti`
- `Sintel`
- `KittiFlow`
- `HD1K`
- `FER2013`
- `CLEVRClassification`
- `OxfordIIITPet`
They all fail since they return `None` in at least one configuration and `default_collate` does not support it. This in turn means that the datasets with the offending configurations cannot be used in a `torch.utils.data.DataLoader(dataset)` without passing a custom `collate_fn`.

cc @pmeier
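For readers who want to reproduce the failure locally, a check along these lines can be sketched as follows (the helper name and the three-sample heuristic are illustrative, not the actual test added by this PR):

```python
import pytest
from torch.utils.data.dataloader import default_collate


def check_samples_can_be_collated(dataset, num_samples=3):
    # Illustrative helper: grab a few samples and verify that the default
    # collate function can batch them, i.e. that the dataset works with a
    # plain DataLoader that has no custom collate_fn. Datasets returning
    # None in one of their fields fail with a TypeError here.
    samples = [dataset[idx] for idx in range(min(num_samples, len(dataset)))]
    try:
        default_collate(samples)
    except TypeError as error:
        pytest.fail(f"Default collation of dataset samples failed: {error}")
```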