I've noticed some naming inconsistencies across the torchvision datasets when it comes to specifying how to split the dataset (train/val/test). We currently have:

- `classes`: LSUN
- `image_set`: SBD, VOC
- `split`: CelebA, Cityscapes, ImageNet, STL10, SVHN
- `train`: CIFAR10, MNIST, USPS

The rest are unspecified, but you can effectively choose the split in them by choosing the root folder (e.g. for COCO).
Is there a reason for different naming conventions for each? If not, is there a case for standardising the argument name to one of the above so it's consistent?
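To make the proposal concrete, here is a minimal sketch of how the legacy names could be funnelled into a single `split` argument during a deprecation period. `normalize_split_kwargs` is a hypothetical helper, not a torchvision API. It assumes `train=False` means the test split (as for CIFAR10, MNIST and USPS), and it leaves LSUN's `classes` out.

```python
import warnings
from typing import Any, Dict, Optional


def normalize_split_kwargs(kwargs: Dict[str, Any]) -> Optional[str]:
    """Hypothetical helper: pop a split argument from ``kwargs`` and return it as a string.

    Accepts the new ``split`` name or one of the legacy names ``image_set``
    (SBD, VOC) and ``train`` (CIFAR10, MNIST, USPS). Not part of torchvision.
    """
    given = [name for name in ("split", "image_set", "train") if name in kwargs]
    if len(given) > 1:
        raise ValueError(f"pass only one of {given}")

    if "image_set" in kwargs:
        warnings.warn("'image_set' is deprecated, use 'split'", DeprecationWarning, stacklevel=2)
        return kwargs.pop("image_set")

    if "train" in kwargs:
        warnings.warn("'train' is deprecated, use 'split'", DeprecationWarning, stacklevel=2)
        # Assumption: train=False selects the test split.
        return "train" if kwargs.pop("train") else "test"

    # New-style argument, or None so that the dataset can apply its own default.
    return kwargs.pop("split", None)


# Usage: a legacy call site is translated to the new name.
legacy_kwargs = {"train": False, "download": True}
print(normalize_split_kwargs(legacy_kwargs))  # test
print(legacy_kwargs)  # {'download': True}
```

Raising on conflicting arguments, rather than silently preferring one, keeps a call like `split="val", train=True` from quietly loading the wrong data.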
Those inconsistencies are mostly because we didn't impose any particular structure on the datasets that were added to torchvision.
While this makes it very simple to understand what is going on in each dataset, it also leads to those inconsistencies.
The split argument is only one example; we also have datasets that store a `classes` attribute, etc.
I think it might be worth thinking about standardization, but I'm less clear on how it should be structured: each dataset is slightly different, so a single API might not be enough, even if they are similar.
One initial thought I had was to have a `ClassificationDataset`; see my comment in #1025.
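One way to reconcile a shared API with datasets that differ is to standardize only the argument name and let each dataset declare which values it accepts. Below is a minimal sketch, assuming a hypothetical `SplitDataset` base class (a real version in torchvision would presumably build on `VisionDataset`); the subclasses and their split tuples are illustrative only.

```python
from typing import ClassVar, Tuple


class SplitDataset:
    """Hypothetical base class: one ``split`` argument, valid values declared per dataset."""

    # Each subclass lists the split names it supports.
    SPLITS: ClassVar[Tuple[str, ...]] = ()

    def __init__(self, root: str, split: str = "train") -> None:
        # Shared validation, so every dataset reports bad splits the same way.
        if split not in self.SPLITS:
            raise ValueError(
                f"split={split!r} is not valid for {type(self).__name__}; "
                f"expected one of {self.SPLITS}"
            )
        self.root = root
        self.split = split


# Illustrative subclasses, loosely modelled on SVHN and VOC.
class ToySVHN(SplitDataset):
    SPLITS = ("train", "test", "extra")


class ToyVOC(SplitDataset):
    SPLITS = ("train", "trainval", "val")


print(ToySVHN("data/svhn", split="extra").split)  # extra
```

The argument name and the error message are uniform, while the set of legal values stays a per-dataset detail.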
@fmassa Why would this be specific to a `ClassificationDataset`? Assuming that it is, I can also think of the `classes` and `class_to_idx` attributes that should be included. If we want a `ClassificationDataset`, I would like to take that up.
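For reference, torchvision's `ImageFolder` already exposes `classes` and `class_to_idx`, so a common base class could generalize that pattern. The sketch below is hypothetical (the name `ClassificationDataset` comes from this thread; the constructor signature is an assumption) and is kept free of torch so that it runs on its own.

```python
from typing import Dict, List, Sequence, Tuple


class ClassificationDataset:
    """Hypothetical base class for datasets that yield (sample, class index) pairs."""

    def __init__(self, classes: Sequence[str], samples: Sequence[Tuple[object, int]]) -> None:
        # Human-readable class names, where position == class index.
        self.classes: List[str] = list(classes)
        # Reverse lookup derived from ``classes``, so the two can never disagree.
        self.class_to_idx: Dict[str, int] = {name: i for i, name in enumerate(self.classes)}
        if len(self.class_to_idx) != len(self.classes):
            raise ValueError("class names must be unique")
        self.samples = list(samples)

    def __len__(self) -> int:
        return len(self.samples)

    def __getitem__(self, index: int) -> Tuple[object, int]:
        return self.samples[index]


# Usage: map a target index back to its class name.
ds = ClassificationDataset(["cat", "dog"], [("img0.png", 1), ("img1.png", 0)])
_, target = ds[0]
print(ds.classes[target])  # dog
```

Deriving `class_to_idx` from `classes` in the base class is one design choice; subclasses would then only need to supply the ordered class names and their samples.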
This is not specific to a `ClassificationDataset`, but it falls into the same bucket of standardization that I mentioned w.r.t. `ClassificationDataset`.
@pmeier can you open an issue describing a proposed design for the ClassificationDataset, and we can iterate over it? No need to implement anything, just describe what would be inside it, and what datasets would fit into this abstraction.