Standardisation of Dataset API split argument name #1067

Open
RJT1990 opened this issue Jun 29, 2019 · 3 comments
Comments

@RJT1990
Contributor

RJT1990 commented Jun 29, 2019

I've noticed some naming inconsistencies across the torchvision datasets when it comes to specifying how to split the dataset (train/val/test). We currently have:

The rest are unspecified - but you can effectively choose the split in them by choosing the root folder (e.g. for COCO).

Is there a reason for different naming conventions for each? If not, is there a case for standardising the argument name to one of the above so it's consistent?
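To make the idea of a single argument name concrete, here is a minimal sketch of a hypothetical adapter that translates one `split` value into the keyword each dataset constructor expects. The names `split_kwargs` and `_SPLIT_STYLE` are illustrative and not part of torchvision. The per-dataset argument names (`train` for MNIST and CIFAR10, `split` for SVHN, `image_set` for VOCSegmentation) are my understanding of the current signatures and should be checked against the installed version.

```python
# Hypothetical helper, not a torchvision API. It maps a unified `split`
# string onto the keyword argument each dataset constructor is assumed to take.
_SPLIT_STYLE = {
    "MNIST": "train_flag",         # assumed: MNIST(root, train=True, ...)
    "CIFAR10": "train_flag",       # assumed: CIFAR10(root, train=True, ...)
    "SVHN": "split",               # assumed: SVHN(root, split="train", ...)
    "VOCSegmentation": "image_set",  # assumed: VOCSegmentation(root, image_set="train", ...)
}


def split_kwargs(dataset_name: str, split: str) -> dict:
    """Return the constructor kwargs that select `split` for `dataset_name`."""
    style = _SPLIT_STYLE.get(dataset_name)
    if style is None:
        raise KeyError(f"no split convention recorded for {dataset_name!r}")
    if style == "train_flag":
        # Boolean-flag datasets can only distinguish train from test.
        if split not in ("train", "test"):
            raise ValueError(f"{dataset_name} only supports 'train' or 'test'")
        return {"train": split == "train"}
    # Otherwise the convention is simply the keyword name to use.
    return {style: split}
```

A caller could then write something like `SVHN(root, **split_kwargs("SVHN", "test"))`, so the inconsistency is confined to one table rather than spread across user code.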

@fmassa
Member

fmassa commented Jul 2, 2019

Hi,

Great question!

Those inconsistencies are mostly because we didn't impose any particular structure on the datasets that were added to torchvision.

While this makes it very simple to understand what is going on, it also leads to those inconsistencies.
The split argument is only one of them, but we also have datasets that store a classes attribute, etc.

I think it might be worth thinking about standardization, but I'm less clear on how it should be structured: each dataset is slightly different, so a single API might not be enough, even if the datasets are similar.

One initial thought I had was to have a ClassificationDataset, see my comment in #1025

Thoughts?

@pmeier
Collaborator

pmeier commented Jul 3, 2019

@fmassa Why would this be specific to a ClassificationDataset? Assuming that it is, I can further think of the classes and class_to_idx parameters that should be included. If we want a ClassificationDataset I would like to take that up.
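As a starting point for discussion, a `ClassificationDataset` carrying a `split` argument together with `classes` and `class_to_idx` could look roughly like the sketch below. Everything here is hypothetical: the class does not exist in torchvision, the constructor signature is an assumption, and a real version would presumably subclass the library's dataset base class and load images rather than hold plain tuples. Sorting the class names to assign indices is one possible convention, chosen here only so the mapping is deterministic.

```python
from typing import Any, Dict, List, Sequence, Tuple


class ClassificationDataset:
    """Hypothetical base class sketch; not an existing torchvision class."""

    def __init__(self, samples: Sequence[Tuple[Any, str]], split: str = "train") -> None:
        # One standard name for the split, stored for introspection.
        self.split = split
        # Sorted unique class names give a deterministic index assignment.
        self.classes: List[str] = sorted({label for _, label in samples})
        self.class_to_idx: Dict[str, int] = {
            name: idx for idx, name in enumerate(self.classes)
        }
        # Store each sample as (item, integer class index).
        self.samples: List[Tuple[Any, int]] = [
            (item, self.class_to_idx[label]) for item, label in samples
        ]

    def __len__(self) -> int:
        return len(self.samples)

    def __getitem__(self, index: int) -> Tuple[Any, int]:
        return self.samples[index]


# Usage with made-up file names and labels.
ds = ClassificationDataset([("a.png", "dog"), ("b.png", "cat")], split="val")
```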

@fmassa
Member

fmassa commented Jul 3, 2019

This is not specific to a ClassificationDataset, but it falls into the same bucket of standardization that I mentioned with respect to ClassificationDataset.

@pmeier can you open an issue describing a proposed design for the ClassificationDataset, so we can iterate on it? No need to implement anything, just describe what would be inside it and which datasets would fit into this abstraction.


3 participants