Skip to content

'make_dataset' as staticmethod of 'DatasetFolder' #3215

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Jan 15, 2021
Merged

'make_dataset' as staticmethod of 'DatasetFolder' #3215

merged 3 commits into from
Jan 15, 2021

Conversation

ain-soph
Copy link
Contributor

@ain-soph ain-soph commented Dec 30, 2020

Make DatasetFolder more generalized so that subclasses could inherit DatasetFolder rather than VisionDataset.
Example: ZippedImageFolder #950
https://github.com/koenvandesande/vision/blob/6247d96d3ce658d553e4e2ceea5cedd77f715b29/torchvision/datasets/zippedfolder.py#L12-L123

If the pull request is approved. I can write it in a more convenient way without re-implement DatasetFolder utilities.
See my implementation at ZipFolder Issue

@pmeier
Copy link
Collaborator

pmeier commented Jan 4, 2021

Hey @ain-soph and thanks for the PR! I'm not sure I understand the purpose of this change. Could you elaborate this a bit?

  • Are you providing this as method to be able to overwrite it in a subclass?
  • Why is it a staticmethod?

@ain-soph
Copy link
Contributor Author

ain-soph commented Jan 5, 2021

@pmeier
Thanks for your quick response!

Are you providing this as method to be able to overwrite it in a subclass?

Yes, exactly! I wanna read data from a zip file (which is not compressed), and it is a file whose contents follow ImageFolder structure requirements, however, the DatasetFolder requires it to be a real folder on disk since the make_dataset uses os.path.walk. I hope to subclass DatasetFolder but be able to write my own make_dataset method. If I have to subset the raw VisionDataset, I have to repeat many things in DatasetFolder, just like #950 did.

Well, actually this ZipFolder doesn't have too much advantage and can't avoid reading small files issues for HDD disk. It's just an alternative option, and we can keep a large zip file on the disk rather than a folder with many small image files. Something like use ISO file for things that don't change. Maybe better for archive and long-term usage?

It's just a use case, and I think DatasetFolder could just provide more flexibility without sacrificing anything.

Why is it a staticmethod?

I see that the original method of make_dataset is not in the class. I don't want to change this pattern, and try to make the minimum modification. It's okay to be a classmethod or a normal method for objects.

@pmeier
Copy link
Collaborator

pmeier commented Jan 5, 2021

It's just a use case, and I think DatasetFolder could just provide more flexibility without sacrificing anything.

IMO that is reasonable. Still, lets hear what @fmassa has to say about this.

I see that the original method of make_dataset is not in the class. I don't want to change this pattern, and try to make the minimum modification. It's okay to be a classmethod or a normal method for objects.

GIven that in the future other implementations might use some object attributes or methods, I would simply define it as standard method.

@ain-soph
Copy link
Contributor Author

Any update?

Copy link
Member

@fmassa fmassa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is something I had in mind for a while already, thanks for doing this!

@fmassa fmassa merged commit 3b19d6f into pytorch:master Jan 15, 2021
facebook-github-bot pushed a commit that referenced this pull request Jan 21, 2021
Summary:
* 'make_dataset' as staticmethod of 'DatasetFolder'

* a better fix

Reviewed By: datumbox

Differential Revision: D25954567

fbshipit-source-id: 514fde3bad4e27518a198276228a36c3217c2163

Co-authored-by: Francisco Massa <fvsmassa@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants