[Feature Implement] ZipFolder (TarFolder) #3519

Closed
ain-soph opened this issue Mar 7, 2021 · 1 comment

Comments

@ain-soph
Contributor

ain-soph commented Mar 7, 2021

🚀 Feature

This issue corresponds to my PR #3510.

Implement a ZipFolder class, based on my previous PR #3215.
The idea is very similar to the TarDataset issue pytorch/pytorch#49440.
It archives an ImageFolder directory into a single zip file without any compression. Its functionality is almost the same as ImageFolder's.

Advantage: a single archive file is better suited to long-term use, and it makes loading and transferring faster and more convenient by avoiding small-file I/O (when memory=True), especially on HDDs.
When the memory argument is set to True, all bytes of the zip are read into memory at the beginning. Otherwise zipfile loads members lazily by default, which results in the same mechanism as ImageFolder.
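To make the memory option concrete, here is a minimal sketch of the two loading paths using only the standard library. The helper name open_zip is hypothetical and is not taken from the PR; it only illustrates reading the archive into an io.BytesIO up front versus letting zipfile read lazily from disk.

```python
import io
import zipfile


def open_zip(path: str, memory: bool = False) -> zipfile.ZipFile:
    """Open a zip archive, optionally from an in-memory copy.

    Hypothetical helper for illustration; not the API of the PR.
    """
    if memory:
        # Read the whole archive once, so later member reads hit RAM
        # instead of issuing many small reads against the disk.
        with open(path, "rb") as f:
            buffer = io.BytesIO(f.read())
        return zipfile.ZipFile(buffer, mode="r")
    # Lazy path: zipfile seeks into the file on disk for each member read.
    return zipfile.ZipFile(path, mode="r")
```

Both paths return a zipfile.ZipFile, so the rest of a dataset class could treat them identically, for example by calling read(name) on a member and decoding the bytes as an image.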

Besides the basic utility, I also added a staticmethod initialize_from_folder that converts a folder (following the ImageFolder requirements) into the zip format.
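A rough sketch of what such a conversion could look like, assuming the archive keeps the root folder name as the top-level entry and stores files uncompressed. The function name folder_to_zip is one of the candidate names and is used here only for illustration; the actual implementation in the PR may differ.

```python
import os
import zipfile


def folder_to_zip(root: str, zip_path: str) -> None:
    """Pack an ImageFolder-style directory into an uncompressed zip.

    Illustrative sketch: members are stored as
    [root_folder_name]/[target_class]/[img_file].
    """
    root = os.path.abspath(root)
    root_name = os.path.basename(root)
    # ZIP_STORED writes the bytes as-is, without compression.
    with zipfile.ZipFile(zip_path, mode="w", compression=zipfile.ZIP_STORED) as zf:
        for dirpath, dirnames, filenames in os.walk(root):
            dirnames.sort()  # walk subfolders in a deterministic order
            for name in sorted(filenames):
                full = os.path.join(dirpath, name)
                rel = os.path.relpath(full, root)
                # Zip member names always use forward slashes.
                arcname = "/".join([root_name] + rel.split(os.sep))
                zf.write(full, arcname=arcname)
```

Calling folder_to_zip("data", "data_store.zip") on a folder data/cat/a.png would then produce a member named data/cat/a.png inside the archive.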


Need Discussion:

  1. Method initialize_from_folder might need a better name. (Candidates: init_from_folder, folder_to_zip)
  2. It might not be appropriate to use io.BytesIO for type annotation.
  3. Potential file structure of zip file (zip filename == [root_folder_name]_store.zip):
    a. (current) [root_folder_name]/[target_class]/[img_file]
    b. [target_class]/[img_file]
  4. We need to check that the compression type is ZIP_STORED.
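Points 3 and 4 can be sketched with the standard zipfile module. The helpers check_stored and list_samples below are hypothetical names for illustration. The class lookup assumes flat class folders (no nested subdirectories), so the parent directory of each file is its class under either layout (a) or (b).

```python
import zipfile
from typing import List, Tuple


def check_stored(zf: zipfile.ZipFile) -> None:
    """Raise ValueError if any member is not stored uncompressed."""
    for info in zf.infolist():
        if info.compress_type != zipfile.ZIP_STORED:
            raise ValueError(f"{info.filename} is compressed; expected ZIP_STORED")


def list_samples(zf: zipfile.ZipFile) -> List[Tuple[str, str]]:
    """Return sorted (member_name, class_name) pairs.

    Assumes each image sits directly inside its class folder.
    """
    samples = []
    for info in zf.infolist():
        if info.is_dir():
            continue  # skip explicit directory entries
        parts = info.filename.split("/")
        if len(parts) < 2:
            continue  # a file at the archive root has no class folder
        samples.append((info.filename, parts[-2]))
    return sorted(samples)
```

Under these assumptions the choice between layouts (a) and (b) does not affect how labels are derived; it mainly affects what users see when they extract the archive by hand.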

Unit tests and docs still need to be added if reviewers think this PR is worthwhile.

cc @pmeier

@ain-soph
Contributor Author

ain-soph commented Mar 7, 2021

I think there is some ongoing work on data pipes:
zip
tar

If that's the case, maybe they could be unified into ImageFolder with different pipe options.

@ain-soph ain-soph changed the title [Feature Request] ZipFolder (TarFolder) [Feature Implement] ZipFolder (TarFolder) Mar 7, 2021
@ain-soph ain-soph closed this as completed Mar 8, 2021