
[Feature Implement] ZipFolder (TarFolder) #3519

@ain-soph


🚀 Feature

This issue corresponds to my PR #3510.

Implement a ZipFolder class, building on my previous PR #3215.
The idea is very similar to the TarDataset issue pytorch/pytorch#49440.
It packs an ImageFolder directory into a single zip archive without any compression. The interface is almost the same as ImageFolder.

Advantage: a single archive file is better for long-term use, and it makes loading and transferring faster and more convenient by avoiding small-file I/O (when memory=True), especially on HDDs.
When the argument memory is set to True, all bytes of the zip are read into memory at the beginning. Otherwise, zipfile loads members lazily by default, which results in the same mechanism as ImageFolder.
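
To illustrate the two loading modes described above, here is a minimal sketch that uses only the standard library. The helper names `open_zip` and `read_member` are hypothetical and are not taken from the PR; only the `memory` flag and the use of `io.BytesIO` come from this issue.

```python
import io
import zipfile


def open_zip(path, memory=False):
    """Open a zip archive (hypothetical helper, not the PR's code).

    With memory=True the whole archive is read into RAM once, so later
    member reads never touch the disk. With memory=False, zipfile reads
    each member from the file on disk when it is requested.
    """
    if memory:
        # One sequential read of the entire archive file.
        with open(path, "rb") as f:
            buffer = io.BytesIO(f.read())
        return zipfile.ZipFile(buffer, mode="r")
    # Lazy path: only the central directory is parsed up front.
    return zipfile.ZipFile(path, mode="r")


def read_member(archive, name):
    """Return the raw bytes of one member, e.g. an encoded image file."""
    return archive.read(name)
```

Either way the caller gets a `zipfile.ZipFile`, so the dataset code that decodes images would not need to know which mode was chosen.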

Besides the basic utility, I also added a staticmethod initialize_from_folder that converts a folder (one that follows the ImageFolder requirements) to the zip format.
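
A rough sketch of what such a conversion could look like with the standard library is shown below. It is an assumption about the approach, not the PR's implementation: the function name `folder_to_zip` is borrowed from the naming candidates in this issue, the `zip_path` parameter is hypothetical, and the layout follows the `[root_folder_name]/[target_class]/[img_file]` structure with the `[root_folder_name]_store.zip` filename.

```python
import os
import zipfile


def folder_to_zip(root, zip_path=None):
    """Pack an ImageFolder-style directory into an uncompressed zip.

    Hypothetical stand-in for initialize_from_folder. Member names keep
    the root folder name as a prefix: root_name/class_name/file_name.
    """
    root = os.path.normpath(root)
    root_name = os.path.basename(root)
    if zip_path is None:
        # Default name assumed from the issue: [root_folder_name]_store.zip
        zip_path = os.path.join(os.path.dirname(root), root_name + "_store.zip")
    # ZIP_STORED writes the bytes as-is, with no compression.
    with zipfile.ZipFile(zip_path, mode="w", compression=zipfile.ZIP_STORED) as zf:
        for dirpath, dirnames, filenames in os.walk(root):
            dirnames.sort()  # make the member order deterministic
            for filename in sorted(filenames):
                full_path = os.path.join(dirpath, filename)
                rel = os.path.relpath(full_path, root)
                # Zip member names always use forward slashes.
                arcname = "/".join([root_name] + rel.split(os.sep))
                zf.write(full_path, arcname=arcname)
    return zip_path
```

For example, `folder_to_zip("data/pets")` would write `data/pets_store.zip` with members such as `pets/cat/a.jpg`.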


Need Discussion:

  1. The method initialize_from_folder might need a better name. (Candidates: init_from_folder, folder_to_zip)
  2. It might not be appropriate to use io.BytesIO as a type annotation.
  3. Possible file structures inside the zip file (zip filename == [root_folder_name]_store.zip):
    a. (current) [root_folder_name]/[target_class]/[img_file]
    b. [target_class]/[img_file]
  4. We need to check that the compression type is ZIP_STORED.
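
For points 3 and 4, a minimal sketch of what the checks might look like follows. Both helper names (`check_stored`, `class_of`) and the `has_root_prefix` parameter are hypothetical; the only library facts relied on are `ZipInfo.compress_type` and `zipfile.ZIP_STORED`.

```python
import zipfile


def check_stored(archive):
    """Raise ValueError if any member is compressed (hypothetical helper)."""
    compressed = [
        info.filename
        for info in archive.infolist()
        if info.compress_type != zipfile.ZIP_STORED
    ]
    if compressed:
        raise ValueError(
            "expected ZIP_STORED members only, found compressed: %r" % compressed
        )


def class_of(member_name, has_root_prefix=True):
    """Extract the target class from a member name.

    has_root_prefix=True  matches layout (a): root/class/file
    has_root_prefix=False matches layout (b): class/file
    """
    parts = member_name.split("/")
    return parts[1] if has_root_prefix else parts[0]
```

With layout (a) the class is the second path component, and with layout (b) it is the first, so the choice between the two mostly affects how member names are parsed.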

Unit tests and docs still need to be written, if any reviewer thinks this PR is worth it.

cc @pmeier
