🚀 Feature
This issue corresponds to my PR #3510, which implements a ZipFolder class based on my previous PR #3215.
The idea is very similar to the TarDataset issue on pytorch/pytorch#49440.
It packs an ImageFolder directory tree into a single zip archive without any compression. Otherwise it behaves almost exactly like ImageFolder.
Advantage: a single archive file is easier to keep around long term, and it makes loading and transferring faster and more convenient by avoiding I/O on many small files (when memory=True), especially on HDDs.
When the memory argument is set to True, all bytes of the zip are read into memory at the beginning. Otherwise, zipfile reads members lazily, which gives the same per-sample disk access as ImageFolder.
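To make the memory=True / lazy distinction concrete, here is a minimal sketch using only the standard library. It is not the code in PR #3510: the class name MinimalZipFolder and its loader argument are hypothetical, and the loader receives raw bytes (a real version would decode them, e.g. with PIL's Image.open(io.BytesIO(data))). It takes each file's parent directory as its class, so it works with either layout a or b discussed below, assuming flat class directories.

```python
import io
import zipfile
from typing import Callable, List, Optional, Tuple


class MinimalZipFolder:
    """Hypothetical sketch: ImageFolder-style indexing over an uncompressed zip."""

    def __init__(self, zip_path: str, memory: bool = False,
                 loader: Optional[Callable[[bytes], object]] = None) -> None:
        if memory:
            # memory=True: read the whole archive once; later reads hit RAM only.
            with open(zip_path, "rb") as f:
                self._zip = zipfile.ZipFile(io.BytesIO(f.read()))
        else:
            # Lazy mode: each read seeks into the archive on disk.
            self._zip = zipfile.ZipFile(zip_path)
        # Default loader returns raw bytes (hypothetical; real code would decode images).
        self.loader = loader or (lambda data: data)
        # Keep file entries that live inside at least one directory.
        names = [n for n in self._zip.namelist()
                 if not n.endswith("/") and n.count("/") >= 1]
        # The class name is the file's parent directory (works for layout a and b).
        self.classes = sorted({n.split("/")[-2] for n in names})
        self.class_to_idx = {c: i for i, c in enumerate(self.classes)}
        self.samples: List[Tuple[str, int]] = sorted(
            (n, self.class_to_idx[n.split("/")[-2]]) for n in names
        )

    def __len__(self) -> int:
        return len(self.samples)

    def __getitem__(self, index: int) -> Tuple[object, int]:
        name, target = self.samples[index]
        return self.loader(self._zip.read(name)), target
```

Both modes expose the same samples; only where the bytes come from differs.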
Besides the basic functionality, I also added a staticmethod, `initialize_from_folder`, that converts a folder (following the ImageFolder layout requirements) into the zip format.
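A minimal sketch of what such a conversion could look like, assuming layout a ([root_folder_name]/[target_class]/[img_file]) and the [root_folder_name]_store.zip naming discussed below. The function name folder_to_zip is one of the candidate names, used here only for illustration; this is not the PR's implementation.

```python
import os
import zipfile
from typing import Optional


def folder_to_zip(root: str, zip_path: Optional[str] = None) -> str:
    """Hypothetical sketch: archive an ImageFolder tree with ZIP_STORED (no compression)."""
    root = os.path.abspath(root)
    root_name = os.path.basename(root)
    if zip_path is None:
        # Place [root_folder_name]_store.zip next to the root folder.
        zip_path = os.path.join(os.path.dirname(root), f"{root_name}_store.zip")
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_STORED) as zf:
        for class_name in sorted(os.listdir(root)):
            class_dir = os.path.join(root, class_name)
            if not os.path.isdir(class_dir):
                continue  # ImageFolder only treats subdirectories as classes
            for dirpath, _, filenames in sorted(os.walk(class_dir)):
                for fname in sorted(filenames):
                    full = os.path.join(dirpath, fname)
                    # Layout a: keep the root folder name as the top-level entry.
                    # zipfile normalizes os.sep to "/" in archive names.
                    arcname = os.path.join(root_name, os.path.relpath(full, root))
                    zf.write(full, arcname=arcname)
    return zip_path
```

Dropping root_name from arcname would produce layout b instead.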
Need Discussion:
- The method `initialize_from_folder` might need a better name. (Candidates: `init_from_folder`, `folder_to_zip`)
- It might not be appropriate to use `io.BytesIO` for type annotation.
- Potential file structure of the zip file (zip filename == `[root_folder_name]_store.zip`):
  a. (current) `[root_folder_name]/[target_class]/[img_file]`
  b. `[target_class]/[img_file]`
- We need to check that the compression type is `ZIP_STORED`.
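For the last point, the standard library already records the compression method per member as ZipInfo.compress_type, so a check could look like the sketch below. The function name check_zip_stored and raising ValueError are assumptions for illustration.

```python
import zipfile


def check_zip_stored(zip_path: str) -> None:
    """Hypothetical check: raise ValueError if any archive member is compressed."""
    with zipfile.ZipFile(zip_path) as zf:
        compressed = [info.filename for info in zf.infolist()
                      if info.compress_type != zipfile.ZIP_STORED]
    if compressed:
        # Show only a few offending names to keep the message short.
        raise ValueError(
            f"{zip_path} has compressed members (expected ZIP_STORED): {compressed[:5]}"
        )
```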
Unit tests and docs still need to be written if reviewers think this PR is worth pursuing.
cc @pmeier