Read COCO dataset from ZIP file? #947

koenvandesande · 2019-05-23T08:01:30Z

For large datasets on, e.g., university clusters where the data storage is an NFS mount, reading many individual files can be slow, and that access pattern doesn't benefit from read-ahead. In the cloud you typically have SSD storage, but unzipping the dataset still takes time.

Would you be open to receiving a pull request that reads the COCO dataset from its zipped version? It adds around 10 lines in the COCO Detection class, and adds another Python file for reading ZIP files in a fork-safe manner (so it works with distributed training).
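To make the "fork-safe" part concrete, here is a rough sketch of one way it could work, not the code from the proposed pull request. It assumes the problem is an open ZIP handle being shared between a parent process and forked DataLoader workers, and avoids that by opening one handle per process ID. The class name `ForkSafeZipReader` and its methods are hypothetical.

```python
import os
import zipfile


class ForkSafeZipReader:
    """Hypothetical helper: keeps one open ZipFile handle per process."""

    def __init__(self, path):
        self.path = path
        self._handles = {}  # maps process ID -> zipfile.ZipFile

    def _handle(self):
        pid = os.getpid()
        if pid not in self._handles:
            # Either this is the first read, or we are in a forked child that
            # inherited the parent's handle. Discard whatever was inherited
            # and open a fresh handle owned by the current process.
            self._handles = {pid: zipfile.ZipFile(self.path, "r")}
        return self._handles[pid]

    def read(self, name):
        """Return the raw bytes of one archive member."""
        return self._handle().read(name)

    def __getstate__(self):
        # Never pickle open file handles (relevant for spawn-based workers).
        return {"path": self.path, "_handles": {}}
```

A dataset's `__getitem__` could then call `reader.read(filename)` and decode the returned bytes, instead of opening a file on disk.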

fmassa · 2019-05-23T09:44:29Z

You mean that all the images are in a zip file?
And how would the reading work? Would it unzip everything locally, or read from the zipped file without extracting it all?

In general, I don't see why this would be specific to the COCO dataset. But a generic way of supporting this for all datasets would be great to have.

koenvandesande · 2019-05-23T09:55:34Z

Yes, all the images are in a zip file and they are read without unzipping, with the constraint (added by me) that the ZIP file shouldn't use compression (which is the case for COCO).
Note that ZIP files are suited for this because they have an index. For tar files, it isn't very efficient because you need to walk over the entire file first to build an index.
I'll first create something just for COCO, and then we can look at which other datasets are stored as ZIP files.
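As an illustration of the index lookup and the no-compression constraint described above, here is a minimal sketch using only Python's standard `zipfile` module. The function name `read_stored_member` is made up for this example and is not part of torchvision.

```python
import io
import zipfile


def read_stored_member(zf, name):
    """Return the bytes of an archive member, insisting it is stored uncompressed."""
    # getinfo() consults the archive's central directory (the index that
    # zipfile loads on open), so no scan over the archive contents is needed.
    info = zf.getinfo(name)
    if info.compress_type != zipfile.ZIP_STORED:
        raise ValueError(f"{name!r} is compressed; expected a stored (uncompressed) member")
    return zf.read(info)


# Usage with a small in-memory archive standing in for the dataset ZIP.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_STORED) as zf:
    zf.writestr("images/0001.jpg", b"fake image bytes")

with zipfile.ZipFile(buf, "r") as zf:
    data = read_stored_member(zf, "images/0001.jpg")
```

The returned bytes can be wrapped in `io.BytesIO` and handed to an image decoder, so nothing is ever extracted to disk.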

koenvandesande · 2019-05-23T10:30:16Z

This could easily apply to the following datasets as well (because they are stored as ZIP files):

celeba
omniglot
phototour (though not really, because it does postprocessing on the files after extraction)

fmassa added the module: datasets label May 23, 2019

koenvandesande mentioned this issue May 23, 2019

Read COCO dataset from ZIP file #950

Closed
