Creating ZipStore with file-like object #1018
Comments
Hey @oeway. Thanks for the suggestion! In your mind is what we need the addition of an optional argument for |
How about we rename the first argument:

    if isinstance(file_or_path, (str, os.PathLike)):
        self.path = file_or_path
    else:  # file-like object
        if hasattr(file_or_path, 'name'):
            # Normal file object has the name property which contains the path
            self.path = file_or_path.name
        else:
            self.path = ""  # the default is empty? or None?
    self.zf = zipfile.ZipFile(file_or_path, mode=mode, compression=compression,
                              allowZip64=allowZip64)

|
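The path-or-file dispatch proposed above can be tried outside of zarr with only the standard library. This is a minimal sketch, not zarr's implementation: `open_zip` is a hypothetical helper name, and the empty-string fallback for unnamed objects is just one of the two defaults floated in the comment.

```python
import io
import os
import zipfile


def open_zip(file_or_path, mode="r", compression=zipfile.ZIP_STORED, allowZip64=True):
    """Hypothetical helper: return (path, ZipFile) for a path or a file-like object."""
    if isinstance(file_or_path, (str, os.PathLike)):
        path = os.fspath(file_or_path)
    else:
        # Regular file objects expose their path as .name; in-memory objects
        # such as io.BytesIO do not, so fall back to an empty string.
        path = getattr(file_or_path, "name", "")
    zf = zipfile.ZipFile(file_or_path, mode=mode, compression=compression,
                         allowZip64=allowZip64)
    return path, zf


# Write one member into an in-memory buffer, then reopen the same buffer to read it.
buffer = io.BytesIO()
path, zf = open_zip(buffer, mode="w")
zf.writestr(".zgroup", b'{"zarr_format": 2}')
zf.close()  # closes the archive, not the buffer that was passed in

read_path, read_zf = open_zip(buffer, mode="r")
members = read_zf.namelist()
read_zf.close()
```

`zipfile.ZipFile` leaves a passed-in file object open when the archive is closed, which is why the same buffer can be reused for reading.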
I wouldn't suggest renaming since existing code could be using |
Good point! |
Think it would be ok to just accept We have similar flexibility in Dask. |
As a side note we might consider using position only and keyword only arguments in Zarr 3 to avoid these issues around renaming arguments. |
Issue created, @jakirkham.
So the logic would be:
Or would you just pass a ZipFile, @oeway ? |
Hi, in my case it's a fake file object that makes HTTP requests with a Range header against a remote file, so it's not a ZipFile. |
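For illustration, `zipfile.ZipFile` can read from any seekable binary file object, so a range-backed reader is enough. The sketch below is not the imjoy-rpc class; `RangeReader` and `fetch_range` are hypothetical names, and the HTTP Range request is replaced by slicing an in-memory payload so the example runs offline.

```python
import io
import zipfile


class RangeReader(io.RawIOBase):
    """Hypothetical read-only file object backed by a byte-range fetcher."""

    def __init__(self, fetch_range, size):
        self._fetch = fetch_range  # callable(start, end) -> bytes for [start, end)
        self._size = size          # total length of the remote file
        self._pos = 0

    def readable(self):
        return True

    def seekable(self):
        return True

    def tell(self):
        return self._pos

    def seek(self, offset, whence=io.SEEK_SET):
        if whence == io.SEEK_SET:
            pos = offset
        elif whence == io.SEEK_CUR:
            pos = self._pos + offset
        elif whence == io.SEEK_END:
            pos = self._size + offset
        else:
            raise ValueError("invalid whence")
        if pos < 0:
            # zipfile probes with negative seeks and expects OSError on failure
            raise OSError("negative seek position")
        self._pos = pos
        return self._pos

    def readinto(self, buf):
        n = min(len(buf), self._size - self._pos)
        if n <= 0:
            return 0  # end of file
        data = self._fetch(self._pos, self._pos + n)
        buf[:len(data)] = data
        self._pos += len(data)
        return len(data)


# Stand-in for a remote file: a small archive held in memory.
payload_buf = io.BytesIO()
with zipfile.ZipFile(payload_buf, mode="w") as zf:
    zf.writestr("0.0", b"chunk bytes")
payload = payload_buf.getvalue()

requests = []


def fetch_range(start, end):
    # A real implementation would issue an HTTP GET with "Range: bytes=start-(end-1)".
    requests.append((start, end))
    return payload[start:end]


with zipfile.ZipFile(RangeReader(fetch_range, len(payload))) as remote_zip:
    chunk = remote_zip.read("0.0")
```

Each entry in `requests` corresponds to one range that would have gone over the network, which makes it easy to see how many round trips a single chunk read costs.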
Thanks Josh! 😄 Intriguing, I'd be curious to look at the object if you don't mind sharing 🙂 Does it support one of the file ABCs? Or could it? If so, then it would be pretty easy to check if it |
Sure, here it is: https://github.com/imjoy-team/imjoy-rpc/blob/af739ec829d984da35bc5b87b93aa1a553944fe3/python/imjoy_rpc/utils.py#L672-L842 It's a class that inherits from |
cc @martindurant (in case this is of interest especially given our ZIP discussion recently) |
I would point out that fsspec supports a zarr-compatible key-value store over any fs it can instantiate, including ZIP and in-memory. You can pass these directly to zarr.
|
Obviously the thing above won't work in pyodide, since the HTTP part uses aiohttp; I should have said so. But you can replace this part or wait until I write HTTP-for-pyodide (maybe next week?!). The separate issue I was talking with @jakirkham about is that accessing the archive this way goes through a single file object, so access to zarr chunks would always be serial. With kerchunk, we can index the ZIP (i.e., translate the existing index embedded in the file) and attain concurrent access to zarr chunks. This is pretty simple but not yet done, and it should work in pyodide/pyscript too. |
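To make the indexing idea concrete: the ZIP central directory already records where each member starts, so for uncompressed members the byte range of every chunk can be computed up front and fetched independently. This is a rough standard-library sketch, not kerchunk's API; `index_stored_members` is a hypothetical name, and it assumes members are written with `ZIP_STORED`.

```python
import io
import struct
import zipfile


def index_stored_members(archive: bytes) -> dict:
    """Hypothetical helper: map member name -> (data offset, size) for stored members."""
    index = {}
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        for info in zf.infolist():
            if info.compress_type != zipfile.ZIP_STORED:
                continue  # compressed members cannot be sliced out directly
            # The local file header is 30 fixed bytes; the name and extra-field
            # lengths sit at byte offset 26 as two little-endian uint16 values.
            name_len, extra_len = struct.unpack_from("<HH", archive, info.header_offset + 26)
            start = info.header_offset + 30 + name_len + extra_len
            index[info.filename] = (start, info.file_size)
    return index


# Build a small archive with two uncompressed "chunks".
buf = io.BytesIO()
with zipfile.ZipFile(buf, mode="w", compression=zipfile.ZIP_STORED) as zf:
    zf.writestr("0.0", b"abc")
    zf.writestr("0.1", b"defg")
archive = buf.getvalue()

index = index_stored_members(archive)
start, size = index["0.1"]
chunk = archive[start:start + size]  # a plain byte-range read, no ZipFile needed
```

Once such an index exists, each chunk is just an `(offset, length)` pair, so reads no longer have to share one file cursor.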
I was just looking at the same problem. Creating from another I was actually looking at potentially contributing this but there are a number of places in |
FSStore already allows for this. In fsspec, you can pass a URL like "zip::s3://bucket/file.zip", or you can pass explicit arguments to the ZIP backend if you like. You don't need to rewrite ZipStore. |
Hi @martindurant, thanks so much for your response! I have been looking at the documentation for But perhaps I'm misunderstanding something, in which case your feedback is much appreciated! Thank you. |
Good point. There's no principled reason that ZipFileSystem should be read-only, except that writing would be a terrible idea on a key-value storage (every update would need to rewrite the file). However, it'd work fine for local, in-memory or cached-to-remote files. |
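The update cost mentioned here is easy to see with the standard library: a ZIP archive has no in-place overwrite, so writing an existing name again appends a second entry instead of replacing the first. A small sketch, using a hypothetical chunk key `0.0`:

```python
import io
import warnings
import zipfile

buf = io.BytesIO()
with zipfile.ZipFile(buf, mode="w") as zf:
    zf.writestr("0.0", b"old chunk")
    with warnings.catch_warnings():
        # zipfile emits a UserWarning about the duplicate name; silence it here.
        warnings.simplefilter("ignore", UserWarning)
        zf.writestr("0.0", b"new chunk")

with zipfile.ZipFile(buf) as zf:
    names = zf.namelist()    # both entries are still stored in the archive
    latest = zf.read("0.0")  # lookups by name resolve to the last entry written
```

The stale entry keeps occupying space until the whole archive is rewritten, which is the cost being described for key-value style updates.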
You can try!

```diff
--- a/fsspec/implementations/zip.py
+++ b/fsspec/implementations/zip.py
@@ -44,10 +44,8 @@ class ZipFileSystem(AbstractArchiveFileSystem):
             a string.
         """
         super().__init__(self, **kwargs)
-        if mode != "r":
-            raise ValueError("Only read from zip files accepted")
         if isinstance(fo, str):
-            files = open_files(fo, protocol=target_protocol, **(target_options or {}))
+            files = open_files(fo, mode=mode+"b", protocol=target_protocol, **(target_options or {}))
             if len(files) != 1:
                 raise ValueError(
                     'Path "{}" did not resolve to exactly'
@@ -55,7 +53,7 @@ class ZipFileSystem(AbstractArchiveFileSystem):
                 )
             fo = files[0]
         self.fo = fo.__enter__()  # the whole instance is a context
-        self.zip = zipfile.ZipFile(self.fo)
+        self.zip = zipfile.ZipFile(self.fo, mode=mode)
         self.block_size = block_size
         self.dir_cache = None
```
Should fail on any file object that doesn't allow seek while writing. |
Just a note, @oeway, think we could just test for |
Cross posting a related SO post and my answer there. https://stackoverflow.com/questions/74127357/how-to-create-and-return-a-zarr-file-from-xarray-dataset/74148410?noredirect=1#comment132344162_74148410 I wish I had seen this issue first 🤷. But in the end, I came up with a similar solution to @oeway. |
Hey @jhamman thanks for pointing me to this issue, here from SO 👋 In the time between the update, I tried the following and it seems to do the trick:
Although, I'm not 100% certain that this does not write to disk somewhere (I've had instances in the past where Note that |
The current ZipStore implementation assumes the input is always a path, which becomes limiting when dealing with in-memory files or virtual file objects (e.g. in Pyodide in the browser). Virtual file object support is crucial for making the zarr library useful in the browser environment, since the emscripten file system itself is rather limited at the moment.
It would be nice if we could allow passing a file-like object. I did a test with a small modification in the init function and it seems to be working nicely in Pyodide/JupyterLite:
I will maintain this piece of code somewhere for now, but it would be great if we could support this upstream and eventually have it in pyodide.