I get inconsistent behaviour from get_file across different filesystems when passing the compression parameter. My understanding from the AbstractFileSystem implementation of that method is that kwargs should be passed on to the open method, but for some filesystems the parameter is silently ignored.
My goal was to fetch files and decompress them on the fly: maybe there is a better-suited function for this?
Minimal example:
import fsspec
import bz2
from zipfile import ZipFile

# create data file
filename = "/tmp/important_data.txt.bz2"
data = b"very important data."
with open(filename, "wb") as fd:
    fd.write(bz2.compress(data))

# open with compression
print(fsspec.open(filename, compression="infer").open().read())
# prints "b'very important data.'"

# fetch from local filesystem
fsspec.filesystem("file").get_file(filename, "/tmp/new", compression="infer")
print(open("/tmp/new", "rb").read())
# prints "b'BZh91AY&SY\x85\xf4|P\x00\x00\t\x11\x80@\x01&#\xd5 \x00"\x9e\x93i\x06\xca\x10\x00\x02\xdc\xc6\x0c\xb1\xc2\xbc\xad\x16\xc7\xc5\xdc\x91N\x14$!}\x1f\x14\x00'"

# fetch from ssh filesystem
fsspec.filesystem("ssh", host="localhost").get_file(filename, "/tmp/new", compression="infer")
print(open("/tmp/new", "rb").read())
# prints "b'BZh91AY&SY\x85\xf4|P\x00\x00\t\x11\x80@\x01&#\xd5 \x00"\x9e\x93i\x06\xca\x10\x00\x02\xdc\xc6\x0c\xb1\xc2\xbc\xad\x16\xc7\xc5\xdc\x91N\x14$!}\x1f\x14\x00'"

# fetch from zip filesystem
zfile = filename + ".zip"
with ZipFile(zfile, 'w') as zipf:
    zipf.write(filename)
of = fsspec.open("zip://" + filename + "::file://" + zfile)
of.fs.get_file(filename, "/tmp/new", compression="infer")
print(open("/tmp/new", "rb").read())
# prints "b'very important data.'"
The fallback implementation of get_file is via open(), so extra kwargs like compression get passed down. However, many filesystem backends have more specialised get_file methods, to allow better operation like parallel downloading. In such cases, we are not necessarily streaming the bytes, and so on-the-fly decompression would not be possible anyway.
I think we should document that only open() is guaranteed to layer file-like objects for decompression or text mode.
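For the original goal of fetching and decompressing in one step, a minimal sketch that relies only on open() might look like the following. The helper name fetch_decompressed and the temporary paths are illustrative assumptions, not part of fsspec; the only fsspec call used is fsspec.open() with its compression argument.

```python
import bz2
import os
import shutil
import tempfile

import fsspec


def fetch_decompressed(url, local_path, compression="infer", **storage_options):
    """Hypothetical helper: copy `url` to `local_path`, decompressing while streaming."""
    # fsspec.open() layers a decompressing file object over the raw file,
    # so the bytes read from `src` are already decompressed.
    with fsspec.open(url, mode="rb", compression=compression, **storage_options) as src:
        with open(local_path, "wb") as dst:
            shutil.copyfileobj(src, dst)


# Demo on the local filesystem, mirroring the example in the report.
tmpdir = tempfile.mkdtemp()
source = os.path.join(tmpdir, "important_data.txt.bz2")
target = os.path.join(tmpdir, "important_data.txt")
with open(source, "wb") as fd:
    fd.write(bz2.compress(b"very important data."))

fetch_decompressed(source, target)
with open(target, "rb") as fd:
    result = fd.read()
print(result)
```

Because the copy goes through a single file object, this gives up whatever optimisations a backend's specialised get_file provides, such as parallel downloading.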
@martindurant thanks for the clarification! I understand I will have to implement a custom solution for my use case.
But I think my point still stands, about the silent ignoring of the kwargs? Wouldn’t it be better to raise an error in such a case?
A general problem throughout the fsspec code is that there are many places that kwargs can get passed to, including general-purpose arguments to the third-party backend libraries. Therefore, most methods only extract the arguments they need and pass everything else along, and whether you get an exception or not depends on how the third-party package is called and what it expects.
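The pass-through pattern described above can be illustrated with a toy sketch. The classes BaseFS and SpecialisedFS and the _backend_copy method are hypothetical stand-ins written for this illustration; they are not fsspec classes and do not reproduce any real backend's behaviour.

```python
class BaseFS:
    """Toy filesystem whose get_file falls back to open()."""

    def open(self, path, compression=None, **kwargs):
        # open() understands `compression`, so the option takes effect here.
        return f"open({path!r}, compression={compression!r})"

    def get_file(self, rpath, lpath, **kwargs):
        # Fallback implementation: every extra kwarg is forwarded to open().
        return self.open(rpath, **kwargs)


class SpecialisedFS(BaseFS):
    """Toy filesystem with its own, faster get_file."""

    def get_file(self, rpath, lpath, **kwargs):
        # Extract only the option this method needs, pass the rest along.
        block_size = kwargs.pop("block_size", 1024)
        return self._backend_copy(rpath, lpath, block_size=block_size, **kwargs)

    def _backend_copy(self, rpath, lpath, block_size, **ignored):
        # Stand-in for a third-party call that accepts and drops unknown
        # options; `compression` ends up in `ignored` and nothing is raised.
        return f"copy({rpath!r} -> {lpath!r}, block_size={block_size})"


base_result = BaseFS().get_file("data.bz2", "out", compression="infer")
special_result = SpecialisedFS().get_file("data.bz2", "out", compression="infer")
print(base_result)
print(special_result)
```

In this sketch the same call honours compression on one class and drops it without error on the other, which is the inconsistency reported here. Whether a real backend raises instead depends on how strictly its underlying library checks its arguments.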