Create separate compression-specific layer to enable writing gzipped files #91
Comments
Is anybody working on this? If not, I'll have a look at it. |
@mpenkov You will be very welcome! |
@tmylk @piskvorky I've started working on this on a separate branch. Basically, I rewrote the S3 subsystem as a hierarchy of classes based on the native io library. The S3 subsystem now returns file-like objects that can be passed to other decoders, etc. The existing tests, as well as the ones I added, pass; things seem to work well. Almost: gzip (using the native library) doesn't work with Python 2. The 2.x implementation tries to seek around the file, and AFAIK that just isn't possible with S3 (no random access). This is a real shame, since plugging the gzip library in is a one-liner. The alternative is to write a separate decoder on top of the lower-level zlib, used for Python 2.x only, keeping the much more powerful gzip for Python 3. Please let me know what you think. |
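To make the seek problem concrete, here is a self-contained sketch using only the standard library. `NonSeekableStream` is a made-up stand-in for an S3 network stream, not a smart_open class:

```python
import gzip
import io

class NonSeekableStream(io.RawIOBase):
    """Stand-in for an S3 network stream: readable front-to-back only,
    like an object you can GET but cannot seek around in."""
    def __init__(self, data):
        self._buf = io.BytesIO(data)

    def readable(self):
        return True

    def seekable(self):
        return False

    def readinto(self, b):
        chunk = self._buf.read(len(b))
        b[:len(chunk)] = chunk
        return len(chunk)

compressed = gzip.compress(b'hello world\n' * 3)
raw = io.BufferedReader(NonSeekableStream(compressed))

# The "one-liner": on Python 3 this works, because gzip.GzipFile only ever
# reads forward.  On Python 2, GzipFile._read_eof() calls
# fileobj.seek(-8, 1) to re-read the CRC/size trailer, which fails on any
# stream without random access.
with gzip.GzipFile(fileobj=raw, mode='rb') as fin:
    print(fin.read())
```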
👍 This issue is tagged as "easy". Is that true? Is the @mpenkov branch the best place to start on this? |
It's reasonably easy. I don't know how far behind master my branch is, but I imagine it'd be a good place to start. Working around the gzip issue is pretty much the only thing that's left, if I recall correctly @bgreen-litl |
Just a nudge: I was bitten by this. It was trivial in my case to modify the filename so that .gz wasn't at the end, and that's good enough for my needs, but this remains an unresolved wart on the smart_open API. |
I wonder if this could be used as a backend to the FUSE filesystem library, so that you could "just" mount a smart_open drive as an actual block device (given you have the permissions, of course). Bonus: this would allow any C++ lib to use it as well. (BTW, I've implemented multi-user encrypted filesystems with FUSE, so I know it can be done.) |
@menshikh-iv Are you actively working on this? I've scheduled some time off in August and may have time to look into it. |
@mpenkov you are welcome, feel free to contribute |
@menshikh-iv I've had a look at it. The problem can be summarized as: Python 2's gzip.GzipFile has to seek around the underlying file (e.g. to re-read the CRC trailer), and S3 streams offer no random access, so they cannot support seek().
The above means that we should continue to use GzipStreamFile instead of gzip.GzipFile. I'm not sure how well this fits into the design I proposed at the start of this issue; I need to think about it. In the meanwhile, can someone please comment on the logic of the above? Is it really impossible to backport Python 3's gzip and bundle it with smart_open? |
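For reference, the zlib-based fallback mentioned earlier can be a small forward-only generator. This is a sketch of the general technique, not the actual GzipStreamFile code:

```python
import zlib

def iter_gunzip(fileobj, chunk_size=16 * 1024):
    """Decompress a gzip stream strictly front-to-back, never calling seek().

    wbits = zlib.MAX_WBITS | 16 tells zlib to expect a gzip header and
    trailer, so raw gzip bytes from S3 can be fed straight through."""
    decompressor = zlib.decompressobj(zlib.MAX_WBITS | 16)
    while True:
        chunk = fileobj.read(chunk_size)
        if not chunk:
            break
        plaintext = decompressor.decompress(chunk)
        if plaintext:
            yield plaintext
    remainder = decompressor.flush()
    if remainder:
        yield remainder
```

Usage would be `for block in iter_gunzip(s3_fileobj): ...`, and because it only ever reads forward, it works identically on Python 2 and 3.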
@menshikh-iv Sorry, I linked the wrong issue, the blocker is #43. AFAICT, the reason for blocking is that boto3 may not be entirely backwards-compatible with boto, although that was brought up a while ago and the situation may have changed already. |
boto3 is absolutely not 'backwards compatible' with boto; their APIs are substantially different. boto3 can, however, be used in parallel with boto, with the intention of gradually shifting all boto calls to boto3 equivalents.
-- Mike Cariaso, http://www.cariaso.com |
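To illustrate how different the two API shapes are, and that the libraries can coexist in one process, here is a minimal side-by-side sketch (the bucket and key names are placeholders):

```python
import boto    # legacy API
import boto3   # successor API; both can be imported side by side

BUCKET, KEY = 'my-bucket', 'some/key.txt'   # placeholder names

# Old boto style: connection -> bucket -> key objects.
conn = boto.connect_s3()
body_old = conn.get_bucket(BUCKET).get_key(KEY).get_contents_as_string()

# boto3 style: a flat client with request/response dictionaries.
client = boto3.client('s3')
body_new = client.get_object(Bucket=BUCKET, Key=KEY)['Body'].read()

assert body_old == body_new
```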
@menshikh-iv OK. I will go ahead and bring in boto3 to implement S3 seeking. |
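One plausible shape for S3 seeking on top of boto3's client API (head_object, plus get_object with an HTTP Range header) is a raw io class that translates every read into a ranged GET. The class below is a hypothetical sketch, not smart_open's actual implementation:

```python
import io
import boto3

class SeekableS3Raw(io.RawIOBase):
    """Hypothetical sketch of a seekable S3 reader built on boto3 ranged GETs.

    Every read issues a GET with a Range header, so seek() can be satisfied
    without downloading the whole object."""

    def __init__(self, bucket, key):
        self._client = boto3.client('s3')
        self._bucket, self._key = bucket, key
        head = self._client.head_object(Bucket=bucket, Key=key)
        self._size = head['ContentLength']
        self._pos = 0

    def readable(self):
        return True

    def seekable(self):
        return True

    def tell(self):
        return self._pos

    def seek(self, offset, whence=io.SEEK_SET):
        if whence == io.SEEK_SET:
            self._pos = offset
        elif whence == io.SEEK_CUR:
            self._pos += offset
        elif whence == io.SEEK_END:
            self._pos = self._size + offset
        return self._pos

    def readinto(self, b):
        if self._pos >= self._size:
            return 0  # EOF
        stop = min(self._pos + len(b), self._size) - 1
        resp = self._client.get_object(
            Bucket=self._bucket, Key=self._key,
            Range='bytes=%d-%d' % (self._pos, stop))
        data = resp['Body'].read()
        b[:len(data)] = data
        self._pos += len(data)
        return len(data)
```

Wrapped in io.BufferedReader, an instance of such a class could be handed straight to gzip.GzipFile, since it genuinely supports seek().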
Good luck @mpenkov 👍 |
Implement the solution described by mpenkov in #82 (comment)