-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix AsyncPutWriter (#7991) #7992
Conversation
Still, it would be good to ensure this method works as intended or is removed if we don't see a use case for it. |
The work on these abstractions was originally completed in #6526. It may be worth revisiting some of those original design discussions now that we have more write related features implemented. |
Fair enough, I think the justification would be if you know that the file is going to be less than the 10MB minimum part size, a multipart upload is not going to do anything but add request and buffer copying overheads. That being said it might still help for the local filesystem use-case... Ultimately I don't have a strong opinion I was just surprised to stumble across this as I couldn't see how it could work 😅 |
Just from a cursory perusal of the code, a lot of the design complexity appears to arise from the append use-case, including things like whether to write headers or not. This also causes pain on the read side. I wonder if we could better encapsulate this streaming use-case, potentially in its own operators, so as to simplify the general design |
Yeah agreed. My thinking is that if you are writing a few <10MB file performance is almost certainly not an issue. If you are writing dozens or 100s of <=10MB files, you probably just shouldn't do that and instead write a few files in the 256MB-2GB range with multipart put.
Yes, agreed. There are currently a few hacky workarounds to allow the streaming and appending code to coexist with some of the parallelization code (such as here). |
I've filed #7994 to track splitting out the streaming use-case, PTAL |
Closing in favor of #8017 |
Which issue does this PR close?
Closes #7991
Rationale for this change
@devinjdangelo perhaps you might be able to help out testing this, I'm honestly a little unclear on how this functionality is wired up / tested. I purely noticed it whilst trying out an object_store upgrade and touching the code
What changes are included in this PR?
Are these changes tested?
Are there any user-facing changes?