-
Notifications
You must be signed in to change notification settings - Fork 874
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Async writer tweaks #3967
Async writer tweaks #3967
Conversation
@ShiKaiWi perhaps you could take a look and let me know what you think |
props: Option<WriterProperties>, | ||
) -> Result<Self> { | ||
let shared_buffer = SharedBuffer::default(); | ||
let shared_buffer = SharedBuffer::new(buffer_size); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the major motivation for this PR, being able to avoid bump allocation where the Vec is repeatedly resized is important for performance
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, in the #3957, buffer_flush_threshold
is designed to be able to be usize::MAX
in order to let the async writer not flush until all the encoding work is done. And for this reason, the buffer can't be pre-allocated at initialization.
And now I think it looks good here because of its efficiency, and it may be a fake feature to let the writer do flush only when all encoded bytes are ready. 😆
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fake feature to let the writer do flush only when all encoded bytes are read
Yeah, at that point you might as well just use the sync writer 😅
3a51a2e
to
8660dd4
Compare
|
||
Ok(metadata) | ||
} | ||
|
||
/// Flush the data in the [`SharedBuffer`] into the `async_writer` if its size | ||
/// exceeds the threshold. | ||
async fn try_flush( | ||
shared_buffer: &SharedBuffer, | ||
shared_buffer: &mut SharedBuffer, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A mutable reference isn't technically required here, but acts as a lint that shared_buffer shouldn't be shared
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we actually just remove the Mutex
entirely? Hold a Arc<SharedBuffer>
and use Arc::get_mut
to grab a mutable reference
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Arc::get_mut
only works if there are no other Arc
references, which in this case wouldn't be the case
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah right the async writer would also need a reference. I suppose you could hold an Arc<Vec<u8>>
in the the async writer and then have SharedBuffer
hold a Weak<Vec<u8>>
. Not sure that would end up pencilling out just to remove an uncontended mutex lock though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The use of try_lock here boils down to much the same thing - https://docs.rs/futures-util/0.3.27/src/futures_util/lock/mutex.rs.html#103
This PR looks fairly pretty for me. Learns a lot from it. |
Which issue does this PR close?
Closes #.
Rationale for this change
More consistent buffer handling
What changes are included in this PR?
This makes two relatively minor tweaks to the AsyncArrowWriter added in #3957 by @ShiKaiWi:
SharedBuffer
Are there any user-facing changes?
No changes to released APIs