Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Minor: Improve comments on GenericByteViewArray::bytes_iter(), prefix_iter() and suffix_iter() #6306

Merged
merged 1 commit into from
Aug 28, 2024
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
19 changes: 14 additions & 5 deletions arrow-array/src/array/byte_view_array.rs
Original file line number Diff line number Diff line change
Expand Up @@ -302,7 +302,7 @@ impl<T: ByteViewType + ?Sized> GenericByteViewArray<T> {
ArrayIter::new(self)
}

/// Returns an iterator over the bytes of this array.
/// Returns an iterator over the bytes of this array, including null values
pub fn bytes_iter(&self) -> impl Iterator<Item = &[u8]> {
self.views.iter().map(move |v| {
let len = *v as u32;
Expand All @@ -317,8 +317,11 @@ impl<T: ByteViewType + ?Sized> GenericByteViewArray<T> {
})
}

/// Returns an iterator over the prefix bytes of this array with respect to the prefix length.
/// If the prefix length is larger than the string length, it will return the empty slice.
/// Returns an iterator over the first `prefix_len` bytes of each array
/// element, including null values.
///
/// If `prefix_len` is larger than the element's length, the iterator will
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels rather surprising to me?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is that a statement or a question?

If you have a suggestion of how to make it less surprising perhaps @xinlifoobar would be willing to give it a try

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess I would have thought returning Option<&[u8]> would be perhaps more intuitive and likely perform similarly? That being said I am surprised that skipping the null mask yields performance returns, as this hasn't been my experience with regular StringArray, so my intuition about performance may be off

Copy link
Contributor

@xinlifoobar xinlifoobar Aug 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the suggestions! @tustvold.

I created a new PR #6312 to put this discussion. I did this in the early version but didn't see performance gains like this time - it did well for inline values. We might do a combination based on the bench result, e.g., null masks for prefix_iter and slice for suffix_iter.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The results on #6312 seem to show that using null masks does indeed reduce performance

So is the conclusion that these APIs are good to go?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left some comments, I think this API would be better if it returned None for a prefix too long, even if it continues to not inspect null masks

/// return an empty slice (`&[]`).
pub fn prefix_bytes_iter(&self, prefix_len: usize) -> impl Iterator<Item = &[u8]> {
self.views().into_iter().map(move |v| {
let len = (*v as u32) as usize;
Expand All @@ -341,8 +344,14 @@ impl<T: ByteViewType + ?Sized> GenericByteViewArray<T> {
})
}

/// Returns an iterator over the suffix bytes of this array with respect to the suffix length.
/// If the suffix length is larger than the string length, it will return the empty slice.
/// Returns an iterator over the last `suffix_len` bytes of each array
/// element, including null values.
///
/// Note that for [`StringViewArray`] the last bytes may start in the middle
/// of a UTF-8 codepoint, and thus may not be a valid `&str`.
///
/// If `suffix_len` is larger than the element's length, the iterator will
/// return an empty slice (`&[]`).
pub fn suffix_bytes_iter(&self, suffix_len: usize) -> impl Iterator<Item = &[u8]> {
self.views().into_iter().map(move |v| {
let len = (*v as u32) as usize;
Expand Down
Loading