Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Minor: pub use ByteView in arrow and improve documentation #6275

Merged
merged 3 commits into from
Aug 20, 2024
Merged
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
15 changes: 10 additions & 5 deletions arrow-array/src/array/byte_view_array.rs
Original file line number Diff line number Diff line change
Expand Up @@ -52,7 +52,7 @@ use super::ByteArrayType;
/// not by value. as there are many different buffer layouts to represent the
/// same data (e.g. different offsets, different buffer sizes, etc).
///
/// # Layout
/// # Layout: "views" and buffers
///
/// A `GenericByteViewArray` stores variable length byte strings. An array of
/// `N` elements is stored as `N` fixed length "views" and a variable number
Expand All @@ -75,10 +75,12 @@ use super::ByteArrayType;
/// 0 31 63 95 127
/// ```
///
/// * Strings with length <= 12 are stored directly in the view.
/// * Strings with length <= 12 are stored directly in the view. See
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it will help to add pointers to common functions to manipulate the views

/// [`Self::inline_value`] to access the inlined prefix from a short view.
///
/// * Strings with length > 12: The first four bytes are stored inline in the
/// view and the entire string is stored in one of the buffers.
/// view and the entire string is stored in one of the buffers. See [`ByteView`]
/// to access the fields of the these views.
///
/// Unlike [`GenericByteArray`], there are no constraints on the offsets other
/// than they must point into a valid buffer. However, they can be out of order,
Expand All @@ -89,6 +91,8 @@ use super::ByteArrayType;
/// separate buffer while the string "LavaMonster" is stored inlined in the
/// view. In this case, the same bytes for "Fish" are used to store both strings.
///
/// [`ByteView`]: arrow_data::ByteView
///
/// ```text
/// ┌───┐
/// ┌──────┬──────┬──────┬──────┐ offset │...│
Expand Down Expand Up @@ -262,6 +266,7 @@ impl<T: ByteViewType + ?Sized> GenericByteViewArray<T> {
}

/// Returns the element at index `i`
///
/// # Safety
/// Caller is responsible for ensuring that the index is within the bounds of the array
pub unsafe fn value_unchecked(&self, idx: usize) -> &T::Native {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It appears that the unsafe code in GenericByteViewArray is to avoid bounds checking. Is this correct?

(I ask because it percolates into the unsafe code in this datafusion pr.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is correct. I pushed another commit to clarify the docs

Expand Down Expand Up @@ -289,7 +294,7 @@ impl<T: ByteViewType + ?Sized> GenericByteViewArray<T> {
std::slice::from_raw_parts((view as *const u128 as *const u8).wrapping_add(4), len)
}

/// constructs a new iterator
/// Constructs a new iterator for iterating over the values of this array
pub fn iter(&self) -> ArrayIter<&Self> {
ArrayIter::new(self)
}
Expand Down Expand Up @@ -358,7 +363,7 @@ impl<T: ByteViewType + ?Sized> GenericByteViewArray<T> {
builder.finish()
}

/// Comparing two [`GenericByteViewArray`] at index `left_idx` and `right_idx`
/// Compare two [`GenericByteViewArray`] at index `left_idx` and `right_idx`
///
/// Comparing two ByteView types are non-trivial.
/// It takes a bit of patience to understand why we don't just compare two &[u8] directly.
Expand Down
7 changes: 7 additions & 0 deletions arrow-data/src/byte_view.rs
Original file line number Diff line number Diff line change
Expand Up @@ -18,6 +18,13 @@
use arrow_buffer::Buffer;
use arrow_schema::ArrowError;

/// Helper to access views of [`GenericByteViewArray`] (`StringViewArray` and
/// `BinaryViewArray`) where the length is greater than 12 bytes.
///
/// See the documentation on [`GenericByteViewArray`] for more information on
/// the layout of the views.
///
/// [`GenericByteViewArray`]: https://docs.rs/arrow/latest/arrow/array/struct.GenericByteViewArray.html
#[derive(Debug, Copy, Clone, Default)]
#[repr(C)]
pub struct ByteView {
Expand Down
2 changes: 1 addition & 1 deletion arrow/src/array/mod.rs
Original file line number Diff line number Diff line change
Expand Up @@ -25,7 +25,7 @@ pub use arrow_array::cast::*;
pub use arrow_array::iterator::*;
pub use arrow_array::*;
pub use arrow_data::{
layout, ArrayData, ArrayDataBuilder, ArrayDataRef, BufferSpec, DataTypeLayout,
layout, ArrayData, ArrayDataBuilder, ArrayDataRef, BufferSpec, ByteView, DataTypeLayout,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is the only code change in this PR

};

pub use arrow_data::transform::{Capacities, MutableArrayData};
Expand Down
Loading