-
Notifications
You must be signed in to change notification settings - Fork 807
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Minor: pub use ByteView
in arrow and improve documentation
#6275
Changes from 1 commit
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -52,7 +52,7 @@ use super::ByteArrayType; | |
/// not by value. as there are many different buffer layouts to represent the | ||
/// same data (e.g. different offsets, different buffer sizes, etc). | ||
/// | ||
/// # Layout | ||
/// # Layout: "views" and buffers | ||
/// | ||
/// A `GenericByteViewArray` stores variable length byte strings. An array of | ||
/// `N` elements is stored as `N` fixed length "views" and a variable number | ||
|
@@ -75,10 +75,12 @@ use super::ByteArrayType; | |
/// 0 31 63 95 127 | ||
/// ``` | ||
/// | ||
/// * Strings with length <= 12 are stored directly in the view. | ||
/// * Strings with length <= 12 are stored directly in the view. See | ||
/// [`Self::inline_value`] to access the inlined prefix from a short view. | ||
/// | ||
/// * Strings with length > 12: The first four bytes are stored inline in the | ||
/// view and the entire string is stored in one of the buffers. | ||
/// view and the entire string is stored in one of the buffers. See [`ByteView`] | ||
/// to access the fields of the these views. | ||
/// | ||
/// Unlike [`GenericByteArray`], there are no constraints on the offsets other | ||
/// than they must point into a valid buffer. However, they can be out of order, | ||
|
@@ -89,6 +91,8 @@ use super::ByteArrayType; | |
/// separate buffer while the string "LavaMonster" is stored inlined in the | ||
/// view. In this case, the same bytes for "Fish" are used to store both strings. | ||
/// | ||
/// [`ByteView`]: arrow_data::ByteView | ||
/// | ||
/// ```text | ||
/// ┌───┐ | ||
/// ┌──────┬──────┬──────┬──────┐ offset │...│ | ||
|
@@ -262,6 +266,7 @@ impl<T: ByteViewType + ?Sized> GenericByteViewArray<T> { | |
} | ||
|
||
/// Returns the element at index `i` | ||
/// | ||
/// # Safety | ||
/// Caller is responsible for ensuring that the index is within the bounds of the array | ||
pub unsafe fn value_unchecked(&self, idx: usize) -> &T::Native { | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It appears that the unsafe code in GenericByteViewArray is to avoid bounds checking. Is this correct? (I ask because it percolates into the unsafe code in this datafusion pr.) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. That is correct. I pushed another commit to clarify the docs |
||
|
@@ -289,7 +294,7 @@ impl<T: ByteViewType + ?Sized> GenericByteViewArray<T> { | |
std::slice::from_raw_parts((view as *const u128 as *const u8).wrapping_add(4), len) | ||
} | ||
|
||
/// constructs a new iterator | ||
/// Constructs a new iterator for iterating over the values of this array | ||
pub fn iter(&self) -> ArrayIter<&Self> { | ||
ArrayIter::new(self) | ||
} | ||
|
@@ -358,7 +363,7 @@ impl<T: ByteViewType + ?Sized> GenericByteViewArray<T> { | |
builder.finish() | ||
} | ||
|
||
/// Comparing two [`GenericByteViewArray`] at index `left_idx` and `right_idx` | ||
/// Compare two [`GenericByteViewArray`] at index `left_idx` and `right_idx` | ||
/// | ||
/// Comparing two ByteView types are non-trivial. | ||
/// It takes a bit of patience to understand why we don't just compare two &[u8] directly. | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -25,7 +25,7 @@ pub use arrow_array::cast::*; | |
pub use arrow_array::iterator::*; | ||
pub use arrow_array::*; | ||
pub use arrow_data::{ | ||
layout, ArrayData, ArrayDataBuilder, ArrayDataRef, BufferSpec, DataTypeLayout, | ||
layout, ArrayData, ArrayDataBuilder, ArrayDataRef, BufferSpec, ByteView, DataTypeLayout, | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. this is the only code change in this PR |
||
}; | ||
|
||
pub use arrow_data::transform::{Capacities, MutableArrayData}; | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it will help to add pointers to common functions to manipulate the views