-
Notifications
You must be signed in to change notification settings - Fork 847
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Split out arrow-data
into a separate crate
#2746
Conversation
@@ -284,7 +284,11 @@ impl<T: DecimalType> DecimalArray<T> { | |||
|
|||
// safety: self.data is valid DataType::Decimal as checked above | |||
let new_data_type = Self::TYPE_CONSTRUCTOR(precision, scale); | |||
Ok(self.data().clone().with_data_type(new_data_type).into()) | |||
let data = self.data().clone().into_builder().data_type(new_data_type); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This method was previously pub(crate), I opted to just remove it in favour of the into_builder plumbing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cc @liukun4515 -- I think it is fine
@@ -56,6 +56,10 @@ impl Bitmap { | |||
unsafe { bit_util::get_bit_raw(self.bits.as_ptr(), i) } | |||
} | |||
|
|||
pub fn buffer(&self) -> &Buffer { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I intend to deprecate buffer_ref in a follow up PR
c5b6ef5
to
2680c16
Compare
@@ -0,0 +1,672 @@ | |||
// Licensed to the Apache Software Foundation (ASF) under one |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is moved from array/transform/mod.rs but with the tests split out into arrow/tests/array_transform.rs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think moving out the tests is a great idea -- I wonder if we need to do anything more to make them discoverable 🤔
@@ -358,6 +358,90 @@ pub trait ArrayAccessor: Array { | |||
unsafe fn value_unchecked(&self, index: usize) -> Self::Item; | |||
} | |||
|
|||
impl PartialEq for dyn Array { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are moved from arrow/equal/mod.rs
@@ -1,1464 +0,0 @@ | |||
// Licensed to the Apache Software Foundation (ASF) under one |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is moved to arrow-data/equal/mod.rs with the array equality moved to arrow/array/array.rs and the tests moved to arrow/tests/array_equal.rs
@@ -1771,1124 +1746,30 @@ mod tests { | |||
} | |||
|
|||
#[test] | |||
#[should_panic( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These tests are moved to arrow/tests/array_validation.rs
@@ -88,30 +88,3 @@ pub(super) fn boolean_equal( | |||
}) | |||
} | |||
} | |||
|
|||
#[cfg(test)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These tests are moved to arrow/tests/array_equal.rs
@@ -149,52 +148,3 @@ pub(super) fn list_equal<T: OffsetSizeTrait>( | |||
}) | |||
} | |||
} | |||
|
|||
#[cfg(test)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These tests are moved to arrow/tests/array_equal.rs
@@ -0,0 +1,171 @@ | |||
// Licensed to the Apache Software Foundation (ASF) under one |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is moved from arrow/array/equal/mod.rs
arrow-data
into a new cratearrow-data
into a separate crate
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't say I reviewed every single line in this PR super carefully, but I did read them all and looked at the changes, and verified the tests are still present.
I also like the way that several of the larger tests are moved into arrow/tests/...
integration style tests. This is similar to what arrow2 does 👍 Nice
Do we think it is appropriate to run any benchmarks?
LGTM I say we ship it
@@ -0,0 +1,672 @@ | |||
// Licensed to the Apache Software Foundation (ASF) under one |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think moving out the tests is a great idea -- I wonder if we need to do anything more to make them discoverable 🤔
buffer: &mut MutableBuffer, | ||
offsets: &[T], | ||
values: &[u8], | ||
start: usize, | ||
len: usize, | ||
) { | ||
let start_values = offsets[start].to_usize().unwrap(); | ||
let end_values = offsets[start + len].to_usize().unwrap(); | ||
let start_values = offsets[start].as_(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TIL: https://docs.rs/num/latest/num/cast/trait.AsPrimitive.html#tymethod.as_
as_()
is something I have not seen before. Fascinating.
@@ -284,7 +284,11 @@ impl<T: DecimalType> DecimalArray<T> { | |||
|
|||
// safety: self.data is valid DataType::Decimal as checked above | |||
let new_data_type = Self::TYPE_CONSTRUCTOR(precision, scale); | |||
Ok(self.data().clone().with_data_type(new_data_type).into()) | |||
let data = self.data().clone().into_builder().data_type(new_data_type); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cc @liukun4515 -- I think it is fine
This does not appear to have a discernible impact on the benchmarks - some fluctuate as is to be expected when you're talking nanoseconds, but no major change in either direction |
Benchmark runs are scheduled for baseline = 80c0f1a and contender = a927d7a. a927d7a is a master commit associated with this PR. Results will be available as each benchmark for each run completes. |
Which issue does this PR close?
Part of #2594
Rationale for this change
ArrayData represents the canonical representation of an arrow array, and is the language of the C Data interface. It therefore makes sense to split into its own crate. In theory this could form the basis for interoperability between different Rust array abstractions.
What changes are included in this PR?
Splits out ArrayData and related structures into a separate crate.
Are there any user-facing changes?
We can no longer implement
From<ArrayRef> for ArrayData
. In practice this only appears to be being used by pyarrow code, and can be replaced with calls tomake_array
orArray::data()
as appropriate.