-
Notifications
You must be signed in to change notification settings - Fork 838
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Split out arrow-array
crate (#2594)
#2769
Conversation
@@ -166,19 +166,6 @@ impl<T: PyArrowConvert> PyArrowConvert for Vec<T> { | |||
} | |||
} | |||
|
|||
impl<T> PyArrowConvert for T |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can't be implemented as it errors complaining that arrow_schema::DataType
could be updated to implement Array + From<ArrayData>
which would then cause a conflict with the impl PyArrowConvert for DataType
.
Ultimately this impl is not hugely important, as it is just a case of using make_array
and Array::data
marking as api-change due to removal of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a pretty epic PR -- I went through it fairly carefully and it looks great to me
/// assert_eq!(array.keys(), &Int8Array::from(vec![0, 0, 1, 2])); | ||
/// assert_eq!(array.values(), &values); | ||
/// ``` | ||
pub type Int8DictionaryArray = DictionaryArray<Int8Type>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these pub type
s are new, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use std::any::Any; | ||
|
||
/// | ||
/// # Example: Using `collect` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💯 for adding basic doc examples to these typedefs
assert!(!as_decimal_array(&array).is_empty()); | ||
let result_decimal = as_decimal_array(&array); | ||
assert_eq!(result_decimal, &array); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice
use crate::builder::*; | ||
|
||
#[test] | ||
fn test_buffer_builder_availability() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if this is the kind of thing that should be in in a tests
type integration test to ensure that the types are pub
and not pub(crate)
for example
fn schema(&self) -> SchemaRef; | ||
|
||
/// Reads the next `RecordBatch`. | ||
#[deprecated( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🎉 This is another breaking API change (nice cleanup
@@ -51,7 +51,7 @@ fn double(array: &PyAny, py: Python) -> PyResult<PyObject> { | |||
let array = kernels::arithmetic::add(array, array).map_err(to_py_err)?; | |||
|
|||
// export | |||
array.to_pyarrow(py) | |||
array.data().to_pyarrow(py) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this seems a very reasonable change
(but is it also an API change?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, sorry I made this change after I wrote the PR description
fn schema(&self) -> SchemaRef; | ||
|
||
/// Reads the next `RecordBatch`. | ||
#[deprecated( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, whoops -- maybe we should remove this deprecated API (perhaps as a follow on PR)
FWIW it occurs to me we probably need to update the github workflow triggers to reflect this new code organization: For example: SHould probably include |
Running the benchmarks, some of the faster benchmarks do show the odd ~10% regression, but we're talking 10s of microseconds here. I'm inclined to think this is not an issue, and if it transpires to be so, we can revisit those kernels. |
Benchmark runs are scheduled for baseline = 6bee576 and contender = 06c204c. 06c204c is a master commit associated with this PR. Results will be available as each benchmark for each run completes. |
arrow-array
cratearrow-array
crate (#2594)
Draft as I wish to perform another pass, and double-check the benchmarksWhich issue does this PR close?
Part of #2594
Rationale for this change
Continues the process of splitting apart the crate, so that components can depend on just what they need, compilation parallelizes better, etc...
What changes are included in this PR?
Moves the array, array builders, and record batch definitions into a new arrow-array crate
Are there any user-facing changes?
The deprecated
RecordBatch::concat
is removed, otherwise there are no breaking changes 🎉