-
Notifications
You must be signed in to change notification settings - Fork 853
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add interleave
kernel (#1523)
#2838
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great to me -- thank you @tustvold
I had some small tests / doc suggestions, but the code looks very nice 👌
use arrow_schema::ArrowError; | ||
|
||
/// | ||
/// Takes elements by index from a list of [`Array`], creating a new [`Array`] from those values. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
❤️ for the ascii art (lol though I am biased)
cc @Dandandan and @yjshen
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You created it for the original ticket 😂
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, nothing like a little self praise of my monodraw skillz to lighten up the review process
/// values array 1 | ||
/// ``` | ||
/// | ||
/// For selecting values by index from a single array see [compute::take](crate::compute::take) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
/// | ||
/// For selecting values by index from a single array see [compute::take](crate::compute::take) | ||
/// | ||
/// # Panics |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The function appears to return an error (rather than panic
) in these two cases
"It is not possible to interleave arrays of different data types." | ||
.to_string(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"It is not possible to interleave arrays of different data types." | |
.to_string(), | |
format!("It is not possible to interleave arrays of different data types ({:?} and {:?})" | |
array.data_type(), data_type) |
return Ok(new_empty_array(data_type)); | ||
} | ||
|
||
// TODO: Add specialized implementations (#1523) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like the specialized implementation should be tracked by a different ticket than the initial kernel, right?
} | ||
|
||
if indices.is_empty() { | ||
return Ok(new_empty_array(data_type)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
fn test_primitive_empty() { | ||
let a = Int32Array::from_iter_values([1, 2, 3, 4]); | ||
let v = interleave(&[&a], &[]).unwrap(); | ||
assert!(v.is_empty()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
assert!(v.is_empty()); | |
assert!(v.is_empty()); | |
assert!(v.data_type(), DataType::Int32); |
let b = Int32Array::from_iter_values([5, 6, 7]); | ||
let c = Int32Array::from_iter_values([8, 9, 10]); | ||
let values = | ||
interleave(&[&a, &b, &c], &[(0, 3), (0, 3), (2, 2), (2, 0), (1, 1)]).unwrap(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 for checking repeated indexes (0,3)
and (0,3)
return Err(ArrowError::InvalidArgumentError( | ||
"interleave requires input of at least one array".to_string(), | ||
)); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we also return an error for single array case and suggest to use compute::take
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I vacillated a bit on this, I think given the concat kernel which makes even less sense to be called on a single array, doesn't error in this case - I would be inclined to leave it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that it is nicer to support handling a single input (mostly as a convenience for downstream users so they don't have to special case their code for len() = 1
case)
Parquet failure is unrelated |
Benchmark runs are scheduled for baseline = 8adebca and contender = fa1d079. fa1d079 is a master commit associated with this PR. Results will be available as each benchmark for each run completes. |
Which issue does this PR close?
Closes #1523
Rationale for this change
When implementing sorts, joins, etc... it is fairly common to need a method to select values from multiple source arrays and interleave them into a single output array.
I opted to call this
interleave
instead oftake_multi
, etc... as it has a non-trivially different signature, I think serves a sufficiently different use-case, and it isn't immediately obvious that themulti
intake_multi
refers to multiple source arrays, and not something different.What changes are included in this PR?
Adds an interleave kernel that allows interleaving values from multiple source arrays. This is an MVP implementation, largely lifted from SortPreservingMerge in DataFusion. Future PRs will look to optimise it
Are there any user-facing changes?
No