Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cleanup uses of Array::data_ref (#3880) #3918

Merged
merged 2 commits into from
Mar 23, 2023

Conversation

tustvold
Copy link
Contributor

Which issue does this PR close?

Part of #3880

Rationale for this change

The first of likely multiple PRs to gradually remove the direct usage of ArrayData::data_ref from the kernels

What changes are included in this PR?

Are there any user-facing changes?

@github-actions github-actions bot added arrow Changes to the arrow crate parquet Changes to the parquet crate labels Mar 23, 2023
@tustvold tustvold force-pushed the cleanup-uses-of-array-data branch from 13b9684 to 1ddbb77 Compare March 23, 2023 16:48
@@ -1159,8 +1159,13 @@ pub struct LexicographicalComparator<'a> {
impl LexicographicalComparator<'_> {
/// lexicographically compare values at the wrapped columns with given indices.
pub fn compare(&self, a_idx: usize, b_idx: usize) -> Ordering {
for (data, comparator, sort_option) in &self.compare_items {
match (data.is_valid(a_idx), data.is_valid(b_idx)) {
for (nulls, comparator, sort_option) in &self.compare_items {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I checked at this seems to simply pull the match up a level https://docs.rs/arrow-data/35.0.0/src/arrow_data/data/mod.rs.html#384-389 (as in there is no more dynamic dispatch from what I can see)

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should run the benchmarks on this PR prior to merging

Overall though I think this is really nice progress

Some(nulls) => buffer_unary_not(nulls.buffer(), nulls.offset(), nulls.len()),
let values = match input.nulls() {
None => NullBuffer::new_null(input.len()).into_inner(),
Some(nulls) => !nulls.inner(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these certainly do look very nice now 👌

@tustvold
Copy link
Contributor Author

tustvold commented Mar 23, 2023

The benchmarks do not show any major regressions, with most kernels completely unaffected. The boolean kernels seem to show a consistent 10% regression on some systems and a consistent 10% improvement on others, not entirely sure what is going on here but given we're talking 10s of nanoseconds I think it is probably just noise, and certainly not relevant to realistic workloads.

@tustvold tustvold merged commit de9a90b into apache:master Mar 23, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arrow Changes to the arrow crate parquet Changes to the parquet crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants