Simplify GroupByHash implementation (to prepare for more work) #4972

Merged
merged 3 commits into apache:master from alamb/simplify_group_by on Jan 21, 2023

Conversation

Contributor

@alamb commented Jan 18, 2023

Draft as it builds on #4924

Which issue does this PR close?

re #4973

Rationale for this change

Follow-on to the #4924 work from @mustafasrepo and @ozankabak.

As we prepare to improve group by performance even more, we will be working on this code going forward.

There are several TODOs in the group by hash code, as well as some out-of-date comments, that make it harder to work with. Given the plans to improve this code further, it is important that it remains relatively easy to work with.

Since I had all the code paged in anyway while reviewing #4924, I figured I would add my comments here.

What changes are included in this PR?

  1. Remove extra level of unwrapping in GroupedHashAggregateStreamInner
  2. Make group_aggregate_batch and create_batch_from_map member functions rather than free functions (and remove clippy warnings; see the sketch below)
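
A rough sketch of change 2 (hypothetical types and signatures, not the actual DataFusion code): a free function that has to take every piece of stream state as a parameter becomes a method that reads that state from self, which is also what makes the clippy allowance for long argument lists unnecessary.

struct GroupState {
    counts: Vec<u64>,
}

struct AggStream {
    group_state: GroupState,
    batch_size: usize,
}

// Before: a free function with a growing parameter list.
fn group_aggregate_batch_free(group_state: &mut GroupState, batch_size: usize, values: &[u64]) {
    for (i, v) in values.iter().enumerate() {
        group_state.counts[i % batch_size] += *v;
    }
}

impl AggStream {
    // After: the same logic as a member function; callers only pass the batch.
    fn group_aggregate_batch(&mut self, values: &[u64]) {
        for (i, v) in values.iter().enumerate() {
            self.group_state.counts[i % self.batch_size] += *v;
        }
    }
}

fn main() {
    let mut standalone = GroupState { counts: vec![0; 2] };
    group_aggregate_batch_free(&mut standalone, 2, &[1, 2, 3]);

    let mut stream = AggStream { group_state: GroupState { counts: vec![0; 2] }, batch_size: 2 };
    stream.group_aggregate_batch(&[1, 2, 3]);

    assert_eq!(standalone.counts, stream.group_state.counts);
}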

Are these changes tested?

Existing tests cover these cases (this is a refactor)

Are there any user-facing changes?

No

Benchmark results

git checkout 96cf046be57bf09548d51f50d0bc964904bcec7d
cargo bench -p datafusion --bench aggregate_query_sql -- --save-baseline pr4972-pre
git checkout alamb/simplify_group_by
cargo bench -p datafusion --bench aggregate_query_sql -- --baseline pr4972-pre

I think the benchmarks show no significant changes (other than noise)

aggregate_query_no_group_by 15 12
                        time:   [2.3153 ms 2.3279 ms 2.3414 ms]
                        change: [-2.4095% -1.5536% -0.7448%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 4 outliers among 100 measurements (4.00%)
  3 (3.00%) high mild
  1 (1.00%) high severe

aggregate_query_no_group_by_min_max_f64
                        time:   [2.1700 ms 2.1826 ms 2.1958 ms]
                        change: [-1.7443% -0.9155% -0.1242%] (p = 0.03 < 0.05)
                        Change within noise threshold.
Found 6 outliers among 100 measurements (6.00%)
  6 (6.00%) high mild

aggregate_query_no_group_by_count_distinct_wide
                        time:   [5.8379 ms 5.9051 ms 5.9713 ms]
                        change: [-1.2545% +0.2912% +1.9169%] (p = 0.72 > 0.05)
                        No change in performance detected.

aggregate_query_no_group_by_count_distinct_narrow
                        time:   [3.6279 ms 3.6631 ms 3.6990 ms]
                        change: [-2.6099% -1.2926% +0.0237%] (p = 0.06 > 0.05)
                        No change in performance detected.
Found 3 outliers among 100 measurements (3.00%)
  1 (1.00%) low mild
  2 (2.00%) high mild

aggregate_query_group_by
                        time:   [5.4279 ms 5.4945 ms 5.5616 ms]
                        change: [-0.8897% +0.6369% +2.2354%] (p = 0.43 > 0.05)
                        No change in performance detected.
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) high mild

aggregate_query_group_by_with_filter
                        time:   [3.5274 ms 3.5516 ms 3.5761 ms]
                        change: [-4.4837% -3.6534% -2.8178%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild

aggregate_query_group_by_u64 15 12
                        time:   [5.1773 ms 5.2419 ms 5.3089 ms]
                        change: [-4.5574% -2.8527% -1.1438%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  5 (5.00%) high mild

aggregate_query_group_by_with_filter_u64 15 12
                        time:   [3.5820 ms 3.6025 ms 3.6236 ms]
                        change: [+2.4312% +3.2799% +4.1676%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild

aggregate_query_group_by_u64_multiple_keys
                        time:   [35.172 ms 36.120 ms 37.089 ms]
                        change: [-0.2381% +3.6619% +7.7225%] (p = 0.06 > 0.05)
                        No change in performance detected.

aggregate_query_approx_percentile_cont_on_u64
                        time:   [10.832 ms 10.992 ms 11.152 ms]
                        change: [-2.4045% -0.3099% +1.7212%] (p = 0.77 > 0.05)
                        No change in performance detected.

aggregate_query_approx_percentile_cont_on_f32
                        time:   [9.7958 ms 9.9346 ms 10.076 ms]
                        change: [-3.4670% -1.4739% +0.5056%] (p = 0.15 > 0.05)
                        No change in performance detected.

github-actions bot added the core (Core DataFusion crate) and physical-expr (Physical Expressions) labels on Jan 18, 2023
@alamb changed the title from "Alamb/simplify group by" to "Simplify GroupByHash implementation (to prepare for more work)" on Jan 18, 2023
github-actions bot removed the physical-expr (Physical Expressions) label on Jan 19, 2023
Contributor Author

@alamb left a comment

Reviewing this PR with a whitespace-blind diff makes it easier to see what changed: https://github.com/apache/arrow-datafusion/pull/4972/files?w=1


/// Actual implementation of [`GroupedHashAggregateStream`].
///
/// This is wrapped into yet another struct because we need to interact with the async memory management subsystem
Contributor Author

This comment about another struct for memory management is out of date, so I folded GroupedHashAggregateStreamInner directly into GroupedHashAggregateStream.
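
A minimal before/after sketch of that fold (hypothetical names, not the real DataFusion types): the wrapper added a level of indirection that every access had to unwrap.

// Before: the public stream only delegated to an inner struct.
struct InnerState {
    rows_seen: usize,
}

struct StreamWithInner {
    inner: InnerState,
}

impl StreamWithInner {
    fn record(&mut self, n: usize) {
        self.inner.rows_seen += n; // extra `.inner` hop on every access
    }
}

// After: the same fields live directly on the stream type.
struct FlatStream {
    rows_seen: usize,
}

impl FlatStream {
    fn record(&mut self, n: usize) {
        self.rows_seen += n;
    }
}

fn main() {
    let mut before = StreamWithInner { inner: InnerState { rows_seen: 0 } };
    let mut after = FlatStream { rows_seen: 0 };
    before.record(5);
    after.record(5);
    assert_eq!(before.inner.rows_seen, after.rows_seen);
}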

@@ -115,6 +106,14 @@ struct GroupedHashAggregateStreamInner {
indices: [Vec<Range<usize>>; 2],
}

#[derive(Debug)]
/// tracks what phase the aggregation is in
enum ExecutionState {
Contributor Author

This used to be tracked using several multi-level match statements and a fused inner stream. Now it is represented explicitly in this stream.
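
As a minimal sketch of the pattern (the variant names and poll logic here are illustrative, not the actual DataFusion code), an explicit state enum lets the poll loop dispatch on the current phase, and the terminal state gives the same "always None after the end" guarantee that fuse() used to provide.

use std::task::Poll;

// Hypothetical phases; the real ExecutionState in the PR may differ.
enum Phase {
    ReadingInput,
    ProducingOutput,
    Done,
}

struct Agg {
    phase: Phase,
    batches_left: usize,
}

impl Agg {
    // Stand-in for Stream::poll_next: a single match on the phase replaces
    // nested matches over a fused inner stream.
    fn poll_next(&mut self) -> Poll<Option<u64>> {
        loop {
            match self.phase {
                Phase::ReadingInput => {
                    if self.batches_left == 0 {
                        self.phase = Phase::ProducingOutput;
                    } else {
                        self.batches_left -= 1; // "aggregate" one input batch
                    }
                }
                Phase::ProducingOutput => {
                    self.phase = Phase::Done;
                    return Poll::Ready(Some(42)); // emit the (fake) output
                }
                // Once Done, every further poll yields None, which is what
                // fuse() used to guarantee implicitly.
                Phase::Done => return Poll::Ready(None),
            }
        }
    }
}

fn main() {
    let mut agg = Agg { phase: Phase::ReadingInput, batches_left: 2 };
    while let Poll::Ready(Some(v)) = agg.poll_next() {
        println!("output batch: {v}");
    }
    assert!(matches!(agg.poll_next(), Poll::Ready(None)));
}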

///
/// If successfull, this returns the additional number of bytes that were allocated during this process.
///
/// TODO: Make this a member function of [`GroupedHashAggregateStream`]
Contributor Author

DONE!

@@ -576,138 +568,131 @@ impl std::fmt::Debug for RowAggregationState {
}
}

/// Create a RecordBatch with all group keys and accumulator' states or values.
#[allow(clippy::too_many_arguments)]
Contributor Author

likewise here, moved from a free function to a member function on GroupedHashAggregateStream


// seems like some consumers call this stream even after it returned `None`, so let's fuse the stream.
let stream = stream.fuse();
Contributor Author

We used to fuse the stream implicitly -- but it is now handled via ExecutionState::Done

@alamb marked this pull request as ready for review on January 19, 2023 at 12:55
Contributor

@ozankabak left a comment

LGTM, only one minor inline comment. This part of the code is really getting tidied up 🙂

Comment on lines 250 to 260
             match result.and_then(|allocated| {
-                this.row_aggr_state.reservation.try_grow(allocated)
+                self.row_aggr_state.reservation.try_grow(allocated)
             }) {
-                Ok(_) => continue,
-                Err(e) => Err(ArrowError::ExternalError(Box::new(e))),
+                Ok(_) => {}
+                Err(e) => {
+                    return Poll::Ready(Some(Err(
+                        ArrowError::ExternalError(Box::new(e)),
+                    )))
+                }
             }
         }
Contributor

Since the Ok case is a no-op, an if let Err(e) = ... seems to be more idiomatic here
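
A small illustration of the suggested shape (illustrative names and error type, not the actual DataFusion code): when the Ok arm is a no-op, the single interesting arm can be written with if let.

fn try_grow(allocated: usize) -> Result<(), String> {
    if allocated > 1024 {
        Err(format!("cannot reserve {allocated} bytes"))
    } else {
        Ok(())
    }
}

fn handle(allocated: usize) -> Option<String> {
    // Instead of a two-arm match whose Ok arm does nothing:
    //   match try_grow(allocated) {
    //       Ok(_) => {}
    //       Err(e) => return Some(e),
    //   }
    // write only the arm that matters:
    if let Err(e) = try_grow(allocated) {
        return Some(e);
    }
    None
}

fn main() {
    assert_eq!(handle(10), None);
    assert!(handle(4096).is_some());
}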

Contributor Author

@alamb Jan 19, 2023

I agree -- changed in 06847b5. Thank you for the suggestion

Contributor Author

alamb commented Jan 20, 2023

I plan to merge this tomorrow unless anyone would like more time to review or comment

cc @tustvold @Dandandan @crepererum

@alamb merged commit 350cb47 into apache:master on Jan 21, 2023
@alamb deleted the alamb/simplify_group_by branch on January 21, 2023 at 10:48

ursabot commented Jan 21, 2023

Benchmark runs are scheduled for baseline = f5439c8 and contender = 350cb47. 350cb47 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ec2-t3-xlarge-us-east-2] ec2-t3-xlarge-us-east-2
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on test-mac-arm] test-mac-arm
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-i9-9960x] ursa-i9-9960x
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-thinkcentre-m75q] ursa-thinkcentre-m75q
Buildkite builds:
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

Labels
core (Core DataFusion crate)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants