-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-9630] [SQL] Clean up new aggregate operators (SPARK-9240 follow up) #7954
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@JoshRosen I made this change to workaround ChainedBufferOutputStream's unsupported write(b: Int).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, we need to double check if we need to wrap input stream with a buffered input stream when we read data back.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cc @JoshRosen do you think this is fine? seems inefficient to me but maybe there is no better way
|
I will add proper tests for our fallback strategy. |
|
Test build #39839 has finished for PR 7954 at commit
|
|
Test build #1354 has finished for PR 7954 at commit
|
|
Test build #39840 has finished for PR 7954 at commit
|
|
test this please |
|
Test build #39878 has finished for PR 7954 at commit
|
|
Test build #39896 has finished for PR 7954 at commit
|
|
Test build #39912 has finished for PR 7954 at commit
|
|
Test build #39978 has finished for PR 7954 at commit
|
|
Test build #39979 has finished for PR 7954 at commit
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
which old path are we talking about? the "old" aggregate code path is not using sum here, is it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We replace old aggregate functions to new aggregate functions at planning time. So, we need to have NullType at here to make this expression resolved.
|
Test build #39991 has finished for PR 7954 at commit
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
createNewBuffer -> createNewAggregationBuffer ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
|
I tested this on some local dataset. It is not a very scientific one, but I think it is actually slower than the existing aggregate on master ... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this will create too much memory copying -- and might explain the slow down. I was thinking about only doing the unsafe row joining if we are directly outputting them into an exchange (i.e. partial aggregation).
…nd make it only work with UnsafeRow.
|
test this please |
|
Test build #40052 has finished for PR 7954 at commit
|
|
Test build #1384 has finished for PR 7954 at commit
|
|
Merging this in. |
…w up) This is the followup of #7813. It renames `HybridUnsafeAggregationIterator` to `TungstenAggregationIterator` and makes it only work with `UnsafeRow`. Also, I add a `TungstenAggregate` that uses `TungstenAggregationIterator` and make `SortBasedAggregate` (renamed from `SortBasedAggregate`) only works with `SafeRow`. Author: Yin Huai <yhuai@databricks.com> Closes #7954 from yhuai/agg-followUp and squashes the following commits: 4d2f4fc [Yin Huai] Add comments and free map. 0d7ddb9 [Yin Huai] Add TungstenAggregationQueryWithControlledFallbackSuite to test fall back process. 91d69c2 [Yin Huai] Rename UnsafeHybridAggregationIterator to TungstenAggregateIteraotr and make it only work with UnsafeRow. (cherry picked from commit 3504bf3) Signed-off-by: Reynold Xin <rxin@databricks.com>
This is the followup of #7813. It renames
HybridUnsafeAggregationIteratortoTungstenAggregationIteratorand makes it only work withUnsafeRow. Also, I add aTungstenAggregatethat usesTungstenAggregationIteratorand makeSortBasedAggregate(renamed fromSortBasedAggregate) only works withSafeRow.