@yhuai
Contributor

@yhuai yhuai commented Aug 5, 2015

This is a follow-up to #7813. It renames HybridUnsafeAggregationIterator to TungstenAggregationIterator and makes it work only with UnsafeRow. Also, I added a TungstenAggregate that uses TungstenAggregationIterator and made SortBasedAggregate (renamed from SortBasedAggregate) work only with SafeRow.

Contributor Author

@JoshRosen I made this change to work around ChainedBufferOutputStream's unsupported write(b: Int).
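
For illustration only (this is not the code from the patch): one way a workaround like this can look is to put a `BufferedOutputStream` in front of a stream that rejects single-byte writes, so the underlying stream only ever sees bulk `write(bytes, off, len)` calls. `BulkOnlyOutputStream` below is a hypothetical stand-in for that kind of stream, not Spark's `ChainedBufferOutputStream`, and the buffer size is arbitrary.

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

public class BulkOnlyStreamSketch {
    /** Hypothetical stream that supports bulk writes but rejects write(int). */
    static class BulkOnlyOutputStream extends OutputStream {
        final List<byte[]> chunks = new ArrayList<>();

        @Override
        public void write(int b) {
            throw new UnsupportedOperationException("write(int) is not supported");
        }

        @Override
        public void write(byte[] bytes, int off, int len) {
            // Record each bulk write as its own chunk.
            byte[] copy = new byte[len];
            System.arraycopy(bytes, off, copy, 0, len);
            chunks.add(copy);
        }
    }

    /** Writes n single bytes through a buffering wrapper and returns the chunks the sink received. */
    static List<byte[]> writeSingleBytes(int n, int bufferSize) {
        BulkOnlyOutputStream sink = new BulkOnlyOutputStream();
        try (OutputStream out = new BufferedOutputStream(sink, bufferSize)) {
            for (int i = 0; i < n; i++) {
                // The wrapper collects single bytes and forwards them in bulk.
                out.write(i);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.chunks;
    }

    public static void main(String[] args) {
        List<byte[]> chunks = writeSingleBytes(20, 8);
        int total = 0;
        for (byte[] chunk : chunks) {
            total += chunk.length;
        }
        System.out.println("bulk writes: " + chunks.size() + ", bytes: " + total);
    }
}
```

The trade-off in this sketch is one extra copy through the wrapper's buffer for every byte written.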

Contributor Author

Also, we need to double-check whether we should wrap the input stream in a buffered input stream when we read the data back.
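
As a rough way to reason about that question, the sketch below counts how many read calls reach an underlying stream with and without a `BufferedInputStream` in between. It is a toy under stated assumptions: `CountingInputStream` and `callsToReadInts` are hypothetical helpers, and an in-memory `ByteArrayInputStream` stands in for whatever stream the data is actually read back from.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

public class BufferedReadSketch {
    /** Hypothetical helper: counts the read calls that reach the wrapped stream. */
    static class CountingInputStream extends FilterInputStream {
        int calls = 0;

        CountingInputStream(InputStream in) {
            super(in);
        }

        @Override
        public int read() throws IOException {
            calls++;
            return super.read();
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            calls++;
            return super.read(b, off, len);
        }
    }

    /** Reads data.length / 4 ints and returns how many read calls hit the underlying stream. */
    static int callsToReadInts(byte[] data, boolean buffered) {
        CountingInputStream counting = new CountingInputStream(new ByteArrayInputStream(data));
        InputStream source = buffered ? new BufferedInputStream(counting) : counting;
        try (DataInputStream in = new DataInputStream(source)) {
            for (int i = 0; i < data.length / 4; i++) {
                in.readInt();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return counting.calls;
    }

    public static void main(String[] args) {
        byte[] data = new byte[400]; // 100 ints, all zero
        System.out.println("unbuffered calls: " + callsToReadInts(data, false));
        System.out.println("buffered calls:   " + callsToReadInts(data, true));
    }
}
```

If the underlying stream already serves reads from memory, the extra buffer mostly adds a copy; if each read call is expensive, the wrapper is more likely to pay off.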

Contributor

cc @JoshRosen do you think this is fine? seems inefficient to me but maybe there is no better way

@yhuai
Contributor Author

yhuai commented Aug 5, 2015

I will add proper tests for our fallback strategy.

@SparkQA

SparkQA commented Aug 5, 2015

Test build #39839 has finished for PR 7954 at commit a13f6af.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Aug 5, 2015

Test build #1354 has finished for PR 7954 at commit aefbafa.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Aug 5, 2015

Test build #39840 has finished for PR 7954 at commit aefbafa.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@yhuai
Contributor Author

yhuai commented Aug 5, 2015

test this please

@SparkQA

SparkQA commented Aug 5, 2015

Test build #39878 has finished for PR 7954 at commit aefbafa.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Aug 5, 2015

Test build #39896 has finished for PR 7954 at commit 7227d69.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Aug 5, 2015

Test build #39912 has finished for PR 7954 at commit 394682e.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@yhuai yhuai changed the title from "[SPARK-9630] [SQL] [WIP] Clean up new aggregate operators (SPARK-9240 follow up)" to "[SPARK-9630] [SQL] Clean up new aggregate operators (SPARK-9240 follow up)" on Aug 6, 2015

@SparkQA

SparkQA commented Aug 6, 2015

Test build #39978 has finished for PR 7954 at commit ec7dc1b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Aug 6, 2015

Test build #39979 has finished for PR 7954 at commit ac69a1f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Contributor

which old path are we talking about? the "old" aggregate code path is not using sum here, is it?

Contributor Author

We replace old aggregate functions with new aggregate functions at planning time, so we need to have NullType here for this expression to be resolved.

@SparkQA

SparkQA commented Aug 6, 2015

Test build #39991 has finished for PR 7954 at commit 34fa17b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Contributor

createNewBuffer -> createNewAggregationBuffer ?

Contributor Author

Done.

@rxin
Contributor

rxin commented Aug 6, 2015

I tested this on a local dataset. It is not a very scientific test, but I think it is actually slower than the existing aggregate on master ...

Contributor

This will create too much memory copying, which might explain the slowdown. I was thinking about doing the unsafe row joining only if we are outputting the rows directly into an exchange (i.e. partial aggregation).

@yhuai
Contributor Author

yhuai commented Aug 6, 2015

test this please

@SparkQA

SparkQA commented Aug 6, 2015

Test build #40052 has finished for PR 7954 at commit 4d2f4fc.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Aug 6, 2015

Test build #1384 has finished for PR 7954 at commit 4d2f4fc.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@rxin
Contributor

rxin commented Aug 6, 2015

Merging this in.

@asfgit asfgit closed this in 3504bf3 Aug 6, 2015
asfgit pushed a commit that referenced this pull request Aug 6, 2015
…w up)

This is a follow-up to #7813. It renames `HybridUnsafeAggregationIterator` to `TungstenAggregationIterator` and makes it work only with `UnsafeRow`. Also, I added a `TungstenAggregate` that uses `TungstenAggregationIterator` and made `SortBasedAggregate` (renamed from `SortBasedAggregate`) work only with `SafeRow`.

Author: Yin Huai <yhuai@databricks.com>

Closes #7954 from yhuai/agg-followUp and squashes the following commits:

4d2f4fc [Yin Huai] Add comments and free map.
0d7ddb9 [Yin Huai] Add TungstenAggregationQueryWithControlledFallbackSuite to test fall back process.
91d69c2 [Yin Huai] Rename UnsafeHybridAggregationIterator to TungstenAggregationIterator and make it only work with UnsafeRow.

(cherry picked from commit 3504bf3)
Signed-off-by: Reynold Xin <rxin@databricks.com>