rxin commented Jun 20, 2014

This avoids building up an expensive hash map if partial aggregation does not result in data size reduction.

Just a prototype. Kinda ugly, doesn't properly connect with the config system yet, and has no tests.
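
For readers unfamiliar with the idea, here is a minimal sketch of one way such a bail-out could work. It is not the code in this PR. It assumes a sampling strategy: aggregate the first `SampleSize` rows, and if the number of distinct keys is not sufficiently smaller than the number of rows seen, stop hashing and pass the remaining rows through unchanged (the final aggregation downstream still merges duplicates). The object name, `SampleSize`, and `MinReductionRatio` are hypothetical.

```scala
import scala.collection.mutable

object AdaptivePartialAggregation {
  // Hypothetical tuning knobs; neither the names nor the defaults come from the PR.
  val SampleSize = 1000         // rows to observe before deciding
  val MinReductionRatio = 0.5   // keep hashing only if distinctKeys / rows <= this

  /** Sums values per key, but stops hashing if the sample shows too little reduction. */
  def partialSum[K](rows: Iterator[(K, Long)]): Iterator[(K, Long)] = {
    val buffer = mutable.HashMap.empty[K, Long]
    var seen = 0
    // Phase 1: aggregate a bounded sample of the input.
    while (rows.hasNext && seen < SampleSize) {
      val (k, v) = rows.next()
      buffer(k) = buffer.getOrElse(k, 0L) + v
      seen += 1
    }
    val reduces = seen == 0 || buffer.size.toDouble / seen <= MinReductionRatio
    if (reduces) {
      // Phase 2a: aggregation pays off, so fold the rest into the hash map.
      rows.foreach { case (k, v) => buffer(k) = buffer.getOrElse(k, 0L) + v }
      buffer.iterator
    } else {
      // Phase 2b: emit the sampled partial sums, then stream the rest as-is.
      buffer.iterator ++ rows
    }
  }
}
```

With heavily duplicated keys the output collapses to one row per key; with mostly unique keys the hash map never grows beyond the sample.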

rxin commented Jun 20, 2014

@concretevitamin I find it hard to actually use config options in a physical operator. Any suggestions?

@AmplabJenkins

Merged build triggered.

@AmplabJenkins

Merged build started.

rxin commented Jun 20, 2014

@pwendell / @mateiz should we actually build this into Spark directly (i.e. in Aggregator)?

@AmplabJenkins

Merged build finished. All automated tests passed.

@AmplabJenkins

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15952/

@concretevitamin

@rxin If we are simply trying to read the default values for the params, but not user-set ones (i.e. in the absence of a SQLContext in execute()), I think we could move the default param values to a companion object of SQLConf. The accessors of the class would then return the user-set value if there is one, and otherwise fall back to the default from the static object.
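
A rough sketch of that layout, for illustration only. `SimpleSQLConf`, the key name, and the default value are hypothetical stand-ins, not the actual SQLConf API:

```scala
import scala.collection.mutable

// Hypothetical stand-in for SQLConf's companion: holds keys and static defaults.
object SimpleSQLConf {
  val PartialAggEnabledKey = "example.partialAggregation.enabled" // illustrative key
  val PartialAggEnabledDefault = true                             // illustrative default
}

class SimpleSQLConf {
  import SimpleSQLConf._

  // User-set values only; defaults live in the companion object.
  private val settings = mutable.HashMap.empty[String, String]

  def set(key: String, value: String): Unit = settings.update(key, value)

  // Accessor: the user-set value if present, otherwise the companion default.
  def partialAggEnabled: Boolean =
    settings.get(PartialAggEnabledKey).map(_.toBoolean).getOrElse(PartialAggEnabledDefault)
}
```

Under this layout, a physical operator that has no context at hand could read `SimpleSQLConf.PartialAggEnabledDefault` directly, while code that does have a conf instance goes through the accessor and picks up user overrides.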

mateiz commented Jun 21, 2014

It would be great to add this into Aggregator as well. Would that replace the implementation here? I.e. does Spark SQL go through Aggregator?

rxin commented Jun 21, 2014

Spark SQL doesn't currently use Aggregator, but we would want it to.

Man, those are some high standards!

rxin commented Jun 24, 2014

@mateiz I submitted a patch to core's Aggregator in #1191.

After implementing it in Aggregator, I realized it might be hard for Spark SQL to reuse Aggregator unless we change Aggregator to allocate fewer temporary objects (or write the aggregation code path in Spark SQL to output key-value tuples).

rxin closed this Aug 29, 2014