Skip to content

Conversation

@rxin
Copy link
Contributor

@rxin rxin commented Oct 8, 2014

BroadcastHashJoin builds a new hash table for each partition. We can build it once per node and reuse the hash table.

@rxin
Copy link
Contributor Author

rxin commented Oct 8, 2014

This is based on #2719. We should merge that one first.

…zation.

If we write the filter which is always FALSE like

    SELECT * from person WHERE FALSE;

200 tasks will run. I think, 1 task is enough.

And current optimizer cannot optimize the case NOT is duplicated like

    SELECT * from person WHERE NOT ( NOT (age > 30));

The filter rule above should be simplified

Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>

Closes apache#2692 from sarutak/SPARK-3831 and squashes the following commits:

25f3e20 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into SPARK-3831
23c750c [Kousuke Saruta] Improved unsupported predicate test case
a11b9f3 [Kousuke Saruta] Modified NOT predicate test case in PartitionBatchPruningSuite
8ea872b [Kousuke Saruta] Fixed the number of tasks when the data of  LocalRelation is empty.
@SparkQA
Copy link

SparkQA commented Oct 9, 2014

QA tests have started for PR 2722 at commit 90b58c0.

  • This patch merges cleanly.

liancheng and others added 3 commits October 8, 2014 17:04
This PR uses JSON instead of `toString` to serialize `DataType`s. The latter is not only hard to parse but also flaky in many cases.

Since we already write schema information to Parquet metadata in the old style, we have to reserve the old `DataType` parser and ensure downward compatibility. The old parser is now renamed to `CaseClassStringParser` and moved into `object DataType`.

JoshRosen davies Please help review PySpark related changes, thanks!

Author: Cheng Lian <lian.cs.zju@gmail.com>

Closes apache#2563 from liancheng/datatype-to-json and squashes the following commits:

fc92eb3 [Cheng Lian] Reverts debugging code, simplifies primitive type JSON representation
438c75f [Cheng Lian] Refactors PySpark DataType JSON SerDe per comments
6b6387b [Cheng Lian] Removes debugging code
6a3ee3a [Cheng Lian] Addresses per review comments
dc158b5 [Cheng Lian] Addresses PEP8 issues
99ab4ee [Cheng Lian] Adds compatibility est case for Parquet type conversion
a983a6c [Cheng Lian] Adds PySpark support
f608c6e [Cheng Lian] De/serializes DataType objects from/to JSON
marmbrus
Update README.md to be consistent with Spark 1.1

Author: Liquan Pei <liquanpei@gmail.com>

Closes apache#2706 from Ishiihara/SparkSQL-readme and squashes the following commits:

33b9d4b [Liquan Pei] keep README.md up to date
@SparkQA
Copy link

SparkQA commented Oct 9, 2014

QA tests have finished for PR 2722 at commit 90b58c0.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21495/Test FAILed.

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21497/Test FAILed.

@SparkQA
Copy link

SparkQA commented Oct 9, 2014

QA tests have started for PR 2722 at commit 4b9d0c9.

  • This patch merges cleanly.

chenghao-intel and others added 4 commits October 8, 2014 17:52
Calling `BinaryArithmetic.dataType` will throws exception until it's resolved, but in type coercion rule `Division`, seems doesn't follow this.

Author: Cheng Hao <hao.cheng@intel.com>

Closes apache#2559 from chenghao-intel/type_coercion and squashes the following commits:

199a85d [Cheng Hao] Simplify the divide rule
dc55218 [Cheng Hao] fix bug of type coercion in div
Includes partition keys into account when applying `PreInsertionCasts` rule.

Author: Cheng Lian <lian.cs.zju@gmail.com>

Closes apache#2672 from liancheng/fix-pre-insert-casts and squashes the following commits:

def1a1a [Cheng Lian] Makes PreInsertionCasts handle partitions properly
…inserting Hive values

Builds all wrappers at first according to object inspector types to avoid per row costs.

Author: Cheng Lian <lian.cs.zju@gmail.com>

Closes apache#2592 from liancheng/hive-value-wrapper and squashes the following commits:

9696559 [Cheng Lian] Passes all tests
4998666 [Cheng Lian] Prevents per row dynamic dispatching and pattern matching when inserting Hive values
Author: Reynold Xin <rxin@apache.org>

Closes apache#2719 from rxin/sql-join-break and squashes the following commits:

0c0082b [Reynold Xin] Fix line length.
cbc664c [Reynold Xin] Rename join -> joins package.
a070d44 [Reynold Xin] Fix line length in HashJoin
a39be8c [Reynold Xin] [SPARK-3857] Create a join package for various join operators.
@SparkQA
Copy link

SparkQA commented Oct 9, 2014

QA tests have finished for PR 2722 at commit 4b9d0c9.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21500/Test PASSed.

…sh-1

Conflicts:
	sql/core/src/main/scala/org/apache/spark/sql/execution/joins/BroadcastHashJoin.scala
	sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashJoin.scala
	sql/core/src/main/scala/org/apache/spark/sql/execution/joins/ShuffledHashJoin.scala
@rxin
Copy link
Contributor Author

rxin commented Oct 9, 2014

replacing this with #2727

@rxin rxin closed this Oct 9, 2014
@rxin rxin deleted the SPARK-3861-broadcast-hash branch October 9, 2014 04:39
@SparkQA
Copy link

SparkQA commented Oct 9, 2014

QA tests have started for PR 2722 at commit 18eb214.

  • This patch merges cleanly.

@SparkQA
Copy link

SparkQA commented Oct 9, 2014

QA tests have finished for PR 2722 at commit 18eb214.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • final class UniqueKeyHashedRelation(hashTable: JavaHashMap[Row, Row])

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21519/Test PASSed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants