[SPARK-16134][SQL] optimizer rules for typed filter #13846

cloud-fan · 2016-06-22T14:21:11Z

What changes were proposed in this pull request?

This PR adds 3 optimizer rules for typed filter:

1. push typed filter down through SerializeFromObject and eliminate the deserialization in filter condition.
2. pull typed filter up through SerializeFromObject and eliminate the deserialization in filter condition.
3. combine adjacent typed filters and share the deserialized object among all the condition expressions.
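
To make the benefit of combining adjacent typed filters concrete, here is a minimal sketch in plain Scala. It is a toy model and not Catalyst code: `Person`, `Counter`, `TypedFilterSketch`, `deserialize`, `filterEach`, and `filterCombined` are hypothetical names introduced only for illustration. It assumes a row is a `Seq[Any]` and counts how often a row is turned back into an object.

```scala
// Toy model for illustration only; none of these names come from Spark.
final case class Person(name: String, age: Int)

// Counts how many times a row is turned back into an object.
final class Counter { var n: Int = 0 }

object TypedFilterSketch {
  type Row = Seq[Any]

  // Stand-in for the deserializer used by a typed filter condition.
  def deserialize(row: Row, c: Counter): Person = {
    c.n += 1
    Person(row(0).asInstanceOf[String], row(1).asInstanceOf[Int])
  }

  // Adjacent typed filters left as they are: each filter deserializes every row it sees.
  def filterEach(rows: Seq[Row], preds: Seq[Person => Boolean], c: Counter): Seq[Row] =
    preds.foldLeft(rows)((rs, p) => rs.filter(r => p(deserialize(r, c))))

  // Combined filter: one deserialization per row, shared by all conditions.
  def filterCombined(rows: Seq[Row], preds: Seq[Person => Boolean], c: Counter): Seq[Row] =
    rows.filter { r =>
      val obj = deserialize(r, c)
      preds.forall(p => p(obj))
    }
}
```

Both functions keep the same rows; the combined version simply deserializes each input row once instead of once per filter that the row reaches.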

This PR also adds a TypedFilter logical plan to separate it from the normal filter, so that the concept is clearer and optimizer rules are easier to write.
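
As a rough illustration of why a dedicated plan node makes rules easier to write, the sketch below pattern-matches on a typed filter sitting directly above a serialization step. It is a toy model, not the rule from this PR: `ToyPlan`, `ObjectSource`, `ToySerialize`, `ToyTypedFilter`, `ToyObjectFilter`, and `PushDownSketch` are all hypothetical names, and the objects are assumed to be plain `Int`s so no type check is needed.

```scala
// Toy plan nodes for illustration only; they are not Catalyst's real classes.
sealed trait ToyPlan
// Produces objects.
final case class ObjectSource(items: Seq[Int]) extends ToyPlan
// Turns objects into rows.
final case class ToySerialize(child: ToyPlan) extends ToyPlan
// Filter over rows: it would have to deserialize each row before calling f.
final case class ToyTypedFilter(f: Int => Boolean, child: ToyPlan) extends ToyPlan
// Filter over objects: calls f directly, no deserialization needed.
final case class ToyObjectFilter(f: Int => Boolean, child: ToyPlan) extends ToyPlan

object PushDownSketch {
  // Rewrite TypedFilter(Serialize(child)) into Serialize(ObjectFilter(child)).
  def pushDown(plan: ToyPlan): ToyPlan = plan match {
    case ToyTypedFilter(f, ToySerialize(child)) =>
      ToySerialize(ToyObjectFilter(f, pushDown(child)))
    case ToyTypedFilter(f, child)  => ToyTypedFilter(f, pushDown(child))
    case ToyObjectFilter(f, child) => ToyObjectFilter(f, pushDown(child))
    case ToySerialize(child)       => ToySerialize(pushDown(child))
    case s: ObjectSource           => s
  }
}
```

Because the typed filter is its own node, the first `case` can target it without having to inspect the condition of a generic filter.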

How was this patch tested?

TypedFilterOptimizationSuite

cloud-fan · 2016-06-22T14:21:30Z

cc @yhuai @liancheng @clockfly

SparkQA · 2016-06-22T15:46:22Z

Test build #61035 has finished for PR 13846 at commit 6200460.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
- case class TypedFilter(

cloud-fan · 2016-06-23T00:28:53Z

retest this please

SparkQA · 2016-06-23T02:29:16Z

Test build #61083 has finished for PR 13846 at commit 6200460.

This patch passes all tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
- case class TypedFilter(

ueshin · 2016-06-23T12:28:18Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala

I think we can also push down in the case of TypedFilter(Filter(SerializeFromObject(child))), rewriting it into Filter(SerializeFromObject(TypedFilter(child))),
e.g. ds.map(...).filter(byExpr).filter(byFunc).
What do you think?

cloud-fan · 2016-06-23

Well, that's true, and Filter could be any other unary operator whose output is derived from its child, e.g. Sort.

However, I don't think ds.map(...).filter(byExpr).filter(byFunc), i.e. interleaving typed and untyped operations, is a common case. If there is an easy and general way to optimize it, I'm happy to have it; otherwise I'd like to leave it.

What do you think?

ueshin · 2016-06-23

Hmm, I don't think mixing typed and untyped operations is that uncommon, but I don't have an easy and general way to optimize it, so I think we can leave it for now.

liancheng · 2016-06-28T12:54:08Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala

Shall we prepend rather than append found filters here? Otherwise filter predicates will be evaluated in reverse order after being combined. It would also be nice to add a comment about this.
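
The ordering concern can be shown with a small sketch, assuming the filters are collected while walking from the outermost filter down to its child. This is a toy model and not the code under review: `Op`, `Leaf`, `NamedFilter`, and `CollectOrder` are hypothetical names. For `ds.filter(f1).filter(f2)` the plan is `f2` on top of `f1`, so the walk meets `f2` first even though `f1` is evaluated first.

```scala
// Toy plan for illustration only; not Catalyst code.
sealed trait Op
case object Leaf extends Op
final case class NamedFilter(name: String, child: Op) extends Op

object CollectOrder {
  // Walk from the outermost filter towards the leaf, collecting filter names.
  // prepend = true puts each newly found (inner) filter in front of the ones
  // found so far; prepend = false adds it at the end.
  @annotation.tailrec
  def collect(plan: Op, acc: List[String], prepend: Boolean): List[String] = plan match {
    case NamedFilter(n, child) =>
      collect(child, if (prepend) n :: acc else acc :+ n, prepend)
    case Leaf => acc
  }
}
```

With prepending, the collected list matches the original evaluation order (innermost filter first); with appending, it comes out reversed.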

SparkQA · 2016-06-29T12:46:41Z

Test build #61463 has finished for PR 13846 at commit c1e7d9f.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2016-06-29T13:05:31Z

Test build #61466 has finished for PR 13846 at commit 1a2241f.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

## What changes were proposed in this pull request?

This PR adds 3 optimizer rules for typed filter:

1. push typed filter down through `SerializeFromObject` and eliminate the deserialization in filter condition.
2. pull typed filter up through `SerializeFromObject` and eliminate the deserialization in filter condition.
3. combine adjacent typed filters and share the deserialized object among all the condition expressions.

This PR also adds `TypedFilter` logical plan, to separate it from normal filter, so that the concept is more clear and it's easier to write optimizer rules.

## How was this patch tested?

`TypedFilterOptimizationSuite`

Author: Wenchen Fan <wenchen@databricks.com>

Closes #13846 from cloud-fan/filter.

(cherry picked from commit d063898)
Signed-off-by: Cheng Lian <lian@databricks.com>

liancheng · 2016-06-30T00:17:24Z

LGTM, merged to master.

(Also merged to branch-2.0 by mistake, will revert it ASAP. Sorry for the trouble.)

liancheng · 2016-06-30T00:19:48Z

Reverted the commit on branch-2.0.

cloud-fan force-pushed the filter branch from c1dac58 to 6200460 Compare June 22, 2016 14:24

ueshin reviewed Jun 23, 2016

ueshin mentioned this pull request Jun 24, 2016

[SPARK-15980][SQL] Add PushPredicateThroughObjectConsumer rule to Optimizer. #13702

Closed

liancheng reviewed Jun 28, 2016

optimizer rules for typed filter

8adf602

cloud-fan force-pushed the filter branch from 6200460 to c1e7d9f Compare June 29, 2016 10:51

address comments

1a2241f

cloud-fan force-pushed the filter branch from c1e7d9f to 1a2241f Compare June 29, 2016 11:00

asfgit closed this in d063898 Jun 30, 2016
