Skip to content

Conversation

@cloud-fan
Copy link
Contributor

What changes were proposed in this pull request?

This PR adds 3 optimizer rules for typed filter:

  1. push typed filter down through SerializeFromObject and eliminate the deserialization in filter condition.
  2. pull typed filter up through SerializeFromObject and eliminate the deserialization in filter condition.
  3. combine adjacent typed filters and share the deserialized object among all the condition expressions.

This PR also adds TypedFilter logical plan, to separate it from normal filter, so that the concept is more clear and it's easier to write optimizer rules.

How was this patch tested?

TypedFilterOptimizationSuite

@cloud-fan
Copy link
Contributor Author

cc @yhuai @liancheng @clockfly

@SparkQA
Copy link

SparkQA commented Jun 22, 2016

Test build #61035 has finished for PR 13846 at commit 6200460.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class TypedFilter(

@cloud-fan
Copy link
Contributor Author

retest this please

@SparkQA
Copy link

SparkQA commented Jun 23, 2016

Test build #61083 has finished for PR 13846 at commit 6200460.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class TypedFilter(

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can also push down if TypedFilter( Filter( SerializeFromObject(child) ) ) into Filter( SerializeFromObject( TypedFilter(child) ) ).
e.g. ds.map(...).filter(byExpr).filter(byFunc).
What do you think?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, it's true, and Filter can be any other unary operators whose output is derived from its child, e.g. Sort.

However, I don't think ds.map(...).filter(byExpr).filter(byFunc) is a common case, i.e. mixing typed and untyped operations interlaced. If there is an easy and general way to optimize it, I'm happy to have it, or I'd like to leave it.

what do you think?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I don't think mixing typed and untyped is not a common case, but I don't have any idea to optimize easy and general way so I think we can leave it for now.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shall we prepend rather than append found filters here? Otherwise filter predicates will be evaluated in reverse order after being combined. Also would be nice to comment about this.

@SparkQA
Copy link

SparkQA commented Jun 29, 2016

Test build #61463 has finished for PR 13846 at commit c1e7d9f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jun 29, 2016

Test build #61466 has finished for PR 13846 at commit 1a2241f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

asfgit pushed a commit that referenced this pull request Jun 30, 2016
## What changes were proposed in this pull request?

This PR adds 3 optimizer rules for typed filter:

1. push typed filter down through `SerializeFromObject` and eliminate the deserialization in filter condition.
2. pull typed filter up through `SerializeFromObject` and eliminate the deserialization in filter condition.
3. combine adjacent typed filters and share the deserialized object among all the condition expressions.

This PR also adds `TypedFilter` logical plan, to separate it from normal filter, so that the concept is more clear and it's easier to write optimizer rules.

## How was this patch tested?

`TypedFilterOptimizationSuite`

Author: Wenchen Fan <wenchen@databricks.com>

Closes #13846 from cloud-fan/filter.

(cherry picked from commit d063898)
Signed-off-by: Cheng Lian <lian@databricks.com>
@asfgit asfgit closed this in d063898 Jun 30, 2016
@liancheng
Copy link
Contributor

LGTM, merged to master.

(Also merged to branch-2.0 by mistake, will revert it ASAP. Sorry for the trouble.)

@liancheng
Copy link
Contributor

Reverted the commit on branch-2.0.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants