[SPARK-20413] Add new query hint NO_COLLAPSE. #17708

ptkool · 2017-04-20T15:31:26Z

What changes were proposed in this pull request?

This PR proposes adding a new query hint called NO_COLLAPSE that can be used to prevent adjacent projections from being collapsed.
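To make "collapsing adjacent projections" concrete, here is a minimal toy model, not Spark's optimizer code. The node classes (`Table`, `Project`, `NoCollapse`), the tuple-based expression encoding, and the column names are all hypothetical and chosen only for illustration. It sketches how merging two projections inlines the inner definition into every outer reference, and how a barrier node of the kind this PR proposes would stop that merge.

```python
from dataclasses import dataclass

# Toy stand-ins for logical plan nodes. These are NOT Spark classes.
@dataclass(frozen=True)
class Table:
    name: str

@dataclass(frozen=True)
class Project:
    exprs: tuple    # ((output_name, expression), ...)
    child: object

@dataclass(frozen=True)
class NoCollapse:   # hypothetical barrier node, modelled on the proposed hint
    child: object

def substitute(expr, aliases):
    """Inline the child projection's definitions into an expression."""
    kind = expr[0]
    if kind == "col":
        return aliases.get(expr[1], expr)
    if kind == "lit":
        return expr
    op, left, right = expr
    return (op, substitute(left, aliases), substitute(right, aliases))

def collapse(plan):
    """Merge a Project that sits directly on top of another Project."""
    if isinstance(plan, Project):
        child = collapse(plan.child)
        if isinstance(child, Project):
            aliases = dict(child.exprs)
            merged = tuple((n, substitute(e, aliases)) for n, e in plan.exprs)
            return Project(merged, child.child)
        return Project(plan.exprs, child)
    if isinstance(plan, NoCollapse):
        # The barrier is kept, so the Project above it never sees a Project child.
        return NoCollapse(collapse(plan.child))
    return plan

def count_op(expr, op):
    """Count how many times operator `op` appears in an expression tree."""
    if expr[0] in ("col", "lit"):
        return 0
    return (expr[0] == op) + count_op(expr[1], op) + count_op(expr[2], op)

# Inner projection defines c1 = qty * 10; the outer one uses c1 twice.
inner = Project((("c1", ("mul", ("col", "qty"), ("lit", 10))),), Table("t"))
outer_exprs = (
    ("a", ("add", ("col", "c1"), ("lit", 1))),
    ("b", ("add", ("col", "c1"), ("lit", 2))),
)

collapsed = collapse(Project(outer_exprs, inner))
protected = collapse(Project(outer_exprs, NoCollapse(inner)))
```

In this sketch, `collapsed` is a single projection over the table in which the multiplication appears once per outer column, while `protected` keeps the two projections separate so the multiplication stays in the inner one.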

How was this patch tested?

Tested using unit tests, integration tests, and manual tests.

hvanhovell · 2017-04-20T16:53:28Z

ok to test

SparkQA · 2017-04-20T16:59:23Z

Test build #75995 has finished for PR 17708 at commit 3f1e6a1.

This patch fails Scala style tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
case class NoCollapseHint(child: LogicalPlan) extends UnaryNode

hvanhovell

@ptkool thanks for submitting the PR. I am not sure this is the best way to avoid projection collapse. The problem is that this approach will also inhibit other optimizations from taking place.

hvanhovell · 2017-04-20T16:54:29Z

python/pyspark/sql/functions.py


+@since(2.2)
+def no_collapse(df):
+    """Marks a DataFrame as small enough for use in broadcast joins."""


Doc is incorrect.

hvanhovell · 2017-04-20T16:59:15Z

sql/core/src/main/scala/org/apache/spark/sql/functions.scala

-   * @group normal_funcs
-   * @since 1.5.0
-   */
+    * Marks a DataFrame as small enough for use in broadcast joins.


Please undo this change.

hvanhovell · 2017-04-20T17:00:15Z

sql/core/src/main/scala/org/apache/spark/sql/functions.scala

  }

+  /**
+    * Marks a DataFrame as small enough for use in broadcast joins.


Nit: the alignment is off by a space, it should be:

/**
 * Text...
 */

hvanhovell · 2017-04-20T17:00:48Z

...alyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala


+/**
+ * A hint for the optimizer that we should not merge two projections.
+ */


Can you explain why we want this in the LogicalPlan level and not on the expression level?

The problem with this approach is that most other optimizations won't work with this, for example predicate push down.

I originally thought about putting it at the expression level, but ultimately decided it made more sense at the LogicalPlan node level, since the purpose was in fact to disrupt the optimizer. In some respects, it's meant to have the same effect as df.cache(), but without the caching. There may, in fact, be situations where predicate pushdown is not desired because the resulting condition would become complex and expensive to evaluate.
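The trade-off discussed here, that a plan-level node also blocks rules such as predicate pushdown, can be sketched with a toy rule. This is an assumption-laden illustration, not Spark's implementation: `Scan`, `Project`, `Filter`, `Barrier`, and `push_down` are hypothetical names, and the rule is deliberately written with no case for moving a filter through the barrier.

```python
from dataclasses import dataclass

# Toy plan nodes for illustration only. These are NOT Spark classes.
@dataclass(frozen=True)
class Scan:
    table: str

@dataclass(frozen=True)
class Project:
    columns: tuple      # column names passed through unchanged
    child: object

@dataclass(frozen=True)
class Filter:
    column: str         # the predicate reads this single column
    child: object

@dataclass(frozen=True)
class Barrier:          # hypothetical stand-in for a plan-level hint node
    child: object

def push_down(plan):
    """Move a Filter beneath a Project when the Project passes its column through."""
    if isinstance(plan, Filter):
        child = push_down(plan.child)
        if isinstance(child, Project) and plan.column in child.columns:
            pushed = push_down(Filter(plan.column, child.child))
            return Project(child.columns, pushed)
        # Any other child, including Barrier, leaves the Filter where it is.
        return Filter(plan.column, child)
    if isinstance(plan, Project):
        return Project(plan.columns, push_down(plan.child))
    if isinstance(plan, Barrier):
        return Barrier(push_down(plan.child))
    return plan

projection = Project(("SNO", "PNO"), Scan("T"))

without_barrier = push_down(Filter("PNO", projection))
with_barrier = push_down(Filter("PNO", Barrier(projection)))
```

Here `without_barrier` ends up with the filter directly above the scan, while `with_barrier` is returned unchanged because the rule only recognizes a `Project` child. Whether that is a drawback or the intended effect is exactly the point under discussion in this thread.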

In Spark SQL, I think it also makes more sense to specify the hint at the derived table level, as opposed to a single expression. For instance,

SELECT SNO, PNO, C1 + 1, C1 + 2
FROM ( SELECT /*+ NO_COLLAPSE */ SNO, PNO, QTY * 10 AS C1 FROM T ) T

This is similar to the NO_MERGE query hint in Oracle, which prevents the query from being flattened.

hvanhovell · 2017-04-20T17:18:29Z

sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/PlanParserSuite.scala

+
+    comparePlans(
+      parsePlan("SELECT a FROM (SELECT /*+ NO_COLLAPSE */ * FROM t) t1"),
+      SubqueryAlias("t1", Hint("NO_COLLAPSE", Seq.empty, table("t").select(star())))


What are you testing here that is not covered by the other cases?

Actually, nothing. I will remove it.

SparkQA · 2017-04-20T18:19:07Z

Test build #76001 has finished for PR 17708 at commit 975cca5.

This patch fails Scala style tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2017-04-20T22:38:23Z

Test build #76005 has finished for PR 17708 at commit 3986247.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

gatorsmile · 2017-04-23T04:01:56Z

Based on the JIRA description, it sounds like we should simply avoid merging two Projects when that would cause the same UDF to be called multiple times, instead of adding a new logical plan node.

viirya · 2017-04-24T02:57:55Z

I have the same question that Reynold asked on the mailing list. Doesn't common subexpression elimination already address this issue?
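For readers unfamiliar with the term, the idea behind common subexpression elimination can be sketched as follows. This is a toy evaluator under stated assumptions, not Spark's mechanism: the tuple expression encoding, the `evaluate` and `project` helpers, and the "multiply by 10" stand-in for an expensive UDF are all hypothetical. It only illustrates the question being asked and does not settle whether Spark's own subexpression elimination covers the case in this PR.

```python
def evaluate(expr, row, calls, cache=None):
    """Evaluate a nested-tuple expression, optionally reusing repeated subtrees."""
    if cache is not None and expr in cache:
        return cache[expr]
    kind = expr[0]
    if kind == "col":
        value = row[expr[1]]
    elif kind == "lit":
        value = expr[1]
    elif kind == "udf":
        calls.append(expr)  # record every real invocation of the "UDF"
        value = evaluate(expr[1], row, calls, cache) * 10  # hypothetical expensive UDF
    elif kind == "add":
        left = evaluate(expr[1], row, calls, cache)
        right = evaluate(expr[2], row, calls, cache)
        value = left + right
    else:
        raise ValueError(f"unknown expression kind: {kind}")
    if cache is not None:
        cache[expr] = value
    return value

def project(outputs, row, share_subexpressions):
    """Evaluate all output columns for one row and count UDF invocations."""
    calls = []
    cache = {} if share_subexpressions else None  # one cache per row
    values = [evaluate(e, row, calls, cache) for e in outputs]
    return values, len(calls)

# Two output columns that both contain the same UDF call.
udf_call = ("udf", ("col", "qty"))
outputs = [("add", udf_call, ("lit", 1)), ("add", udf_call, ("lit", 2))]
row = {"qty": 3}

naive = project(outputs, row, share_subexpressions=False)
shared = project(outputs, row, share_subexpressions=True)
```

Both calls produce the same column values; the difference is that the shared variant invokes the stand-in UDF once per row instead of once per output column.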

gatorsmile · 2017-06-14T21:48:15Z

Any update? Maybe we can close this PR for now?

ptkool · 2017-06-26T19:38:15Z

@gatorsmile I will run a few more tests to determine if subexpression elimination solves this issue.

gatorsmile · 2017-06-27T06:45:08Z

We are closing the inactive PRs. After you run more tests, please reopen this PR if you still hit this issue. Thanks!

## What changes were proposed in this pull request?

This PR proposes to close stale PRs, mostly the same instances with apache#18017

I believe the author in apache#14807 removed his account.

Closes apache#7075
Closes apache#8927
Closes apache#9202
Closes apache#9366
Closes apache#10861
Closes apache#11420
Closes apache#12356
Closes apache#13028
Closes apache#13506
Closes apache#14191
Closes apache#14198
Closes apache#14330
Closes apache#14807
Closes apache#15839
Closes apache#16225
Closes apache#16685
Closes apache#16692
Closes apache#16995
Closes apache#17181
Closes apache#17211
Closes apache#17235
Closes apache#17237
Closes apache#17248
Closes apache#17341
Closes apache#17708
Closes apache#17716
Closes apache#17721
Closes apache#17937

Added:
Closes apache#14739
Closes apache#17139
Closes apache#17445
Closes apache#18042
Closes apache#18359

Added:
Closes apache#16450
Closes apache#16525
Closes apache#17738

Added:
Closes apache#16458
Closes apache#16508
Closes apache#17714

Added:
Closes apache#17830
Closes apache#14742

## How was this patch tested?

N/A

Author: hyukjinkwon <gurwls223@gmail.com>

Closes apache#18417 from HyukjinKwon/close-stale-pr.

Add new query hint NO_COLLAPSE.

3f1e6a1

ptkool changed the title ~~Add new query hint NO_COLLAPSE.~~ [SPARK-20413] Add new query hint NO_COLLAPSE. Apr 20, 2017

hvanhovell requested changes Apr 20, 2017

Resolve scalastyle errors.

3986247

ptkool force-pushed the no_collapse_query_hint branch from 1231585 to 3986247 Compare April 20, 2017 20:19

HyukjinKwon mentioned this pull request Jun 25, 2017

[INFRA] Close stale PRs #18417

Closed

asfgit closed this in b32bd00 Jun 27, 2017
