[CALCITE-3753] Remove rule queue importance #1840

Closed · wants to merge 8 commits

Conversation

@hsyuan (Member) commented Feb 29, 2020

@hsyuan (Author) commented Mar 1, 2020:

All the plan diffs in this patch have the same cost.

@hsyuan (Author) commented Mar 1, 2020:

Here is one example:
[image: plan diff]

@danny0405 (Contributor) commented:

> Here is one example:
> [image: plan diff]

There are more plan diffs than I expected. By the way, what is the diff tool? It looks pretty good ~

@@ -473,7 +473,7 @@ EnumerableCalc(expr#0..7=[{inputs}], expr#8=[IS NULL($t5)], expr#9=[IS NULL($t7)
EnumerableCalc(expr#0..3=[{inputs}], expr#4=[true], deptno=[$t0], $f0=[$t4])
EnumerableTableScan(table=[[hr, depts]])
EnumerableAggregate(group=[{0}], agg#0=[MIN($1)])
-EnumerableCalc(expr#0..3=[{inputs}], expr#4=[90], expr#5=[+($t0, $t4)], expr#6=[true], $f4=[$t5], $f0=[$t6])
+EnumerableCalc(expr#0..3=[{inputs}], expr#4=[90], expr#5=[+($t0, $t4)], expr#6=[true], $f4=[$t5], $f0=[$t6], $condition=[$t6])
EnumerableTableScan(table=[[hr, depts]])
!plan
Contributor: Is this plan change expected?

@hsyuan (Author): Yes, since the total cost is the same.

Contributor: I think optimizer latency can benefit a lot from this patch. Any experiments?

@hsyuan (Author): No specific experiments yet, but the slow tests with this patch take 23 minutes, compared with 32 minutes on master.

@hsyuan (Author) commented Mar 2, 2020:

> There are more plan diffs than I expected. By the way, what is the diff tool? It looks pretty good ~

It is diffchecker.

@danny0405 added the discussion-in-jira label (there's open discussion in JIRA to be resolved before proceeding with the PR) on Mar 4, 2020.
* <p>If false, the planner continues to fire rules until the rule queue is
* empty.
*/
protected boolean impatient = false;
@rkondakov (Contributor), Mar 4, 2020: Is there any way now to stop the planner search without exhaustive search-space exploration? The impatient flag was pretty useful for that.

@hsyuan (Author): Nope. impatient doesn't guarantee finding the best plan, and no one ever used it.

Contributor: You are right, the impatient flag doesn't help find the best plan. But it helps interrupt the search if it takes too much time and you are OK with going ahead with a suboptimal plan. We use it for this purpose.

@hsyuan (Author): I think you should investigate why your planner takes too much time to generate a plan.

Contributor: We've investigated it, and it looks like it is very similar to the abstract-converters problem. The impatient flag was quite helpful for us in some cases.

@hsyuan (Author): As a user, you can extend VolcanoPlanner and override the checkCancel method.
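A minimal sketch of this workaround (the class name, time budget, and exception below are illustrative assumptions, not code from the PR; it only relies on VolcanoPlanner being extendable and checkCancel being overridable):

import org.apache.calcite.plan.volcano.VolcanoPlanner;

/** Hypothetical planner that gives up once a wall-clock budget is exhausted. */
public class TimeBoundedVolcanoPlanner extends VolcanoPlanner {
  private final long deadlineMillis;

  public TimeBoundedVolcanoPlanner(long budgetMillis) {
    this.deadlineMillis = System.currentTimeMillis() + budgetMillis;
  }

  @Override public void checkCancel() {
    super.checkCancel(); // keep the regular CancelFlag behavior
    if (System.currentTimeMillis() > deadlineMillis) {
      // Abort the search; the caller catches this and falls back to
      // whatever (possibly suboptimal) plan it can produce otherwise.
      throw new RuntimeException("planning budget exceeded");
    }
  }
}

Assuming the planner invokes checkCancel() during its search loop (which is what makes this suggestion work), the search stops shortly after the budget is exceeded instead of running to exhaustion.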

Contributor: OK, it might help us.
Off-topic question: why do we need several VolcanoPlannerPhases when only one of them is used? Maybe we can throw them away along with the queue importance?

@hsyuan (Author): In the default implementation, only one phase is used, but it might actually help to have multiple phases to split different kinds of rules.
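For illustration only, a rough sketch of splitting rules across phases with the existing phase machinery. It assumes the VolcanoPlannerPhaseRuleMappingInitializer hook and its rule-description strings behave as in VolcanoPlanner at the time; the subclass name and the rule chosen for the early phase are arbitrary.

import org.apache.calcite.plan.volcano.VolcanoPlanner;
import org.apache.calcite.plan.volcano.VolcanoPlannerPhase;
import org.apache.calcite.plan.volcano.VolcanoPlannerPhaseRuleMappingInitializer;

/** Hypothetical planner that fires one transformation rule in PRE_PROCESS and
 * leaves all remaining registered rules to the OPTIMIZE phase. */
public class PhasedVolcanoPlanner extends VolcanoPlanner {
  @Override public VolcanoPlannerPhaseRuleMappingInitializer
      getPhaseRuleMappingInitializer() {
    return phaseRuleMap -> {
      // A phase whose rule set is left empty runs every registered rule;
      // adding rule descriptions restricts the phase to just those rules.
      phaseRuleMap.get(VolcanoPlannerPhase.PRE_PROCESS)
          .add("FilterProjectTransposeRule"); // illustrative rule description
      // Effectively disable the auxiliary phases with a dummy rule name.
      phaseRuleMap.get(VolcanoPlannerPhase.PRE_PROCESS_MDR).add("xxx");
      phaseRuleMap.get(VolcanoPlannerPhase.CLEANUP).add("xxx");
    };
  }
}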

Contributor: We should still keep the planning phases. In the classic Volcano model there are transformation, implementation, and optimization phases, and the planner can do specific things in different phases. It's not used now, but we could probably extend it in the future.

@danny0405 (Contributor), Mar 7, 2020: I agree with @rkondakov; we should have an alternative, or I would see this as a regression. We never have a perfect planner. Instead of running into a timeout, a control flag is more acceptable and user-friendly.

@hsyuan added the LGTM-will-merge-soon label (overall PR looks OK, only minor things left) on Mar 5, 2020.
@chunweilei (Contributor) left a comment: LGTM

Comment on lines 681 to 682
+ " EnumerableAggregate(group=[{0}], C=[COUNT($0)])\n"
+ " EnumerableAggregate(group=[{0}])\n"
Contributor: This looks like a plan degradation, doesn't it?

@hsyuan (Author): The original plan is still there as an alternative, but from the cost model's perspective the new one is cheaper.

@vlsi (Contributor), Mar 5, 2020: Who will fix the costing model then?
I think it is unfair to merge a change that is not really compatible with the costing model.

If the change to the optimizer requires adjustments to the costing model, then could you please do that in a single PR, so we can see the net changes for both the plans and the response times?

@hsyuan (Author): I think it is orthogonal and should be done in a separate PR. The cost model issue existed before this change.

-EnumerableCalc(expr#0..2=[{inputs}], expr#3=[false], cs=[$t3])
+EnumerableCalc(expr#0..2=[{inputs}], expr#3=[false], expr#4=[123], expr#5=[null:INTEGER], expr#6=[=($t4, $t5)], expr#7=[IS NULL($t5)], expr#8=[OR($t6, $t7)], cs=[$t3], $condition=[$t8])
Contributor: This looks like a plan degradation. For instance, expr#4=[123], expr#5=[null:INTEGER], expr#6=[=($t4, $t5)] is the same as null:BOOLEAN.

Do you know the reason for this plan degradation?

@hsyuan (Author): The cost model doesn't think it is a degradation:
[image: cost comparison]

@vlsi (Contributor), Mar 5, 2020: Let me put it another way: you change the optimizer, and now it favours bad plans.
What the optimizer now does is introduce a dummy always-true filter, and it thinks the filter would reduce the number of rows, and so on. It does not look like a well-behaving optimizer :-/

Even though the change reduces slow-test execution time, that reduction might be the result of "skipping some rules" rather than of removing importance.

So currently it looks like some rules do not fire, which results in a noticeable amount of useless predicates floating around.

@hsyuan (Author): How do you know it is skipping some rules? Any evidence?

@hsyuan (Author): As a human being, you know it is a bad plan, but the cost model thinks it is a better plan. Shouldn't you blame the cost model?

Contributor: OK. You propose a change. It results in generating bad plans, thus it introduces a technical regression. There's a technical justification for -1.

I know the cost model has an awful lot of inconsistencies. However, it turns out that all those 100 tiny inconsistencies cancel each other out, and Calcite manages to produce "sane" plans.
Now you fix one or two such defects (which has good merit); however, the net result is that there are 98 inconsistencies in the cost model which no longer cancel each other out.

Calcite's purpose is optimization, and it is really sad to introduce regressions to the optimizer.

@hsyuan (Author): @vlsi I fixed the plan diffs as requested.

@zabetak (Member) left a comment:

One quick comment while scanning through the PR: I think the removal of the ambitious and impatient flags, as well as the change in FilterProjectTransposeRule, are breaking changes and should be included in the release notes (history.md), along with instructions on alternatives (if there are any).

@hsyuan (Author) commented Mar 7, 2020:

@zabetak Thanks for the reminder. Will update it.

@@ -801,6 +801,11 @@ public JdbcSort(
         offset, fetch);
   }
 
+  @Override public RelOptCost computeSelfCost(RelOptPlanner planner,
+      RelMetadataQuery mq) {
+    return super.computeSelfCost(planner, mq).multiplyBy(0.9);
Contributor: Why do we need a 0.9 factor?

@hsyuan (Author): To make it cheaper than the default sort. The same applies to GeodeSort.

Contributor: It is weird to tweak the cost to select a specific convention, and why should the JDBC convention be cheaper?

@hsyuan (Author): If you don't think the following plans should have different costs, I am happy to change it back:
[image: plan comparison]

Contributor: Okay, thanks for the explanation.
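Since this thread turns on cost ties, a small self-contained illustration may help (made-up numbers and a hypothetical CostTieDemo class, not code from the PR): two alternatives with identical row/cpu/io estimates compare as mutually less-than-or-equal, so which one wins is essentially arbitrary, while a flat 0.9 discount makes one of them strictly cheaper.

import org.apache.calcite.plan.RelOptCost;
import org.apache.calcite.plan.RelOptCostFactory;
import org.apache.calcite.plan.volcano.VolcanoPlanner;

public class CostTieDemo {
  public static void main(String[] args) {
    RelOptCostFactory factory = new VolcanoPlanner().getCostFactory();
    // Hypothetical estimates for the same sort implemented two ways.
    RelOptCost jdbcSort = factory.makeCost(100, 100, 0);
    RelOptCost enumerableSort = factory.makeCost(100, 100, 0);
    // The alternatives tie: each is "less than or equal" to the other.
    System.out.println(jdbcSort.isLe(enumerableSort));                 // true
    System.out.println(enumerableSort.isLe(jdbcSort));                 // true
    // The 0.9 discount makes the JDBC-side sort strictly cheaper,
    // so the planner can pick it deterministically.
    System.out.println(jdbcSort.multiplyBy(0.9).isLt(enumerableSort)); // true
  }
}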


@@ -123,6 +123,7 @@ public void onMatch(RelOptRuleCall call) {
       // aggregate functions, add a project for the same effect.
       relBuilder.project(relBuilder.fields(aggregate.getGroupSet()));
     }
+    call.getPlanner().setImportance(aggregate, 0d);
     call.transformTo(relBuilder.build());
Contributor: Why do we need this change?

* <p>It does not allow a Filter to be pushed past the Project if
* {@link RexUtil#containsCorrelation there is a correlation condition})
* anywhere in the Filter, since in some cases it can prevent a
* {@link org.apache.calcite.rel.core.Correlate} from being de-correlated.
*/
public static final FilterProjectTransposeRule INSTANCE =
-    new FilterProjectTransposeRule(Filter.class, Project.class, true, true,
+    new FilterProjectTransposeRule(LogicalFilter.class, LogicalProject.class, true, true,
Contributor: This is a breaking change.

@@ -431,14 +431,14 @@ join (
from "hr"."emps"
window w as (partition by "deptno" order by "commission")) b
on a."deptno" = b."deptno"
-limit 5;
+order by "deptno", ar, br limit 5;
Contributor: Why another sort?

@hsyuan (Author): To stabilize the test.

Contributor: Calcite executes the query with a single thread, so theoretically there shouldn't be any instability.

@hsyuan (Author): Without ordering, a different join order will generate different data.

Contributor: But with the same query and the same rule sets, the rules should be fired in the same sequence, right?

@hsyuan (Author): If someone tunes the cost model and causes a plan change, the result will be different again.

Contributor: We cannot force every LIMIT test to have an ORDER BY. Can we find a nicer solution to promote stability? For example, if I take this change into Flink, there would be some test failures, but I would not expect to have to modify the queries.

@hsyuan (Author): You can modify the expected result instead of modifying the query.

@@ -58,7 +58,7 @@
if (fetch != null) {
return cost.multiplyBy(0.05);
} else {
-      return cost;
+      return cost.multiplyBy(0.9);
Contributor: Why?

@hsyuan (Author) commented Mar 8, 2020:

Thanks everyone for reviewing. I will merge this PR in 24 hours.

@hsyuan closed this in 80e6b02 on Mar 9, 2020.
@hsyuan deleted the remove_importance branch on March 9, 2020, 23:25.
Labels: discussion-in-jira, LGTM-will-merge-soon, slow-tests-needed
7 participants