[CALCITE-3753] Remove rule queue importance #1840

Closed · wants to merge 8 commits

Conversation

@hsyuan (Member) commented Feb 29, 2020

@hsyuan (Author) commented Mar 1, 2020:

All the plan diffs in this patch have the same cost.

@hsyuan (Author) commented Mar 1, 2020:

Here is one example:
[image: plan diff]

@danny0405 (Contributor) commented:

> Here is one example:
> [image: plan diff]

There are more plan diffs than I expected. By the way, what is the diff tool? It looks pretty good ~

@@ -473,7 +473,7 @@ EnumerableCalc(expr#0..7=[{inputs}], expr#8=[IS NULL($t5)], expr#9=[IS NULL($t7)
EnumerableCalc(expr#0..3=[{inputs}], expr#4=[true], deptno=[$t0], $f0=[$t4])
EnumerableTableScan(table=[[hr, depts]])
EnumerableAggregate(group=[{0}], agg#0=[MIN($1)])
-EnumerableCalc(expr#0..3=[{inputs}], expr#4=[90], expr#5=[+($t0, $t4)], expr#6=[true], $f4=[$t5], $f0=[$t6])
+EnumerableCalc(expr#0..3=[{inputs}], expr#4=[90], expr#5=[+($t0, $t4)], expr#6=[true], $f4=[$t5], $f0=[$t6], $condition=[$t6])
EnumerableTableScan(table=[[hr, depts]])
!plan
Contributor: Is this plan change expected?

@hsyuan (Author): Yes, since the total cost is the same.

Contributor: I think optimizer latency can benefit a lot from this patch. Any experiments?

@hsyuan (Author): No specific experiments yet, but the slow tests with this patch take 23 minutes, compared with 32 minutes on master.

@hsyuan (Author) commented Mar 2, 2020:

> There are more plan diffs than I expected. By the way, what is the diff tool? It looks pretty good ~

It is diffchecker.

@danny0405 added the discussion-in-jira label (there's open discussion in JIRA to be resolved before proceeding with the PR) on Mar 4, 2020.
* <p>If false, the planner continues to fire rules until the rule queue is
* empty.
*/
protected boolean impatient = false;
@rkondakov (Contributor), Mar 4, 2020: Is there any way now to stop the planner search without exhaustive search-space exploration? The impatient flag was pretty useful for that.

@hsyuan (Author): Nope. impatient doesn't guarantee finding the best plan, and no one ever used it.

Contributor: You are right, the impatient flag doesn't help find the best plan. But it helps interrupt the search if it takes too much time and you are OK with going ahead with a suboptimal plan. We use it for this purpose.

@hsyuan (Author): I think you should investigate why your planner takes too much time to generate a plan.

Contributor: We've investigated it, and it looks like it is very similar to the abstract-converters problem. The impatient flag was quite helpful for us in some cases.

@hsyuan (Author): As a user, you can extend VolcanoPlanner and override the checkCancel method.
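A minimal sketch of this workaround (the class name, time budget, and exception below are illustrative assumptions, not code from the PR; it only relies on VolcanoPlanner being extendable and checkCancel being overridable):

import org.apache.calcite.plan.volcano.VolcanoPlanner;

/** Hypothetical planner that gives up once a wall-clock budget is exhausted. */
public class TimeBoundedVolcanoPlanner extends VolcanoPlanner {
  private final long deadlineMillis;

  public TimeBoundedVolcanoPlanner(long budgetMillis) {
    this.deadlineMillis = System.currentTimeMillis() + budgetMillis;
  }

  @Override public void checkCancel() {
    super.checkCancel(); // keep the regular CancelFlag behavior
    if (System.currentTimeMillis() > deadlineMillis) {
      // Abort the search; the caller catches this and falls back to
      // whatever (possibly suboptimal) plan it can produce otherwise.
      throw new RuntimeException("planning budget exceeded");
    }
  }
}

Assuming the planner invokes checkCancel() during its search loop (which is what makes this suggestion work), the search stops shortly after the budget is exceeded instead of running to exhaustion.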

Contributor: OK, it might help us.
Off-topic question: why do we need several VolcanoPlannerPhases when only one of them is used? Maybe we can throw them away along with the queue importance?

@hsyuan (Author): In the default implementation, only one phase is used, but it might actually help to have multiple phases to split different kinds of rules.
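For illustration only, a rough sketch of splitting rules across phases with the existing phase machinery. It assumes the VolcanoPlannerPhaseRuleMappingInitializer hook and its rule-description strings behave as in VolcanoPlanner at the time; the subclass name and the rule chosen for the early phase are arbitrary.

import org.apache.calcite.plan.volcano.VolcanoPlanner;
import org.apache.calcite.plan.volcano.VolcanoPlannerPhase;
import org.apache.calcite.plan.volcano.VolcanoPlannerPhaseRuleMappingInitializer;

/** Hypothetical planner that fires one transformation rule in PRE_PROCESS and
 * leaves all remaining registered rules to the OPTIMIZE phase. */
public class PhasedVolcanoPlanner extends VolcanoPlanner {
  @Override public VolcanoPlannerPhaseRuleMappingInitializer
      getPhaseRuleMappingInitializer() {
    return phaseRuleMap -> {
      // A phase whose rule set is left empty runs every registered rule;
      // adding rule descriptions restricts the phase to just those rules.
      phaseRuleMap.get(VolcanoPlannerPhase.PRE_PROCESS)
          .add("FilterProjectTransposeRule"); // illustrative rule description
      // Effectively disable the auxiliary phases with a dummy rule name.
      phaseRuleMap.get(VolcanoPlannerPhase.PRE_PROCESS_MDR).add("xxx");
      phaseRuleMap.get(VolcanoPlannerPhase.CLEANUP).add("xxx");
    };
  }
}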

Contributor: We should still keep the planning phases. In the classic Volcano model there are transformation, implementation, and optimization phases, and the planner can do specific things in different phases. It's not used now, but we could probably extend it in the future.

@danny0405 (Contributor), Mar 7, 2020: I agree with @rkondakov; we should have an alternative, or I would see this as a regression. We never have a perfect planner. Instead of running into a timeout, a control flag is more acceptable and user-friendly.

@hsyuan added the LGTM-will-merge-soon label (overall PR looks OK, only minor things left) on Mar 5, 2020.
@chunweilei (Contributor) left a comment: LGTM

Comment on lines 681 to 682
+ " EnumerableAggregate(group=[{0}], C=[COUNT($0)])\n"
+ " EnumerableAggregate(group=[{0}])\n"
Contributor: This looks like a plan degradation, doesn't it?

@hsyuan (Author): The original plan is still there as an alternative, but from the cost model's perspective the new one is cheaper.

@vlsi (Contributor), Mar 5, 2020: Who will fix the costing model then?
I think it is unfair to merge a change that is not really compatible with the costing model.

If the change to the optimizer requires adjustments to the costing model, then could you please do that in a single PR, so we can see the net changes for both the plans and the response times?

@hsyuan (Author): I think it is orthogonal and should be done in a separate PR. The cost model issue existed before this change.

-EnumerableCalc(expr#0..2=[{inputs}], expr#3=[false], cs=[$t3])
+EnumerableCalc(expr#0..2=[{inputs}], expr#3=[false], expr#4=[123], expr#5=[null:INTEGER], expr#6=[=($t4, $t5)], expr#7=[IS NULL($t5)], expr#8=[OR($t6, $t7)], cs=[$t3], $condition=[$t8])
Contributor: This looks like a plan degradation. For instance, expr#4=[123], expr#5=[null:INTEGER], expr#6=[=($t4, $t5)] is the same as null:BOOLEAN.

Do you know the reason for this plan degradation?

@hsyuan (Author): The cost model doesn't think it is a degradation:
[image: cost comparison]

@vlsi (Contributor), Mar 5, 2020: Let me put it another way: you change the optimizer, and now it favours bad plans.
What the optimizer now does is introduce a dummy always-true filter, and it thinks the filter would reduce the number of rows, and so on. It does not look like a well-behaving optimizer :-/

Even though the change reduces slow-test execution time, that reduction might be the result of "skipping some rules" rather than of removing importance.

So currently it looks like some rules do not fire, which results in a noticeable amount of useless predicates floating around.

@hsyuan (Author): How do you know it is skipping some rules? Any evidence?

@hsyuan (Author): As a human being, you know it is a bad plan, but the cost model thinks it is a better plan. Shouldn't you blame the cost model?

Contributor: OK. You propose a change. It results in generating bad plans, thus it introduces a technical regression. There's a technical justification for -1.

I know the cost model has an awful lot of inconsistencies. However, it turns out that all those 100 tiny inconsistencies cancel each other out, and Calcite manages to produce "sane" plans.
Now you fix one or two such defects (which has good merit); however, the net result is that there are 98 inconsistencies in the cost model which no longer cancel each other out.

Calcite's purpose is optimization, and it is really sad to introduce regressions to the optimizer.

@hsyuan (Author): @vlsi I fixed the plan diffs as requested.

@zabetak (Member) left a comment:

One quick comment while scanning through the PR: I think the removal of the ambitious and impatient flags, as well as the change in FilterProjectTransposeRule, are breaking changes and should be included in the release notes (history.md), along with instructions on alternatives (if there are any).

@hsyuan (Author) commented Mar 7, 2020:

@zabetak Thanks for the reminder. Will update it.

@@ -801,6 +801,11 @@ public JdbcSort(
         offset, fetch);
   }
 
+  @Override public RelOptCost computeSelfCost(RelOptPlanner planner,
+      RelMetadataQuery mq) {
+    return super.computeSelfCost(planner, mq).multiplyBy(0.9);
Contributor: Why do we need a 0.9 factor?

@hsyuan (Author): To make it cheaper than the default sort. The same applies to GeodeSort.

Contributor: It is weird to tweak the cost to select a specific convention, and why should the JDBC convention be cheaper?

@hsyuan (Author): If you don't think the following plans should have different costs, I am happy to change it back:
[image: plan comparison]

Contributor: Okay, thanks for the explanation.
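Since this thread turns on cost ties, a small self-contained illustration may help (made-up numbers and a hypothetical CostTieDemo class, not code from the PR): two alternatives with identical row/cpu/io estimates compare as mutually less-than-or-equal, so which one wins is essentially arbitrary, while a flat 0.9 discount makes one of them strictly cheaper.

import org.apache.calcite.plan.RelOptCost;
import org.apache.calcite.plan.RelOptCostFactory;
import org.apache.calcite.plan.volcano.VolcanoPlanner;

public class CostTieDemo {
  public static void main(String[] args) {
    RelOptCostFactory factory = new VolcanoPlanner().getCostFactory();
    // Hypothetical estimates for the same sort implemented two ways.
    RelOptCost jdbcSort = factory.makeCost(100, 100, 0);
    RelOptCost enumerableSort = factory.makeCost(100, 100, 0);
    // The alternatives tie: each is "less than or equal" to the other.
    System.out.println(jdbcSort.isLe(enumerableSort));                 // true
    System.out.println(enumerableSort.isLe(jdbcSort));                 // true
    // The 0.9 discount makes the JDBC-side sort strictly cheaper,
    // so the planner can pick it deterministically.
    System.out.println(jdbcSort.multiplyBy(0.9).isLt(enumerableSort)); // true
  }
}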


@@ -123,6 +123,7 @@ public void onMatch(RelOptRuleCall call) {
       // aggregate functions, add a project for the same effect.
       relBuilder.project(relBuilder.fields(aggregate.getGroupSet()));
     }
+    call.getPlanner().setImportance(aggregate, 0d);
     call.transformTo(relBuilder.build());
Contributor: Why do we need this change?

* <p>It does not allow a Filter to be pushed past the Project if
* {@link RexUtil#containsCorrelation there is a correlation condition})
* anywhere in the Filter, since in some cases it can prevent a
* {@link org.apache.calcite.rel.core.Correlate} from being de-correlated.
*/
public static final FilterProjectTransposeRule INSTANCE =
-    new FilterProjectTransposeRule(Filter.class, Project.class, true, true,
+    new FilterProjectTransposeRule(LogicalFilter.class, LogicalProject.class, true, true,
Contributor: This is a breaking change.

@@ -431,14 +431,14 @@ join (
from "hr"."emps"
window w as (partition by "deptno" order by "commission")) b
on a."deptno" = b."deptno"
-limit 5;
+order by "deptno", ar, br limit 5;
Contributor: Why another sort?

@hsyuan (Author): To stabilize the test.

Contributor: Calcite executes the query with a single thread, so theoretically there shouldn't be any instability.

@hsyuan (Author): Without ordering, a different join order will generate different data.

Contributor: But with the same query and the same rule sets, the rules should be fired in the same sequence, right?

@hsyuan (Author): If someone tunes the cost model and causes a plan change, the result will be different again.

Contributor: We cannot force every LIMIT test to have an ORDER BY. Can we find a nicer solution to promote stability? For example, if I take this change into Flink, there would be some test failures, but I would not expect to have to modify the queries.

@hsyuan (Author): You can modify the expected result instead of modifying the query.

@@ -58,7 +58,7 @@
if (fetch != null) {
return cost.multiplyBy(0.05);
} else {
-      return cost;
+      return cost.multiplyBy(0.9);
Contributor: Why?

@hsyuan (Author) commented Mar 8, 2020:

Thanks everyone for reviewing. I will merge this PR in 24 hours.

@hsyuan closed this in 80e6b02 on Mar 9, 2020.
@hsyuan deleted the remove_importance branch on March 9, 2020, 23:25.
Labels: discussion-in-jira, LGTM-will-merge-soon, slow-tests-needed
7 participants