-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[CALCITE-3753] Remove rule queue importance #1840
Conversation
All the plan diffs in this patch are with same cost. |
core/src/test/resources/sql/misc.iq
Outdated
@@ -473,7 +473,7 @@ EnumerableCalc(expr#0..7=[{inputs}], expr#8=[IS NULL($t5)], expr#9=[IS NULL($t7) | |||
EnumerableCalc(expr#0..3=[{inputs}], expr#4=[true], deptno=[$t0], $f0=[$t4]) | |||
EnumerableTableScan(table=[[hr, depts]]) | |||
EnumerableAggregate(group=[{0}], agg#0=[MIN($1)]) | |||
EnumerableCalc(expr#0..3=[{inputs}], expr#4=[90], expr#5=[+($t0, $t4)], expr#6=[true], $f4=[$t5], $f0=[$t6]) | |||
EnumerableCalc(expr#0..3=[{inputs}], expr#4=[90], expr#5=[+($t0, $t4)], expr#6=[true], $f4=[$t5], $f0=[$t6], $condition=[$t6]) | |||
EnumerableTableScan(table=[[hr, depts]]) | |||
!plan |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this plan change expected?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, since the total cost is the same.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think optimizer latency can be benefited a lot from this patch. Any experiment?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not yet specific experiment. But the slow tests in the patch is 23m. Comparing 32m in master.
It is diffchecker. |
* <p>If false, the planner continues to fire rules until the rule queue is | ||
* empty. | ||
*/ | ||
protected boolean impatient = false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any way now to stop the planner search without exhaustive search space exploration? impatient
flag was pretty useful for it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nope. impatient
doesn't guarantee to find the best plan, no one ever used it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right, 'impatient' flag doesn't help to find the best plan. But it helps to interrupt the search if it takes too much time and you are ok to go ahead with a sub optimal plan. We use it for this purpose.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you should investigate why your planner takes too much time to generate a plan.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We've investigated it and looks like it is very similar to the abstract converters problem. impatient
flag was quite helpful for us in some cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As a user, you can extend VolcanoPlanner and override the checkCancel method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, it might help us.
Off-topic question: why do we need several VolcanoPlannerPhase
when only the one of them is used? May be we can throw it away along with the queue importance?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In default implementation, there is only 1 phase used. But actually it might help to have multiple phases to split different kinds of rules.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should still keep planning phase. In classic volcano model, there are transformation, implementation and optimization phases. The planner can do specific things for different phases. It's not used now, but we could probably extend it in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with @rkondakov, we should have an alternative, or I would see it as a regression, we never have a perfect planner. Instead of running into timeout, a motive control flag is more acceptable and user friendly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
+ " EnumerableAggregate(group=[{0}], C=[COUNT($0)])\n" | ||
+ " EnumerableAggregate(group=[{0}])\n" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks like a plan degradation, doesn't it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It does have the original plan's alternative, but from cost model's perspective, the new one is a cheaper plan.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Who will fix the costing model then?
I think it is unfair to merge a change that is not really compatible with the costing model.
If the change to optimizer requires adjustments to the costing model, then could you please do that in a single PR, so we see the net changes for both plans and the response times?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is orthogonal. should be done is a separate PR. The issue of cost model exists before this change.
EnumerableCalc(expr#0..2=[{inputs}], expr#3=[false], cs=[$t3]) | ||
EnumerableCalc(expr#0..2=[{inputs}], expr#3=[false], expr#4=[123], expr#5=[null:INTEGER], expr#6=[=($t4, $t5)], expr#7=[IS NULL($t5)], expr#8=[OR($t6, $t7)], cs=[$t3], $condition=[$t8]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks like a plan degradation.
For instance expr#4=[123], expr#5=[null:INTEGER], expr#6=[=($t4, $t5)]
is the same as null:BOOLEAN
.
Do you know the reason for this plan degradation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me put it in another way: you change the optimizer, and now it favours bad plans.
What the optimizer now does it introduces a dummy always_true filter, and it thinks the filter would reduce the number of rows and so on. It does not look like a well-behaving optimizer :-/
Even though the change reduces slow test
execution, that reduction might be the result of "skipping some rules" rather than removing importance.
So currently it looks like some rules do not fire which result in a noticeable amount of useless predicates floating around.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How do you know it is skipping some rules? Any evidence?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As a human being, you know it is a bad plan, but the cost model thinks it is a better plan. Shouldn't you blame the cost model?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK. You propose a change. It results in generating bad plans, thus it introduces a technical regression. There's a technical justification for -1.
I know cost model has awful lot of inconsistencies. However, it turns out that all those 100 tiny inconsistencies cancel each other, and Calcite manages to produce "sane" plans.
Now you fix one or two such defects (which has good merit), however, the net result becomes that there are 98 inconsistencies in the cost model which no longer cancel each other.
Calcite purpose is optimizer, and it is really sad to introduce regressions to the optimizer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@vlsi I fixed the plan diffs as requested.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One quick comment while scanning through the PR. I think that the removal of ambitious
and impatient
flags as well as the change in FilterProjectTransposeRule
are breaking changes and should be included in the release note history.md
along with the instructions on alternatives (if there are).
@zabetak Thanks for reminding. Will update it. |
@@ -801,6 +801,11 @@ public JdbcSort( | |||
offset, fetch); | |||
} | |||
|
|||
@Override public RelOptCost computeSelfCost(RelOptPlanner planner, | |||
RelMetadataQuery mq) { | |||
return super.computeSelfCost(planner, mq).multiplyBy(0.9); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why we need a 0.9 factor?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To make it cheaper than default sort. Same applies on GeodeSort
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is weird to tweak the cost to select a specific Convention and why should the JDBC
convention should be cheaper ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okey, thanks for the explanation.
* <p>If false, the planner continues to fire rules until the rule queue is | ||
* empty. | ||
*/ | ||
protected boolean impatient = false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with @rkondakov, we should have an alternative, or I would see it as a regression, we never have a perfect planner. Instead of running into timeout, a motive control flag is more acceptable and user friendly.
@@ -123,6 +123,7 @@ public void onMatch(RelOptRuleCall call) { | |||
// aggregate functions, add a project for the same effect. | |||
relBuilder.project(relBuilder.fields(aggregate.getGroupSet())); | |||
} | |||
call.getPlanner().setImportance(aggregate, 0d); | |||
call.transformTo(relBuilder.build()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why we need this change?
* <p>It does not allow a Filter to be pushed past the Project if | ||
* {@link RexUtil#containsCorrelation there is a correlation condition}) | ||
* anywhere in the Filter, since in some cases it can prevent a | ||
* {@link org.apache.calcite.rel.core.Correlate} from being de-correlated. | ||
*/ | ||
public static final FilterProjectTransposeRule INSTANCE = | ||
new FilterProjectTransposeRule(Filter.class, Project.class, true, true, | ||
new FilterProjectTransposeRule(LogicalFilter.class, LogicalProject.class, true, true, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A breaking change
@@ -431,14 +431,14 @@ join ( | |||
from "hr"."emps" | |||
window w as (partition by "deptno" order by "commission")) b | |||
on a."deptno" = b."deptno" | |||
limit 5; | |||
order by "deptno", ar, br limit 5; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why another sort?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To stablize test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Calcite executes the query with single thread, so theoretically, there shouldn't be any un-stability.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Without ordering, different join order will generate different data.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But with same query and same rule sets, the rules should be fired in same sequence right ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If someone tuned the cost model and caused a plan change, the result will be different again.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can not force every limit test to have an order by, can we have a more nice solution to promotion the stability, i.e. if a make this change into Flink, there would be some test fails but i would not expect to modify the queries.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can modify the result, instead of modifying the query.
@@ -58,7 +58,7 @@ | |||
if (fetch != null) { | |||
return cost.multiplyBy(0.05); | |||
} else { | |||
return cost; | |||
return cost.multiplyBy(0.9); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why?
Thanks everyone for reviewing. I will merge this PR in 24 hours. |
…itutional rule first
JIRA: https://issues.apache.org/jira/browse/CALCITE-3753
Also updated test cases.