-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
opt: calculate number of rows processed when costing joins #40248
Conversation
Do we have any kind of perf difference here? |
cf012ac
to
4c32297
Compare
None of the plans changed in TPC-C or TPC-H, so I wouldn't expect perf to change there. This change should fix some cases where we are choosing a lookup join when we shouldn't be, or where we are choosing the wrong index for the lookup join. @RaduBerinde mentioned that this is currently a problem for some FK checks, so performance should improve in those cases. |
4c32297
to
32d9973
Compare
pkg/sql/opt/xform/testdata/external/tpcc, line 199 at r1 (raw file):
I would expect this to change - it should use the FK index which has all three columns.. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @justinj, @RaduBerinde, and @rytaft)
pkg/sql/opt/xform/testdata/external/tpcc, line 735 at r1 (raw file):
└── f-k-checks ├── f-k-checks-item: order_line(ol_w_id,ol_d_id,ol_o_id) -> order(o_w_id,o_d_id,o_id) │ └── anti-join (lookup order@order_idx)
Another instance.
pkg/sql/opt/xform/testdata/external/tpcc, line 1085 at r1 (raw file):
└── f-k-checks ├── f-k-checks-item: history(h_c_w_id,h_c_d_id,h_c_id) -> customer(c_w_id,c_d_id,c_id) │ └── anti-join (lookup customer@customer_idx)
Another instance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @justinj, @RaduBerinde, and @rytaft)
pkg/sql/opt/memo/statistics_builder.go, line 2293 at r1 (raw file):
// Otherwise, we need to determine the row count of the join before the // ON conditions are applied. withoutOn := e.Memo().MemoizeLookupJoin(t.Input, nil /* on */, &t.LookupJoinPrivate)
is nil
acceptable for empty List expressions? I had vaguely thought that they should always be non-nil, but maybe it doesn't matter
also nit: s/on/On/
32d9973
to
0e15fc7
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem seems to be that we weren't calculating stats correctly for semi or anti joins. I've added another commit to fix those stats.
Two plans for TPC-H changed, so I need to update the stats quality tests for those. But in the mean time, I figured I would push this commit to see if there is any feedback. PTAL!
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @justinj)
pkg/sql/opt/memo/statistics_builder.go, line 2293 at r1 (raw file):
Previously, justinj (Justin Jaffray) wrote…
is
nil
acceptable for empty List expressions? I had vaguely thought that they should always be non-nil, but maybe it doesn't matteralso nit:
s/on/On/
Seems like using nil
isn't a problem, since len()
of a nil
list is 0.
What does s/on/On/
apply to? The name of the parameter in MemoizeLookupJoin
is on
, not On
...
pkg/sql/opt/xform/testdata/external/tpcc, line 199 at r1 (raw file):
Previously, RaduBerinde wrote…
I would expect this to change - it should use the FK index which has all three columns..
Done.
pkg/sql/opt/xform/testdata/external/tpcc, line 735 at r1 (raw file):
Previously, RaduBerinde wrote…
Another instance.
Done.
pkg/sql/opt/xform/testdata/external/tpcc, line 1085 at r1 (raw file):
Previously, RaduBerinde wrote…
Another instance.
Done.
a1aa32f
to
28d0a38
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work! The updated FK plans for TPCC look great!
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @justinj and @rytaft)
pkg/sql/opt/memo/statistics_builder.go, line 909 at r4 (raw file):
// ----------------------------------- s.RowCount = leftStats.RowCount * rightStats.RowCount inputRowCount := s.RowCount
This could use a comment, it's not easy to figure out what this value actually means.
pkg/sql/opt/memo/statistics_builder.go, line 915 at r4 (raw file):
// calculations. It will be fixed below. s.RowCount = leftStats.RowCount inputRowCount = s.RowCount
[nit] = leftStats.RowCount is more clear (it initially seemed like a no-op)
pkg/sql/opt/memo/statistics_builder.go, line 1003 at r4 (raw file):
} // Fix the stats for anti join.
Seems like colStatJoin
would need some of this logic too?
pkg/sql/opt/memo/statistics_builder.go, line 1006 at r4 (raw file):
switch h.joinType { case opt.AntiJoinOp, opt.AntiJoinApplyOp: s.RowCount = max(inputRowCount-s.RowCount, epsilon)
Would be clearer if we just used leftStats.RowCount - s.RowCount
directly (that's the formula one would expect)
pkg/sql/opt/memo/statistics_builder.go, line 1013 at r4 (raw file):
// Distinct count is tricky for anti-joins. This is a rough estimate, // accounting for the way distinct counts are calculated above.
[nit] can you expand on this? Not sure what aspect of the way we calculate distinct counts above this formula stems from. Also, seems like the total and semi-join row count would play a role, but can't say I can come up with a formula that's reasonable.
pkg/sql/opt/xform/coster.go, line 403 at r4 (raw file):
// Add a cost if we have to evaluate an ON condition on every row. The more // leftover conditions, the more expensive it should be. We want to // differentiate between two lookup joins where one uses only a subset of the
This comment no longer holds (your change should fix this example).
pkg/sql/opt/xform/coster.go, line 411 at r4 (raw file):
// then discarded). perRowCost += cpuCostFactor * memo.Cost(len(join.On)) // We also add a constant "setup" cost per ON condition. Without this, the
This hacky "setup" adjustment to the cost shouldn't be needed anymore. The first one (line above) shouldn't be necessary either with your change, though I don't see a problem with keeping it (especially if removing it changes plans that we're unsure about). Strictly speaking, it's correct to account for running the filters, but then the other join types should also account for that, and Select should account for it similarly (it doesn't look at how many filters we have to evaluate).
cc6dd6a
to
1014541
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've updated the TPC-H stats quality tests. Here are the performance changes (averaged over 3 runs of each, standard deviation is less than 1% for all measurements):
Q18
----
old new change
14.36s 9.05s -36.97%
Q22
----
old new change
742.1ms 833.6ms +12.34%
As shown above, Q18 improved by 37% but Q22 regressed by 12%.
I also further improved the costing and stats of semi/anti joins.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @justinj and @RaduBerinde)
pkg/sql/opt/memo/statistics_builder.go, line 909 at r4 (raw file):
Previously, RaduBerinde wrote…
This could use a comment, it's not easy to figure out what this value actually means.
Done.
pkg/sql/opt/memo/statistics_builder.go, line 915 at r4 (raw file):
Previously, RaduBerinde wrote…
[nit] = leftStats.RowCount is more clear (it initially seemed like a no-op)
Done.
pkg/sql/opt/memo/statistics_builder.go, line 1003 at r4 (raw file):
Previously, RaduBerinde wrote…
Seems like
colStatJoin
would need some of this logic too?
It shouldn't need any change since we're setting the selectivity here so it's correct for anti joins. For semi/anti joins, colStatJoin
just applies the selectivity to the given column.
pkg/sql/opt/memo/statistics_builder.go, line 1006 at r4 (raw file):
Previously, RaduBerinde wrote…
Would be clearer if we just used
leftStats.RowCount - s.RowCount
directly (that's the formula one would expect)
Done.
pkg/sql/opt/memo/statistics_builder.go, line 1013 at r4 (raw file):
Previously, RaduBerinde wrote…
[nit] can you expand on this? Not sure what aspect of the way we calculate distinct counts above this formula stems from. Also, seems like the total and semi-join row count would play a role, but can't say I can come up with a formula that's reasonable.
I just removed this -- using the selectivity is easier and seems to be about as accurate.
pkg/sql/opt/xform/coster.go, line 403 at r4 (raw file):
Previously, RaduBerinde wrote…
This comment no longer holds (your change should fix this example).
Done.
pkg/sql/opt/xform/coster.go, line 411 at r4 (raw file):
Previously, RaduBerinde wrote…
This hacky "setup" adjustment to the cost shouldn't be needed anymore. The first one (line above) shouldn't be necessary either with your change, though I don't see a problem with keeping it (especially if removing it changes plans that we're unsure about). Strictly speaking, it's correct to account for running the filters, but then the other join types should also account for that, and Select should account for it similarly (it doesn't look at how many filters we have to evaluate).
Unfortunately we still need the setup cost to handle the case of a very small row count. When the row count is less than one, we cannot calculate the selectivity of each filter, so we need this hack to discriminate between fully covering indexes and partially covering indexes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @justinj, @RaduBerinde, and @rytaft)
pkg/sql/opt/exec/execbuilder/testdata/lookup_join, line 247 at r5 (raw file):
NULL NULL {5} 5 # TODO(radu): this doesn't seem to be a lookup join, but it should be.
We can remove this now :)
pkg/sql/opt/memo/statistics_builder.go, line 2358 at r6 (raw file):
} // Otherwise, we need to determine the row count of the join before the
[nit] not clear anymore what "otherwise" refers to
… joins This commit updates the costing of both lookup joins and merge joins to take into account the number of rows processed by the operator. This number may be larger than the number of output rows if an additional filter is applied as part of the ON condition that is not used to determine equality columns for the join. For example, consider the query `SELECT * FROM abc JOIN def ON a = e AND b = 3;` Assuming there is no index on b, if a lookup join is used to execute this query, the number of rows processed is actually the same as the query `SELECT * FROM abc JOIN def ON a = e;` The difference is that the filter b=3 must also be applied to every row in the first query. The coster now takes this into account when determining the cost of lookup joins and merge joins. Informs cockroachdb#34810 Release note: None
Prior to this commit, the statisticsBuilder always estimated that the number of output rows for a semi or anti join was equal to the number of rows on the left side. It ignored any ON conditions. This commit improves the estimate by taking into account the ON conditions. Release note: None
1014541
to
3d565a2
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TFTR!
bors r+
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @justinj and @RaduBerinde)
pkg/sql/opt/exec/execbuilder/testdata/lookup_join, line 247 at r5 (raw file):
Previously, RaduBerinde wrote…
We can remove this now :)
Done.
pkg/sql/opt/memo/statistics_builder.go, line 2358 at r6 (raw file):
Previously, RaduBerinde wrote…
[nit] not clear anymore what "otherwise" refers to
Done.
40248: opt: calculate number of rows processed when costing joins r=rytaft a=rytaft This PR updates the costing of joins to take into account the number of rows processed by the operator. This number may be larger than the number of output rows if an additional filter is applied as part of the ON condition that is not used to determine equality columns for the join. For example, consider the query `SELECT * FROM abc JOIN def ON a = e AND b = 3;` Assuming there is no index on b, if a lookup join is used to execute this query, the number of rows processed is actually the same as the query `SELECT * FROM abc JOIN def ON a = e;` The difference is that the filter b=3 must also be applied to every row in the first query. The coster now takes this into account when determining the cost of joins. Fixes #34810 Release note: None 40431: workload: fix partition commands in tpcc import r=solongordon a=solongordon The commands for partitioning indexes in the TPCC import were erroring out due to a syntax change introduced in #39332. I updated them to use `ALTER PARTITION ... OF INDEX` rather than `ALTER PARTITION ... OF TABLE`. Fixes #39005 Fixes #40360 Fixes #40416 Release note: None Co-authored-by: Rebecca Taft <becca@cockroachlabs.com> Co-authored-by: Solon Gordon <solon@cockroachlabs.com>
Build succeeded |
This PR updates the costing of joins to take into account the number of
rows processed by the operator. This number may be larger than the
number of output rows if an additional filter is applied as part of the
ON condition that is not used to determine equality
columns for the join.
For example, consider the query
SELECT * FROM abc JOIN def ON a = e AND b = 3;
Assuming there is no index on b, if a lookup join is used to execute this
query, the number of rows processed is actually the same as the query
SELECT * FROM abc JOIN def ON a = e;
The difference is that the filter b=3 must also be applied to every row in
the first query. The coster now takes this into account when determining
the cost of joins.
Fixes #34810
Release note: None