-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
opt/rowexec: support range lookup joins on input columns #85597
Conversation
if !startInclusive && !isDesc { | ||
// We can make the key inclusive by incrementing to the key that sorts | ||
// immediately after it. | ||
startKey = startKey.PrefixEnd() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd like to draw attention to this bit here - does it work to call PrefixEnd
in order to make the start key inclusive? Calling Key.Next
didn't seem to properly filter out the previous value when I tried it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work! I noticed that #51576 is not being closed by this PR - what are the remaining cases? The limitation on types that don't support Datum.Prev
?
Reviewed 5 of 5 files at r1, 16 of 16 files at r2, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @DrewKimball)
-- commits
line 31 at r2:
Looks like INetFamily
satisfies these conditions too. Also, what about FloatFamily
- we can't use it because DFloat.Prev
returns an error with edge cases like infinity and NaN?
TimeFamily
, TimeTZFamily
, TimestampFamily
, and TimestampTZFamily
return an error only if IsMin
is true
- but the contract of Datum.Prev
is such that its behavior is undefined if IsMin
returns true
. This seems similar to Ints and Oids, so I think we should allow them.
pkg/sql/opt/lookupjoin/constraint_builder.go
line 137 at r2 (raw file):
func (b *ConstraintBuilder) Build( index cat.Index, onFilters, optionalFilters memo.FiltersExpr, ) (cons Constraint, foundEqualityCols bool) {
nit: looks like cons
is never used, so maybe use s/cons/_/
?
pkg/sql/opt/lookupjoin/constraint_builder.go
line 525 at r2 (raw file):
// see makeConstFilter). We only support 1 span in the execution // engine so check that. Additionally, inequality filter constraints // should be normalized so that the column is ascending, so we can ignore
Where does this normalization take place?
pkg/sql/rowexec/joinreader_span_generator.go
line 110 at r2 (raw file):
) (spanID int, newSpanKey bool) { // Derive a unique key for the span. spanKey := fmt.Sprintf("%d/%d/%s/%s",
It would be good to run the microbenchmarks of the join reader.
Also, in f0639cb I added the following optimization:
// This way of constructing a string is more efficient than using
// fmt.Sprintf.
return opName + "-" + strconv.Itoa(int(processorID)) + "-" + suffix + "-" + strconv.Itoa(len(r.monitors))
so we probably want to avoid using fmt.Sprintf
here too.
I think we can also drop the length of one of the keys from the encoding scheme to reduce the memory footprint without sacrificing the uniqueness property.
pkg/sql/span/span_builder.go
line 146 at r2 (raw file):
startKey, startContainsNull, err = makeKeyFromRow(values[:prefixLen+1]) if !startInclusive && !isDesc { // We can make the key inclusive by incrementing to the key that sorts
nit: s/inclusive/exclusive/
.
pkg/sql/span/span_builder.go
line 148 at r2 (raw file):
Previously, DrewKimball (Drew Kimball) wrote…
I'd like to draw attention to this bit here - does it work to call
PrefixEnd
in order to make the start key inclusive? CallingKey.Next
didn't seem to properly filter out the previous value when I tried it.
What is a repro when Key.Next
doesn't work?
I feel like PrefixEnd
is the right thing to call here. Let's consider an example: say we have a > 1
inequality. The start key would be something like /TableID/IndexID/1
. Calling Key.Next
would return something like /TableID/IndexID/1/0
which could still fetch values where a = 1
, but we don't want those. What we really want is /TableID/IndexID/2
, and this is exactly what PrefixEnd
gives us. I believe that here PrefixEnd
works exactly like Datum.Next
would have worked for those types that support it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what are the remaining cases? The limitation on types that don't support Datum.Prev?
I'm more concerned about the fact that we have to limit the cases when we plan inequality lookup joins because we aren't de-duplicating spans. But I guess it might be better to make that its own issue - what do you think?
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @yuzefovich)
Previously, yuzefovich (Yahor Yuzefovich) wrote…
Looks like
INetFamily
satisfies these conditions too. Also, what aboutFloatFamily
- we can't use it becauseDFloat.Prev
returns an error with edge cases like infinity and NaN?
TimeFamily
,TimeTZFamily
,TimestampFamily
, andTimestampTZFamily
return an error only ifIsMin
istrue
- but the contract ofDatum.Prev
is such that its behavior is undefined ifIsMin
returnstrue
. This seems similar to Ints and Oids, so I think we should allow them.
Good point, added all of those types. Interestingly, it looks like the minimum value for float is NaN
, so the code doesn't need any special casing for floats.
pkg/sql/opt/lookupjoin/constraint_builder.go
line 137 at r2 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
nit: looks like
cons
is never used, so maybe uses/cons/_/
?
Done.
pkg/sql/opt/lookupjoin/constraint_builder.go
line 525 at r2 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
Where does this normalization take place?
It's constructed that way:
keyCtx.Columns.InitSingle(opt.MakeOrderingColumn(col, false /* descending */)) |
I've added some clarification and a pointer to the relevant method in the comment.
pkg/sql/rowexec/joinreader_span_generator.go
line 110 at r2 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
It would be good to run the microbenchmarks of the join reader.
Also, in f0639cb I added the following optimization:
// This way of constructing a string is more efficient than using // fmt.Sprintf. return opName + "-" + strconv.Itoa(int(processorID)) + "-" + suffix + "-" + strconv.Itoa(len(r.monitors))
so we probably want to avoid using
fmt.Sprintf
here too.I think we can also drop the length of one of the keys from the encoding scheme to reduce the memory footprint without sacrificing the uniqueness property.
Done.
pkg/sql/span/span_builder.go
line 146 at r2 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
nit:
s/inclusive/exclusive/
.
Changed the comment.
pkg/sql/span/span_builder.go
line 148 at r2 (raw file):
I think this test failed when I tried Key.Next
:
cockroach/pkg/sql/opt/exec/execbuilder/testdata/lookup_join_trace
Lines 135 to 151 in d4f2c41
statement ok | |
SET tracing = on, kv; | |
SELECT * | |
FROM metric_values as v | |
INNER JOIN metrics as m | |
ON metric_id=id | |
WHERE | |
time > '2020-01-01 00:00:00+00:00' AND | |
name='cpu' | |
ORDER BY value; | |
SET tracing = off; | |
# We should not be fetching the key with '2020-01-01 00:00:00+00:00' value for | |
# the 'time' column since it doesn't satisfy the filter on 'time'. | |
query T | |
SELECT message FROM [SHOW KV TRACE FOR SESSION] WHERE operation = 'join reader' AND message NOT LIKE 'querying next range%' | |
---- |
Calling Key.Next would return something like /TableID/IndexID/1/0 which could still fetch values where a = 1, but we don't want those.
Yeah, that makes sense. Thanks for clarifying.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cases when we plan inequality lookup joins because we aren't de-duplicating spans. But I guess it might be better to make that its own issue - what do you think?
Yeah, a separate issue makes more sense to me.
Reviewed 16 of 16 files at r3, 21 of 21 files at r4, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @DrewKimball)
pkg/sql/opt/lookupjoin/constraint_builder.go
line 506 at r4 (raw file):
} } return
nit: I think we have a guideline to explicitly spell out named return arguments.
pkg/sql/opt/lookupjoin/constraint_builder.go
line 550 at r4 (raw file):
// // This limitation comes from the execution that must be able to // "advance" the start boundary. In the case when the index column is
nit: the first sentence needs an update.
pkg/sql/rowexec/joinreader_span_generator.go
line 338 at r4 (raw file):
resizeMemoryAccount resizeMemoryAccountFunc alloc *tree.DatumAlloc
nit: I don't think it's used anywhere.
pkg/sql/rowexec/joinreader_span_generator.go
line 470 at r4 (raw file):
// lookupExprHasVarInequality returns true if the given lookup expression // contains an inequality that references an input column. func (g *multiSpanGenerator) lookupExprHasVarInequality(lookupExpr tree.TypedExpr) bool {
nit: looks like g
is never actually used, so we can make this method without the receiver.
pkg/sql/span/span_builder.go
line 179 at r4 (raw file):
} } startKey.Next()
nit: looks like leftover.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 5 of 5 files at r1, 16 of 16 files at r3, 17 of 21 files at r4, 21 of 21 files at r5, 21 of 21 files at r6, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @DrewKimball)
-- commits
line 18 at r4:
nit: I'd say opt, rowexec: ...
, since they are different paths
pkg/sql/opt/lookupjoin/constraint_builder.go
line 148 at r4 (raw file):
// Retrieve the inequality columns from onFilters. _, rightCmp, inEqFilterOrds := memo.ExtractJoinConditionColumns(
nit: for clarity, I'd call this inequalityFilterOrds
or something like that instead
pkg/sql/opt/lookupjoin/constraint_builder.go
line 462 at r4 (raw file):
func (b *ConstraintBuilder) findJoinVariableRangeFilters( rightCmp opt.ColList, inEqFilterOrds []int,
nit: inEqFilterOrds -> comparisonFilterOrds?
pkg/sql/opt/lookupjoin/constraint_builder.go
line 566 at r4 (raw file):
if rangeExpr, ok := filters[i].Condition.(*memo.RangeExpr); ok && (!needStart || !needEnd) { // We need to split the range filter because only half of it is needed. and := rangeExpr.And.(*memo.AndExpr)
I don't think it's always possible to split a RangeExpr this way. I think this could be an arbitrarily deeply nested And tree.
Do you know that in this case it will be possible?
pkg/sql/opt/memo/extract.go
line 250 at r4 (raw file):
leftCols, rightCols opt.ColSet, condition opt.ScalarExpr, ) (ok bool, left, right opt.ColumnID) { return ExtractJoinCondition(leftCols, rightCols, condition, false /* allowInequality */)
nit: allowInequality -> inequality
pkg/sql/opt/memo/extract.go
line 294 at r4 (raw file):
var newFilters FiltersExpr for i := range on { leftVar, rightVar, ok := isVarEqualityOrInequality(on[i].Condition, false /* allowInequality */)
nit: allowInequality -> inequality
pkg/sql/opt/norm/join_funcs.go
line 469 at r3 (raw file):
// - neither a or b are constants. // // Such an equality can be converted to a column equality by pushing down
nit: equality -> comparison (2x)
pkg/sql/opt/norm/join_funcs.go
line 474 at r3 (raw file):
a, b opt.ScalarExpr, leftCols, rightCols opt.ColSet, ) bool { // Disallow simple equality between variables.
nit: equality -> comparison
pkg/sql/opt/norm/join_funcs.go
line 499 at r3 (raw file):
(leftProps.OuterCols.SubsetOf(rightCols) && rightProps.OuterCols.SubsetOf(leftCols)) { // The equality is of the form: // expression(leftCols) = expression(rightCols)
nit: equality -> comparison
= -> op
pkg/sql/opt/norm/join_funcs.go
line 528 at r4 (raw file):
if cmpLeftProps.OuterCols.SubsetOf(rightCols) { a, b = b, a op = opt.CommuteEqualityOrInequalityOp(op)
Can you apply this change back to your first commit?
pkg/sql/opt/xform/join_funcs.go
line 324 at r4 (raw file):
// canGenerateLookupJoins makes a best-effort to filter out cases where no // joins can be constructed based the join's filters and flags. It may miss some
nit: based -> based on
Previously, `ExtractJoinEqualities` would match equalities between non-constant, non-variable expressions and project the expressions so that the comparison would be between variables instead. This may allow rules like `GenerateLookupJoins` to use the equalities later. This commit modifies `ExtractJoinEqualities` (now `ExtractJoinComparisons`) so that it also matches inequalities. This is useful because lookup joins can now use inequalities for lookups, and converting inequalties to reference variables instead of complex expressions increases the likelihood that an inequality can be used in a join. Release note: None
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @DrewKimball, @rytaft, and @yuzefovich)
Previously, rytaft (Rebecca Taft) wrote…
nit: I'd say
opt, rowexec: ...
, since they are different paths
Done.
pkg/sql/opt/lookupjoin/constraint_builder.go
line 148 at r4 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
nit: for clarity, I'd call this
inequalityFilterOrds
or something like that instead
Done.
pkg/sql/opt/lookupjoin/constraint_builder.go
line 462 at r4 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
nit: inEqFilterOrds -> comparisonFilterOrds?
I changed to inequalityFilterOrds
as above. Done.
pkg/sql/opt/lookupjoin/constraint_builder.go
line 506 at r4 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
nit: I think we have a guideline to explicitly spell out named return arguments.
Done.
pkg/sql/opt/lookupjoin/constraint_builder.go
line 550 at r4 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
nit: the first sentence needs an update.
Done.
pkg/sql/opt/lookupjoin/constraint_builder.go
line 566 at r4 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
I don't think it's always possible to split a RangeExpr this way. I think this could be an arbitrarily deeply nested And tree.
Do you know that in this case it will be possible?
Oh, I wasn't aware that a RangeExpr
could have more than two conditions in it. We restrict to the case where only a single column is constrained on a single span, but I think it could be possible to have redundant filters in the range, like inequalities and a LIKE
operator that filter in the same way.
I've modified the logic to just construct new lookup and remaining filters using the spans directly in the case when only part of the filter is needed. Since we don't consider filters that aren't tightly constrained by a single span, we only have to consider the case when a filter supplies start and end bounds, but only one of those is needed.
pkg/sql/opt/memo/extract.go
line 250 at r4 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
nit: allowInequality -> inequality
Done.
pkg/sql/opt/memo/extract.go
line 294 at r4 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
nit: allowInequality -> inequality
Done.
pkg/sql/opt/norm/join_funcs.go
line 469 at r3 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
nit: equality -> comparison (2x)
Done.
pkg/sql/opt/norm/join_funcs.go
line 474 at r3 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
nit: equality -> comparison
Done.
pkg/sql/opt/norm/join_funcs.go
line 499 at r3 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
nit: equality -> comparison
= -> op
Done.
pkg/sql/opt/norm/join_funcs.go
line 528 at r4 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
Can you apply this change back to your first commit?
Oh, didn't realize I hadn't done so. Done.
pkg/sql/opt/xform/join_funcs.go
line 324 at r4 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
nit: based -> based on
Done.
pkg/sql/rowexec/joinreader_span_generator.go
line 338 at r4 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
nit: I don't think it's used anywhere.
Oops, done.
pkg/sql/rowexec/joinreader_span_generator.go
line 470 at r4 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
nit: looks like
g
is never actually used, so we can make this method without the receiver.
Done.
pkg/sql/span/span_builder.go
line 179 at r4 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
nit: looks like leftover.
Done.
c9b4ab8
to
7985225
Compare
Previously, it was possible to perform lookup joins using inequality conditions between index columns and constant values. This commit allows lookup joins to also use inequalities between index columns and input columns. There are restrictions on when an inequality can be used in a lookup join: 1. The left and right sides of the inequality must have identical types. 2. The inequality is between an index column and input column (or constant). 3. If the index column is `DESC` and the inequality is of the form `idxCol < inputCol`, the column type must support `Datum.Prev` without any chance of failing other than for the minimum value for that type. Condition (3) is necessary because when the index column is `DESC`, the `idxCol < inputCol` filter will be used in forming the start key of each span. The spans are expected to be inclusive, so the value of inputCol will have to be decremented to the value that orders immediately before it. Unlike the case of retrieving the next possible key (ex: `ASC` index with `idxCol > inputCol`) it is not possible in general to directly obtain the immediate previous key, because it would have an infinite number of `0xff` bytes appended to it. Thus, we have to use `Datum.Prev` on the inequality bound before adding it to the start key. Additionally, this commit allows lookup joins to be planned without equality filters when the following conditions are met: 1. There is an inequality filter between an index column and an input column that can be used to perform lookups. 2. Either the input has only one row or the join has a LOOKUP hint. These restrictions ensure that planning lookup joins in more cases does not lead to performance regressions, since the current execution logic does not fully de-duplicate spans when inequalities are used. Fixes cockroachdb#51576 Release note (performance improvement): The execution engine can now perform lookup joins in more cases. This can significantly improve join performance when there is a large table with an index that conforms to the join ON conditions, as well as allow joins to halt early in the presence of a limit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 19 of 19 files at r7, 19 of 19 files at r8, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @DrewKimball and @yuzefovich)
pkg/sql/opt/lookupjoin/constraint_builder.go
line 595 at r8 (raw file):
} else if !needEnd { rangeFilter, remaining = &startFilter, &endFilter }
should there be an else
with an assertion failure?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TFTRs!
Reviewable status: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @DrewKimball, @rytaft, and @yuzefovich)
pkg/sql/opt/lookupjoin/constraint_builder.go
line 595 at r8 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
should there be an
else
with an assertion failure?
We already check if !needStart || !needEnd
in the enclosing if
statement, so I think it's ok
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @DrewKimball and @yuzefovich)
bors r+ |
Build failed (retrying...): |
Build failed (retrying...): |
Build succeeded: |
This commit adds a session setting (default on) that can be toggled to off in order to disable the lookup join behavior added in cockroachdb#85597. This will provide a quick route to mitigate any customer issues that might be caused by cockroachdb#85597. Release note: None
This commit adds a session setting (default on) that can be toggled to off in order to disable the lookup join behavior added in cockroachdb#85597. This will provide a quick route to mitigate any customer issues that might be caused by cockroachdb#85597. Release note: None
This commit adds a session setting (default on) that can be toggled to off in order to disable the lookup join behavior added in cockroachdb#85597. This will provide a quick route to mitigate any customer issues that might be caused by cockroachdb#85597. Release note: None
88840: asim: increase sim speed bench requirement r=lidorcarmel a=kvoli This patch increases the speed requirement for `TestAllocatorSimulatorSpeed` from 25s to 30s. It was showing flakes in some CI runs. resolves: #89089 Release note: None 88884: opt: add session setting to disable inequality lookup joins r=DrewKimball a=DrewKimball This commit adds a session setting (default on) that can be toggled to off in order to disable the lookup join behavior added in #85597. This will provide a quick route to mitigate any customer issues that might be caused by #85597. Release note: None Co-authored-by: Austen McClernon <austen@cockroachlabs.com> Co-authored-by: DrewKimball <drewk@cockroachlabs.com>
This commit adds a session setting (default on) that can be toggled to off in order to disable the lookup join behavior added in cockroachdb#85597. This will provide a quick route to mitigate any customer issues that might be caused by cockroachdb#85597. Release note: None
This commit adds a session setting (default on) that can be toggled to off in order to disable the lookup join behavior added in cockroachdb#85597. This will provide a quick route to mitigate any customer issues that might be caused by cockroachdb#85597. Release note: None
opt/rowexec: support range lookup joins on input columns
Previously, it was possible to perform lookup joins using inequality
conditions between index columns and constant values. This commit allows
lookup joins to also use inequalities between index columns and input columns.
There are restrictions on when an inequality can be used in a lookup join:
DESC
and the inequality is of the formidxCol < inputCol
, the column type must supportDatum.Prev
withoutany chance of failing other than for the minimum value for that type.
Condition (3) is necessary because when the index column is
DESC
, theidxCol < inputCol
filter will be used in forming the start key of each span.The spans are expected to be inclusive, so the value of inputCol will have to
be decremented to the value that orders immediately before it.
Unlike the case of retrieving the next possible key (ex:
ASC
index withidxCol > inputCol
) it is not possible in general to directly obtain theimmediate previous key, because it would have an infinite number of
0xff
bytes appended to it. Thus, we have to use
Datum.Prev
on the inequalitybound before adding it to the start key.
Additionally, this commit allows lookup joins to be planned without equality
filters when the following conditions are met:
that can be used to perform lookups.
These restrictions ensure that planning lookup joins in more cases does not
lead to performance regressions, since the current execution logic does not
fully de-duplicate spans when inequalities are used.
Fixes #51576
Release note (performance improvement): The execution engine can now perform
lookup joins in more cases. This can significantly improve join performance
when there is a large table with an index that conforms to the join ON
conditions, as well as allow joins to halt early in the presence of a limit.
opt: extend ExtractJoinEqualities to handle inequalities
Previously,
ExtractJoinEqualities
would match equalities betweennon-constant, non-variable expressions and project the expressions so that
the comparison would be between variables instead. This may allow rules
like
GenerateLookupJoins
to use the equalities later.This commit modifies
ExtractJoinEqualities
(nowExtractJoinComparisons
)so that it also matches inequalities. This is useful because lookup joins
can now use inequalities for lookups, and converting inequalties to
reference variables instead of complex expressions increases the likelihood
that an inequality can be used in a join.
Release note: None