Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

opt: introduce placeholder fast path #63807

Merged
merged 3 commits into from
Apr 21, 2021

Conversation

RaduBerinde
Copy link
Member

@RaduBerinde RaduBerinde commented Apr 16, 2021

sql: support EXPLAIN ANALYZE EXECUTE

We currently cannot EXPLAIN an EXECUTE statement (the syntax doesn't
even allow it). EXECUTE is handled with special code at the top-level,
so it's tricky to make it work with EXPLAIN. However, EXPLAIN ANALYZE
Is also handled at the top level, before the EXECUTE handling, so that
should just work.

This change allows the parsing of such statements so we can use
EXPLAIN ANALYZE EXECUTE. If EXPLAIN EXECUTE is attempted, we get a
"not supported" error (this is less cryptic than the parsing error
anyway).

Release note: None

opt: normalize In with single-element list to Eq

This commit adds a rule that normalizes a IN (b) to a = b (and
similarly NOT IN to !=).

This query confirms the equivalency holds even when NULLs are
involved:

WITH
  vals (v) AS (VALUES (1), (2), (NULL))
SELECT
  v1.v, v2.v, v1.v NOT IN (v2.v,) AS in, v1.v != v2.v AS eq
FROM
  vals AS v1, vals AS v2

   v   |  v   |  in   |  eq
-------+------+-------+--------
     1 |    1 | false | false
     1 |    2 | true  | true
     1 | NULL | NULL  | NULL
     2 |    1 | true  | true
     2 |    2 | false | false
     2 | NULL | NULL  | NULL
  NULL |    1 | NULL  | NULL
  NULL |    2 | NULL  | NULL
  NULL | NULL | NULL  | NULL

Release note: None

opt: introduce placeholder fast path

Currently running a query with placeholders happens as follows:

  • at preparation time, we build a memo with the normalized expression
    with placeholders.
  • at execution time, we copy the expression into a new memo,
    replacing placeholders with their value (AssignPlaceholders).
  • then we run exploration which performs the actual optimization.

For trivial queries like KV reads, we do too much work during
execution (profiles show it is 10-20% of the runtime for KV workloads
with high read rates).

This commit introduces the concept of a "placeholder fast path". The
idea is that we can make specific checks for simple expressions and
produce (at preparation time) a fully optimized expression (which
still depends on placeholders). For this, we introduce a new operator,
PlaceholderScan which is similar to a Scan except that it always scans
one span with the same start and end key, and the key values are child
scalar expressions (either constants or placeholders).

We use this new operator only for simple SELECTs where the filters
constrain a prefix of an index to constant values; in addition, the
index must be covering and there must be only one such index. With
these conditions, we know the optimal plan upfront.

For now, the placeholder fast path transformation is Go code; if it
gets more complicated, it should be switched to use optgen rules.

A benchmark on my machine shows a kv95 workload going from 34kiops to
38kiops with this change.

Release note: None

Informs #57223.

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@RaduBerinde
Copy link
Member Author

CC @andy-kimball @jordanlewis, thought this might put a smile on your face.

@jordanlewis
Copy link
Member

❤❤❤ amazing!

@andy-kimball
Copy link
Contributor

Really happy to see this happen. I know @nvanbenschoten has also been wanting this for a while. 10%+ boost on KV95 is a very nice result.

@nvanbenschoten
Copy link
Member

This is fantastic! Thanks for the work here @RaduBerinde.

Copy link
Collaborator

@mgartner mgartner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really cool!

I think it'd be nice to have some execbuilder and/or logictests to ensure the new code in execbuilder/relational.go works as expected.

Reviewed 8 of 8 files at r1, 14 of 14 files at r2.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @RaduBerinde and @rytaft)


pkg/sql/opt/bench/bench_test.go, line 470 at r2 (raw file):

	var execMemo *memo.Memo
	if h.prepMemo.IsOptimized() {
		// No placeholders, we already did the exploration at prepare time.

nit: No placeholders or the placeholder fast path succeeded


pkg/sql/opt/xform/placeholder_fast_path.go, line 101 at r2 (raw file):

	scan, ok := sel.Input.(*memo.ScanExpr)
	if !ok {

nit: Might be faster in a some cases to check that the Select's input is a Scan and return ok=false early before looping through the Select filters.


pkg/sql/opt/xform/placeholder_fast_path.go, line 107 at r2 (raw file):

	// Check if there is exactly one covering, non-partial index with a prefix
	// matching constrainedCols. In that case, using this index is always the
	// optimal plan.

If there are multiple indexes, but at least one is unique, can't we assume that picking any of the UNIQUE indexes is optimal? This might allow for this optimization to apply in quite a few more cases. I suppose it'd be pointless in practice for a user to prefix any non-unique secondary indexes with columns in another unique index, though, so maybe not worth the added complexity...


pkg/sql/opt/xform/placeholder_fast_path.go, line 133 at r2 (raw file):

		if prefixCols.Equals(constrainedCols) {
			if foundIndex != nil {
				foundMultiple = true

Perhaps it'd be helpful to have a halting version ofiter.ForEach that stops iterating based on the callback's return value. Don't worry about it for this PR thought. You can leave a TODO(mgartner): Stop the iterator early if a second index is found. if you'd like.


pkg/sql/opt/xform/testdata/placeholder-fast-path/scan, line 14 at r2 (raw file):

  INDEX (c,b,a) STORING (d),
  INDEX (d,c,a),
  INDEX (d,c,b)

nit: add a partial index and an inverted index and verify the fast path isn't generated.

Copy link
Member Author

@RaduBerinde RaduBerinde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TFTR!

Unfortunately, we can't do much stuff with placeholders in execbuilder/logictests. I will think about it. Maybe we can add some special support for EXPLAIN EXECUTE.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @mgartner, @RaduBerinde, and @rytaft)


pkg/sql/opt/xform/placeholder_fast_path.go, line 107 at r2 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

If there are multiple indexes, but at least one is unique, can't we assume that picking any of the UNIQUE indexes is optimal? This might allow for this optimization to apply in quite a few more cases. I suppose it'd be pointless in practice for a user to prefix any non-unique secondary indexes with columns in another unique index, though, so maybe not worth the added complexity...

We'd want to scan the index which stores less data (eg fewer columns), which gets into coster territory.


pkg/sql/opt/xform/placeholder_fast_path.go, line 133 at r2 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

Perhaps it'd be helpful to have a halting version ofiter.ForEach that stops iterating based on the callback's return value. Don't worry about it for this PR thought. You can leave a TODO(mgartner): Stop the iterator early if a second index is found. if you'd like.

It might be nice if there are other usecases; I don't think this would be an important one though because this condition should be pretty rare, and all this happens only at prepare time.

Copy link
Member Author

@RaduBerinde RaduBerinde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added support for EXPLAIN ANALYZE EXECUTE and added a few tests like that. Also updated handling of partial indexes.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @mgartner and @rytaft)


pkg/sql/opt/xform/testdata/placeholder-fast-path/scan, line 14 at r2 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

nit: add a partial index and an inverted index and verify the fast path isn't generated.

Hm.. Good point. This brings up a problem with partial indexes - we can't simply ignore them, or we may miss out on using them.

Updated to fail the fast path if there are any partial indexes that we can't exclude because the filters don't constrain any partial predicate columns. PTAL.

We currently cannot EXPLAIN an EXECUTE statement (the syntax doesn't
even allow it). EXECUTE is handled with special code at the top-level,
so it's tricky to make it work with EXPLAIN. However, EXPLAIN ANALYZE
Is also handled at the top level, before the EXECUTE handling, so that
should just work.

This change allows the parsing of such statements so we can use
EXPLAIN ANALYZE EXECUTE. If EXPLAIN EXECUTE is attempted, we get a
"not supported" error (this is less cryptic than the parsing error
anyway).

Release note: None
This commit adds a rule that normalizes `a IN (b)` to `a = b` (and
similarly `NOT IN` to `!=`).

This query confirms the equivalency holds even when NULLs are
involved:

```
WITH
  vals (v) AS (VALUES (1), (2), (NULL))
SELECT
  v1.v, v2.v, v1.v NOT IN (v2.v,) AS in, v1.v != v2.v AS eq
FROM
  vals AS v1, vals AS v2

   v   |  v   |  in   |  eq
-------+------+-------+--------
     1 |    1 | false | false
     1 |    2 | true  | true
     1 | NULL | NULL  | NULL
     2 |    1 | true  | true
     2 |    2 | false | false
     2 | NULL | NULL  | NULL
  NULL |    1 | NULL  | NULL
  NULL |    2 | NULL  | NULL
  NULL | NULL | NULL  | NULL
```

Release note: None
Copy link
Collaborator

@mgartner mgartner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: 💯

Added support for EXPLAIN ANALYZE EXECUTE and added a few tests like that.

Thanks for doing this!

Reviewed 21 of 28 files at r3, 9 of 9 files at r4, 15 of 15 files at r5.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @RaduBerinde and @rytaft)


pkg/sql/opt/xform/placeholder_fast_path.go, line 107 at r2 (raw file):

Previously, RaduBerinde wrote…

We'd want to scan the index which stores less data (eg fewer columns), which gets into coster territory.

👍


pkg/sql/opt/xform/placeholder_fast_path.go, line 133 at r2 (raw file):

Previously, RaduBerinde wrote…

It might be nice if there are other usecases; I don't think this would be an important one though because this condition should be pretty rare, and all this happens only at prepare time.

Ahh... makes sense.


pkg/sql/opt/xform/placeholder_fast_path.go, line 110 at r5 (raw file):

	var iter scanIndexIter
	iter.Init(

nit: you don't need iter now


pkg/sql/opt/xform/testdata/placeholder-fast-path/scan, line 208 at r5 (raw file):

  b INT,
  c INT,

nit: remove blank line


pkg/sql/opt/xform/testdata/placeholder-fast-path/scan, line 227 at r5 (raw file):

SELECT a, b FROM partial1 WHERE a = 0
----
placeholder-scan partial1@secondary,partial

nit: if you name the (a, b) partial index it'll be easier for a human to understand/verify the test output.

Copy link
Collaborator

@rytaft rytaft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: Really great to see this!! Should we add a docs-todo for the EXPLAIN EXECUTE v. EXPLAIN ANALYZE EXECUTE?

nit: in the commit/PR message: "at perparation time" -> "at preparation time"

Reviewed 28 of 28 files at r3, 9 of 9 files at r4, 15 of 15 files at r5.
Reviewable status: :shipit: complete! 2 of 0 LGTMs obtained (waiting on @RaduBerinde)


pkg/sql/opt/exec/execbuilder/relational.go, line 475 at r5 (raw file):

// if successful (ok=true), the scan is guaranteed never to return more results
// than maxRows.
func (b *Builder) indexConstraintMaxResults(

nit: explain why you need expr in addition to scan

(or you could just use my suggestion of tagging both expression types with Scan so you can just pass the expr)


pkg/sql/opt/exec/execbuilder/testdata/scalar, line 1204 at r4 (raw file):

│ columns: (c0)
│ estimated row count: 333 (missing stats)
│ filter: CASE WHEN (c0 IS DISTINCT FROM CAST(NULL AS DECIMAL)) OR CAST(NULL AS BOOL) THEN ARRAY[NULL] ELSE ARRAY[] END IS NULL

Any idea why the OR CAST(NULL AS BOOL) is just by itself? (not super important... just thought it was weird)


pkg/sql/opt/ops/relational.opt, line 103 at r5 (raw file):

#
# PlaceholderScan cannot have a Constraint or InvertedConstraint.
[Relational]

you could add a Scan tag here and in the Scan definition -- that would allow you to test IsScan() and then cast the private to a ScanPrivate


pkg/sql/opt/testutils/opttester/opt_tester.go, line 1112 at r5 (raw file):

			ot.appliedRules.Add(int(ruleName))
		}
	})

These two functions seem like they are probably used in a lot of places. Does it make sense to put this code in a helper function?

@RaduBerinde
Copy link
Member Author

Added docs-todo label for documenting that we support EXPLAIN ANALYZE EXECUTE, and we (still) don't support EXPLAIN EXECUTE.

Currently running a query with placeholders happens as follows:
 - at preparation time, we build a memo with the normalized expression
   *with* placeholders.
 - at execution time, we copy the expression into a new memo,
   replacing placeholders with their value (AssignPlaceholders).
 - then we run exploration which performs the actual optimization.

For trivial queries like KV reads, we do too much work during
execution (profiles show it is 10-20% of the runtime for KV workloads
with high read rates).

This commit introduces the concept of a "placeholder fast path". The
idea is that we can make specific checks for simple expressions and
produce (at preparation time) a fully optimized expression (which
still depends on placeholders). For this, we introduce a new operator,
PlaceholderScan which is similar to a Scan except that it always scans
one span with the same start and end key, and the key values are child
scalar expressions (either constants or placeholders).

We use this new operator only for simple SELECTs where the filters
constrain a prefix of an index to constant values; in addition, the
index must be covering and there must be only one such index. With
these conditions, we know the optimal plan upfront.

For now, the placeholder fast path transformation is Go code; if it
gets more complicated, it should be switched to use optgen rules.

A benchmark on my machine shows a kv95 workload going from 34kiops to
38kiops with this change.

Release note: None
Copy link
Member Author

@RaduBerinde RaduBerinde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 2 stale) (waiting on @mgartner and @rytaft)


pkg/sql/opt/exec/execbuilder/relational.go, line 475 at r5 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

nit: explain why you need expr in addition to scan

(or you could just use my suggestion of tagging both expression types with Scan so you can just pass the expr)

The problem with passing the expression is that we're not passing the private in the PlaceholderScan, we are passing our manufactured private (with the constraint). I have changed it to pass the relational and required properties separately.


pkg/sql/opt/exec/execbuilder/testdata/scalar, line 1204 at r4 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

Any idea why the OR CAST(NULL AS BOOL) is just by itself? (not super important... just thought it was weird)

x=x gets normalized to x is not null or null (SimplifySameVarEqualities rule). The or null is needed so the result is null as opposed to false when x is null (of course, the distinction doesn't matter inside WHEN, but the rule runs independently)


pkg/sql/opt/testutils/opttester/opt_tester.go, line 1112 at r5 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

These two functions seem like they are probably used in a lot of places. Does it make sense to put this code in a helper function?

Done (moved into makeOptimizer).

Copy link
Collaborator

@mgartner mgartner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 5 of 5 files at r6.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 2 stale) (waiting on @rytaft)

Copy link
Collaborator

@rytaft rytaft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 5 of 5 files at r6.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 2 stale) (waiting on @RaduBerinde)


pkg/sql/opt/exec/execbuilder/testdata/scalar, line 1204 at r4 (raw file):

Previously, RaduBerinde wrote…

x=x gets normalized to x is not null or null (SimplifySameVarEqualities rule). The or null is needed so the result is null as opposed to false when x is null (of course, the distinction doesn't matter inside WHEN, but the rule runs independently)

got it thanks!

@RaduBerinde
Copy link
Member Author

bors r+

@craig
Copy link
Contributor

craig bot commented Apr 21, 2021

Build failed (retrying...):

@craig
Copy link
Contributor

craig bot commented Apr 21, 2021

Build succeeded:

@craig craig bot merged commit cc94126 into cockroachdb:master Apr 21, 2021
@RaduBerinde RaduBerinde deleted the placeholder-scan branch April 22, 2021 01:42
@RaduBerinde
Copy link
Member Author

🚨 This needs to be backported together with follow-up fix #64225. 🚨

@RaduBerinde
Copy link
Member Author

🚨 This needs to be backported together with follow-up fixes #64225 and #65018. 🚨

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants