opt: introduce placeholder fast path #63807

RaduBerinde · 2021-04-16T23:56:15Z

sql: support EXPLAIN ANALYZE EXECUTE

We currently cannot EXPLAIN an EXECUTE statement (the syntax doesn't
even allow it). EXECUTE is handled with special code at the top-level,
so it's tricky to make it work with EXPLAIN. However, EXPLAIN ANALYZE
Is also handled at the top level, before the EXECUTE handling, so that
should just work.

This change allows the parsing of such statements so we can use
EXPLAIN ANALYZE EXECUTE. If EXPLAIN EXECUTE is attempted, we get a
"not supported" error (this is less cryptic than the parsing error
anyway).

Release note: None

opt: normalize In with single-element list to Eq

This commit adds a rule that normalizes a IN (b) to a = b (and
similarly NOT IN to !=).

This query confirms the equivalency holds even when NULLs are
involved:

WITH
  vals (v) AS (VALUES (1), (2), (NULL))
SELECT
  v1.v, v2.v, v1.v NOT IN (v2.v,) AS in, v1.v != v2.v AS eq
FROM
  vals AS v1, vals AS v2

   v   |  v   |  in   |  eq
-------+------+-------+--------
     1 |    1 | false | false
     1 |    2 | true  | true
     1 | NULL | NULL  | NULL
     2 |    1 | true  | true
     2 |    2 | false | false
     2 | NULL | NULL  | NULL
  NULL |    1 | NULL  | NULL
  NULL |    2 | NULL  | NULL
  NULL | NULL | NULL  | NULL

Release note: None

opt: introduce placeholder fast path

Currently running a query with placeholders happens as follows:

at preparation time, we build a memo with the normalized expression
with placeholders.
at execution time, we copy the expression into a new memo,
replacing placeholders with their value (AssignPlaceholders).
then we run exploration which performs the actual optimization.

For trivial queries like KV reads, we do too much work during
execution (profiles show it is 10-20% of the runtime for KV workloads
with high read rates).

This commit introduces the concept of a "placeholder fast path". The
idea is that we can make specific checks for simple expressions and
produce (at preparation time) a fully optimized expression (which
still depends on placeholders). For this, we introduce a new operator,
PlaceholderScan which is similar to a Scan except that it always scans
one span with the same start and end key, and the key values are child
scalar expressions (either constants or placeholders).

We use this new operator only for simple SELECTs where the filters
constrain a prefix of an index to constant values; in addition, the
index must be covering and there must be only one such index. With
these conditions, we know the optimal plan upfront.

For now, the placeholder fast path transformation is Go code; if it
gets more complicated, it should be switched to use optgen rules.

A benchmark on my machine shows a kv95 workload going from 34kiops to
38kiops with this change.

Release note: None

Informs #57223.

cockroach-teamcity · 2021-04-16T23:56:24Z

This change is

RaduBerinde · 2021-04-16T23:57:01Z

CC @andy-kimball @jordanlewis, thought this might put a smile on your face.

jordanlewis · 2021-04-17T02:27:46Z

❤❤❤ amazing!

andy-kimball · 2021-04-17T03:40:07Z

Really happy to see this happen. I know @nvanbenschoten has also been wanting this for a while. 10%+ boost on KV95 is a very nice result.

nvanbenschoten · 2021-04-17T04:09:33Z

This is fantastic! Thanks for the work here @RaduBerinde.

mgartner

Really cool!

I think it'd be nice to have some execbuilder and/or logictests to ensure the new code in execbuilder/relational.go works as expected.

Reviewed 8 of 8 files at r1, 14 of 14 files at r2.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @RaduBerinde and @rytaft)

pkg/sql/opt/bench/bench_test.go, line 470 at r2 (raw file):

	var execMemo *memo.Memo
	if h.prepMemo.IsOptimized() {
		// No placeholders, we already did the exploration at prepare time.

nit: No placeholders or the placeholder fast path succeeded

pkg/sql/opt/xform/placeholder_fast_path.go, line 101 at r2 (raw file):

	scan, ok := sel.Input.(*memo.ScanExpr)
	if !ok {

nit: Might be faster in a some cases to check that the Select's input is a Scan and return ok=false early before looping through the Select filters.

pkg/sql/opt/xform/placeholder_fast_path.go, line 107 at r2 (raw file):

	// Check if there is exactly one covering, non-partial index with a prefix
	// matching constrainedCols. In that case, using this index is always the
	// optimal plan.

If there are multiple indexes, but at least one is unique, can't we assume that picking any of the UNIQUE indexes is optimal? This might allow for this optimization to apply in quite a few more cases. I suppose it'd be pointless in practice for a user to prefix any non-unique secondary indexes with columns in another unique index, though, so maybe not worth the added complexity...

pkg/sql/opt/xform/placeholder_fast_path.go, line 133 at r2 (raw file):

		if prefixCols.Equals(constrainedCols) {
			if foundIndex != nil {
				foundMultiple = true

Perhaps it'd be helpful to have a halting version ofiter.ForEach that stops iterating based on the callback's return value. Don't worry about it for this PR thought. You can leave a TODO(mgartner): Stop the iterator early if a second index is found. if you'd like.

pkg/sql/opt/xform/testdata/placeholder-fast-path/scan, line 14 at r2 (raw file):

  INDEX (c,b,a) STORING (d),
  INDEX (d,c,a),
  INDEX (d,c,b)

nit: add a partial index and an inverted index and verify the fast path isn't generated.

RaduBerinde

TFTR!

Unfortunately, we can't do much stuff with placeholders in execbuilder/logictests. I will think about it. Maybe we can add some special support for EXPLAIN EXECUTE.

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @mgartner, @RaduBerinde, and @rytaft)

pkg/sql/opt/xform/placeholder_fast_path.go, line 107 at r2 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

If there are multiple indexes, but at least one is unique, can't we assume that picking any of the UNIQUE indexes is optimal? This might allow for this optimization to apply in quite a few more cases. I suppose it'd be pointless in practice for a user to prefix any non-unique secondary indexes with columns in another unique index, though, so maybe not worth the added complexity...

We'd want to scan the index which stores less data (eg fewer columns), which gets into coster territory.

pkg/sql/opt/xform/placeholder_fast_path.go, line 133 at r2 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

Perhaps it'd be helpful to have a halting version ofiter.ForEach that stops iterating based on the callback's return value. Don't worry about it for this PR thought. You can leave a TODO(mgartner): Stop the iterator early if a second index is found. if you'd like.

It might be nice if there are other usecases; I don't think this would be an important one though because this condition should be pretty rare, and all this happens only at prepare time.

RaduBerinde

Added support for EXPLAIN ANALYZE EXECUTE and added a few tests like that. Also updated handling of partial indexes.

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @mgartner and @rytaft)

pkg/sql/opt/xform/testdata/placeholder-fast-path/scan, line 14 at r2 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

nit: add a partial index and an inverted index and verify the fast path isn't generated.

Hm.. Good point. This brings up a problem with partial indexes - we can't simply ignore them, or we may miss out on using them.

Updated to fail the fast path if there are any partial indexes that we can't exclude because the filters don't constrain any partial predicate columns. PTAL.

We currently cannot EXPLAIN an EXECUTE statement (the syntax doesn't even allow it). EXECUTE is handled with special code at the top-level, so it's tricky to make it work with EXPLAIN. However, EXPLAIN ANALYZE Is also handled at the top level, before the EXECUTE handling, so that should just work. This change allows the parsing of such statements so we can use EXPLAIN ANALYZE EXECUTE. If EXPLAIN EXECUTE is attempted, we get a "not supported" error (this is less cryptic than the parsing error anyway). Release note: None

mgartner

💯

Added support for EXPLAIN ANALYZE EXECUTE and added a few tests like that.

Thanks for doing this!

Reviewed 21 of 28 files at r3, 9 of 9 files at r4, 15 of 15 files at r5.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @RaduBerinde and @rytaft)

pkg/sql/opt/xform/placeholder_fast_path.go, line 107 at r2 (raw file):

Previously, RaduBerinde wrote…

We'd want to scan the index which stores less data (eg fewer columns), which gets into coster territory.

👍

pkg/sql/opt/xform/placeholder_fast_path.go, line 133 at r2 (raw file):

Previously, RaduBerinde wrote…

It might be nice if there are other usecases; I don't think this would be an important one though because this condition should be pretty rare, and all this happens only at prepare time.

Ahh... makes sense.

pkg/sql/opt/xform/placeholder_fast_path.go, line 110 at r5 (raw file):

	var iter scanIndexIter
	iter.Init(

nit: you don't need iter now

pkg/sql/opt/xform/testdata/placeholder-fast-path/scan, line 208 at r5 (raw file):

  b INT,
  c INT,

nit: remove blank line

pkg/sql/opt/xform/testdata/placeholder-fast-path/scan, line 227 at r5 (raw file):

SELECT a, b FROM partial1 WHERE a = 0
----
placeholder-scan partial1@secondary,partial

nit: if you name the (a, b) partial index it'll be easier for a human to understand/verify the test output.

rytaft

Really great to see this!! Should we add a docs-todo for the EXPLAIN EXECUTE v. EXPLAIN ANALYZE EXECUTE?

nit: in the commit/PR message: "at perparation time" -> "at preparation time"

Reviewed 28 of 28 files at r3, 9 of 9 files at r4, 15 of 15 files at r5.
Reviewable status: complete! 2 of 0 LGTMs obtained (waiting on @RaduBerinde)

pkg/sql/opt/exec/execbuilder/relational.go, line 475 at r5 (raw file):

// if successful (ok=true), the scan is guaranteed never to return more results
// than maxRows.
func (b *Builder) indexConstraintMaxResults(

nit: explain why you need expr in addition to scan

(or you could just use my suggestion of tagging both expression types with Scan so you can just pass the expr)

pkg/sql/opt/exec/execbuilder/testdata/scalar, line 1204 at r4 (raw file):

│ columns: (c0)
│ estimated row count: 333 (missing stats)
│ filter: CASE WHEN (c0 IS DISTINCT FROM CAST(NULL AS DECIMAL)) OR CAST(NULL AS BOOL) THEN ARRAY[NULL] ELSE ARRAY[] END IS NULL

Any idea why the OR CAST(NULL AS BOOL) is just by itself? (not super important... just thought it was weird)

pkg/sql/opt/ops/relational.opt, line 103 at r5 (raw file):

#
# PlaceholderScan cannot have a Constraint or InvertedConstraint.
[Relational]

you could add a Scan tag here and in the Scan definition -- that would allow you to test IsScan() and then cast the private to a ScanPrivate

pkg/sql/opt/testutils/opttester/opt_tester.go, line 1112 at r5 (raw file):

			ot.appliedRules.Add(int(ruleName))
		}
	})

These two functions seem like they are probably used in a lot of places. Does it make sense to put this code in a helper function?

RaduBerinde · 2021-04-20T15:59:48Z

Added docs-todo label for documenting that we support EXPLAIN ANALYZE EXECUTE, and we (still) don't support EXPLAIN EXECUTE.

Currently running a query with placeholders happens as follows: - at preparation time, we build a memo with the normalized expression *with* placeholders. - at execution time, we copy the expression into a new memo, replacing placeholders with their value (AssignPlaceholders). - then we run exploration which performs the actual optimization. For trivial queries like KV reads, we do too much work during execution (profiles show it is 10-20% of the runtime for KV workloads with high read rates). This commit introduces the concept of a "placeholder fast path". The idea is that we can make specific checks for simple expressions and produce (at preparation time) a fully optimized expression (which still depends on placeholders). For this, we introduce a new operator, PlaceholderScan which is similar to a Scan except that it always scans one span with the same start and end key, and the key values are child scalar expressions (either constants or placeholders). We use this new operator only for simple SELECTs where the filters constrain a prefix of an index to constant values; in addition, the index must be covering and there must be only one such index. With these conditions, we know the optimal plan upfront. For now, the placeholder fast path transformation is Go code; if it gets more complicated, it should be switched to use optgen rules. A benchmark on my machine shows a kv95 workload going from 34kiops to 38kiops with this change. Release note: None

RaduBerinde

Reviewable status: complete! 0 of 0 LGTMs obtained (and 2 stale) (waiting on @mgartner and @rytaft)

pkg/sql/opt/exec/execbuilder/relational.go, line 475 at r5 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

nit: explain why you need expr in addition to scan

(or you could just use my suggestion of tagging both expression types with Scan so you can just pass the expr)

The problem with passing the expression is that we're not passing the private in the PlaceholderScan, we are passing our manufactured private (with the constraint). I have changed it to pass the relational and required properties separately.

pkg/sql/opt/exec/execbuilder/testdata/scalar, line 1204 at r4 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

Any idea why the OR CAST(NULL AS BOOL) is just by itself? (not super important... just thought it was weird)

x=x gets normalized to x is not null or null (SimplifySameVarEqualities rule). The or null is needed so the result is null as opposed to false when x is null (of course, the distinction doesn't matter inside WHEN, but the rule runs independently)

pkg/sql/opt/testutils/opttester/opt_tester.go, line 1112 at r5 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

These two functions seem like they are probably used in a lot of places. Does it make sense to put this code in a helper function?

Done (moved into makeOptimizer).

mgartner

Reviewed 5 of 5 files at r6.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 2 stale) (waiting on @rytaft)

rytaft

Reviewed 5 of 5 files at r6.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 2 stale) (waiting on @RaduBerinde)

pkg/sql/opt/exec/execbuilder/testdata/scalar, line 1204 at r4 (raw file):

Previously, RaduBerinde wrote…

x=x gets normalized to x is not null or null (SimplifySameVarEqualities rule). The or null is needed so the result is null as opposed to false when x is null (of course, the distinction doesn't matter inside WHEN, but the rule runs independently)

got it thanks!

RaduBerinde · 2021-04-20T22:33:22Z

bors r+

craig · 2021-04-21T00:54:20Z

Build failed (retrying...):

GitHub CI (Cockroach)

craig · 2021-04-21T03:01:11Z

Build succeeded:

GitHub CI (Cockroach)

RaduBerinde · 2021-05-02T20:34:48Z

🚨 This needs to be backported together with follow-up fix #64225. 🚨

RaduBerinde · 2021-05-11T20:54:47Z

🚨 This needs to be backported together with follow-up fixes #64225 and #65018. 🚨

RaduBerinde requested review from mgartner and rytaft April 16, 2021 23:56

RaduBerinde force-pushed the placeholder-scan branch from b9388f1 to 92ac9e5 Compare April 17, 2021 00:09

mgartner reviewed Apr 19, 2021

View reviewed changes

RaduBerinde commented Apr 19, 2021

View reviewed changes

RaduBerinde force-pushed the placeholder-scan branch from 92ac9e5 to 6bf6e77 Compare April 19, 2021 23:01

RaduBerinde commented Apr 19, 2021

View reviewed changes

RaduBerinde added 2 commits April 19, 2021 21:35

RaduBerinde force-pushed the placeholder-scan branch from 6bf6e77 to 34d57a0 Compare April 20, 2021 04:39

mgartner approved these changes Apr 20, 2021

View reviewed changes

rytaft approved these changes Apr 20, 2021

View reviewed changes

RaduBerinde added the docs-todo label Apr 20, 2021

RaduBerinde force-pushed the placeholder-scan branch from 34d57a0 to df67aa9 Compare April 20, 2021 16:33

RaduBerinde commented Apr 20, 2021

View reviewed changes

mgartner approved these changes Apr 20, 2021

View reviewed changes

rytaft approved these changes Apr 20, 2021

View reviewed changes

craig bot merged commit cc94126 into cockroachdb:master Apr 21, 2021

RaduBerinde deleted the placeholder-scan branch April 22, 2021 01:42

erikgrinaker mentioned this pull request Apr 26, 2021

opt: performance regression on April 21st #64214

Closed

RaduBerinde added the backport-21.1.x label May 2, 2021

rytaft mentioned this pull request Jul 13, 2021

console: want better observability for queries executed with multiple plans #67519

Open

nvanbenschoten mentioned this pull request Dec 30, 2021

opt: avoid re-constructing entire memo on Placeholder injection? #57223

Closed

nvanbenschoten mentioned this pull request Feb 28, 2022

opt: avoid re-constructing entire memo on mutation Placeholder injection #77128

Closed

DrewKimball mentioned this pull request Jun 27, 2023

opt: make the placeholder fast-path into an optgen rule #104760

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

opt: introduce placeholder fast path #63807

opt: introduce placeholder fast path #63807

RaduBerinde commented Apr 16, 2021 •

edited

Loading

cockroach-teamcity commented Apr 16, 2021

RaduBerinde commented Apr 16, 2021

jordanlewis commented Apr 17, 2021

andy-kimball commented Apr 17, 2021

nvanbenschoten commented Apr 17, 2021

mgartner left a comment

RaduBerinde left a comment

RaduBerinde left a comment

mgartner left a comment

rytaft left a comment

RaduBerinde commented Apr 20, 2021

RaduBerinde left a comment

mgartner left a comment

rytaft left a comment

RaduBerinde commented Apr 20, 2021

craig bot commented Apr 21, 2021

craig bot commented Apr 21, 2021

RaduBerinde commented May 2, 2021

RaduBerinde commented May 11, 2021

opt: introduce placeholder fast path #63807

opt: introduce placeholder fast path #63807

Conversation

RaduBerinde commented Apr 16, 2021 • edited Loading

sql: support EXPLAIN ANALYZE EXECUTE

opt: normalize In with single-element list to Eq

opt: introduce placeholder fast path

cockroach-teamcity commented Apr 16, 2021

RaduBerinde commented Apr 16, 2021

jordanlewis commented Apr 17, 2021

andy-kimball commented Apr 17, 2021

nvanbenschoten commented Apr 17, 2021

mgartner left a comment

Choose a reason for hiding this comment

RaduBerinde left a comment

Choose a reason for hiding this comment

RaduBerinde left a comment

Choose a reason for hiding this comment

mgartner left a comment

Choose a reason for hiding this comment

rytaft left a comment

Choose a reason for hiding this comment

RaduBerinde commented Apr 20, 2021

RaduBerinde left a comment

Choose a reason for hiding this comment

mgartner left a comment

Choose a reason for hiding this comment

rytaft left a comment

Choose a reason for hiding this comment

RaduBerinde commented Apr 20, 2021

craig bot commented Apr 21, 2021

craig bot commented Apr 21, 2021

RaduBerinde commented May 2, 2021

RaduBerinde commented May 11, 2021

RaduBerinde commented Apr 16, 2021 •

edited

Loading