Skip to content

Commit

Permalink
sql: implement a fast compressed logical plan mechanism
Browse files Browse the repository at this point in the history
Implement a plan "gist" serializer piggy backing on the exec gen/explain
factory infrastructure so that we can always know what the logical plan
was and can do historical and statistical tracking. Logically its like
an explain (SHAPE) but is even more stripped down. A gist is a sequence
of bytes representing the flattened tree of operators and various
operator specific metadata.

The goal is to record every logical plan we use for every query to have
historical data on which plans are used possibly linked up to statistics
so we know which stats go with which logical plan.

Also implement a decoder to turn the serialized plan back into a tree of
explain.Node's that can be displayed using existing explain code.

Currently this functionality is only exposed via a new EXPLAIN mode and
via a crdb_internal "decoder" SRF.   EXPLAIN (GIST) takes a query and
returns a single string which is the encoded gist.
crdb_internal.decode_plan_gist() takes an encoded gist string and writes
out the logical plan one row per line.

For performance numbers of the ExecBuild comparing a StubFactory to a
PlanGistFactory wrapped around a StubFactory see the PR.

Release note (sql change): Record compressed plan gist for all queries.
For example, a query like this:

SELECT * FROM abc UNION SELECT * FROM abc ORDER BY b,a

Produces the following plan according to EXPLAIN (SHAPE)

• distinct
│ distinct on: a
│
└── • union all
    │
    ├── • sort
    │   │ order: +b,+a
    │   │
    │   └── • scan
    │         missing stats
    │         table: abc@primary
    │         spans: FULL SCAN
    │
    └── • sort
        │ order: +b,+a
        │
        └── • scan
              missing stats
              table: abc@primary
              spans: FULL SCAN

produces the following "gist":

AgFuAgAHAAAAEQFuAgAHAAAAERANAAYGAA==

The "gist" can be turned back into the following plan:

• distinct
│ distinct on
│
└── • union all
    │
    ├── • sort
    │   │ order
    │   │
    │   └── • scan
    │         table: abc@primary
    │         spans: FULL SCAN
    │
    └── • sort
        │ order
        │
        └── • scan
              table: abc@primary
              spans: FULL SCAN

Fixes: cockroachdb#63885
  • Loading branch information
cucaroach committed Oct 7, 2021
1 parent 9208567 commit f7bb012
Show file tree
Hide file tree
Showing 36 changed files with 2,787 additions and 128 deletions.
6 changes: 5 additions & 1 deletion Makefile
Original file line number Diff line number Diff line change
Expand Up @@ -890,7 +890,8 @@ OPTGEN_TARGETS = \
pkg/sql/opt/rule_name.og.go \
pkg/sql/opt/rule_name_string.go \
pkg/sql/opt/exec/factory.og.go \
pkg/sql/opt/exec/explain/explain_factory.og.go
pkg/sql/opt/exec/explain/explain_factory.og.go \
pkg/sql/opt/exec/explain/plan_gist_factory.og.go \

# removed-files is a list of files that used to exist in the
# repository that need to be explicitly cleaned up to prevent build
Expand Down Expand Up @@ -1651,6 +1652,9 @@ pkg/sql/opt/exec/factory.og.go: $(optgen-defs) $(optgen-exec-defs) bin/optgen
pkg/sql/opt/exec/explain/explain_factory.og.go: $(optgen-defs) $(optgen-exec-defs) bin/optgen
optgen -out $@ execexplain $(optgen-exec-defs)

pkg/sql/opt/exec/explain/plan_gist_factory.og.go: $(optgen-defs) $(optgen-exec-defs) bin/optgen
optgen -out $@ execplangist $(optgen-exec-defs)

.PHONY: clean-c-deps
clean-c-deps:
rm -rf $(JEMALLOC_DIR)
Expand Down
2 changes: 2 additions & 0 deletions docs/generated/sql/functions.md
Original file line number Diff line number Diff line change
Expand Up @@ -2515,6 +2515,8 @@ The swap_ordinate_string parameter is a 2-character string naming the ordinates
</span></td></tr>
<tr><td><a name="convert_to"></a><code>convert_to(str: <a href="string.html">string</a>, enc: <a href="string.html">string</a>) &rarr; <a href="bytes.html">bytes</a></code></td><td><span class="funcdesc"><p>Encode the string <code>str</code> as a byte array using encoding <code>enc</code>. Supports encodings ‘UTF8’ and ‘LATIN1’.</p>
</span></td></tr>
<tr><td><a name="crdb_internal.decode_plan_gist"></a><code>crdb_internal.decode_plan_gist(gist: <a href="string.html">string</a>) &rarr; <a href="string.html">string</a></code></td><td><span class="funcdesc"><p>Returns rows of output similar to EXPLAIN from a gist created by EXPLAIN (GIST)</p>
</span></td></tr>
<tr><td><a name="crdb_internal.show_create_all_tables"></a><code>crdb_internal.show_create_all_tables(database_name: <a href="string.html">string</a>) &rarr; <a href="string.html">string</a></code></td><td><span class="funcdesc"><p>Returns rows of CREATE table statements followed by
ALTER table statements that add table constraints. The rows are ordered
by dependencies. All foreign keys are added after the creation of the table
Expand Down
3 changes: 1 addition & 2 deletions pkg/sql/distsql_spec_exec_factory.go
Original file line number Diff line number Diff line change
Expand Up @@ -806,8 +806,7 @@ func (e *distSQLSpecExecFactory) ConstructExplain(
// We cannot create the explained plan in the same PlanInfrastructure with the
// "outer" plan. Create a separate factory.
newFactory := newDistSQLSpecExecFactory(e.planner, e.planningMode)
explainFactory := explain.NewFactory(newFactory)
plan, err := buildFn(explainFactory)
plan, err := buildFn(newFactory)
// Release the resources acquired during the physical planning right away.
newFactory.(*distSQLSpecExecFactory).planCtx.getCleanupFunc()()
if err != nil {
Expand Down
147 changes: 76 additions & 71 deletions pkg/sql/explain_plan.go
Original file line number Diff line number Diff line change
Expand Up @@ -50,86 +50,91 @@ func (e *explainPlanNode) startExec(params runParams) error {
ob := explain.NewOutputBuilder(e.flags)
plan := e.plan.WrappedPlan.(*planComponents)

// Determine the "distribution" and "vectorized" values, which we will emit as
// special rows.

// Note that we delay adding the annotation about the distribution until
// after the plan is finalized (when the physical plan is successfully
// created).
distribution := getPlanDistribution(
params.ctx, params.p, params.extendedEvalCtx.ExecCfg.NodeID,
params.extendedEvalCtx.SessionData().DistSQLMode, plan.main,
)

outerSubqueries := params.p.curPlan.subqueryPlans
distSQLPlanner := params.extendedEvalCtx.DistSQLPlanner
planCtx := newPlanningCtxForExplainPurposes(distSQLPlanner, params, plan.subqueryPlans, distribution)
defer func() {
planCtx.planner.curPlan.subqueryPlans = outerSubqueries
}()
physicalPlan, err := newPhysPlanForExplainPurposes(planCtx, distSQLPlanner, plan.main)
var diagramURL url.URL
var diagramJSON string
if err != nil {
if e.options.Mode == tree.ExplainDistSQL {
if len(plan.subqueryPlans) > 0 {
return errors.New("running EXPLAIN (DISTSQL) on this query is " +
"unsupported because of the presence of subqueries")
}
return err
}
ob.AddDistribution(distribution.String())
// For regular EXPLAIN, simply skip emitting the "vectorized" information.
var rows []string
if e.options.Mode == tree.ExplainGist {
// RFC: e.plan.Gist should never be nil but should I check it an return an
// error anyways?
rows = []string{e.plan.Gist.String()}
} else {
// There might be an issue making the physical plan, but that should not
// cause an error or panic, so swallow the error. See #40677 for example.
distSQLPlanner.finalizePlanWithRowCount(planCtx, physicalPlan, plan.mainRowCount)
ob.AddDistribution(physicalPlan.Distribution.String())
flows := physicalPlan.GenerateFlowSpecs()
flowCtx := newFlowCtxForExplainPurposes(planCtx, params.p)

ctxSessionData := flowCtx.EvalCtx.SessionData()
var willVectorize bool
if ctxSessionData.VectorizeMode == sessiondatapb.VectorizeOff {
willVectorize = false
// Determine the "distribution" and "vectorized" values, which we will emit as
// special rows.

// Note that we delay adding the annotation about the distribution until
// after the plan is finalized (when the physical plan is successfully
// created).
distribution := getPlanDistribution(
params.ctx, params.p, params.extendedEvalCtx.ExecCfg.NodeID,
params.extendedEvalCtx.SessionData().DistSQLMode, plan.main,
)

outerSubqueries := params.p.curPlan.subqueryPlans
distSQLPlanner := params.extendedEvalCtx.DistSQLPlanner
planCtx := newPlanningCtxForExplainPurposes(distSQLPlanner, params, plan.subqueryPlans, distribution)
defer func() {
planCtx.planner.curPlan.subqueryPlans = outerSubqueries
}()
physicalPlan, err := newPhysPlanForExplainPurposes(planCtx, distSQLPlanner, plan.main)
var diagramURL url.URL
var diagramJSON string
if err != nil {
if e.options.Mode == tree.ExplainDistSQL {
if len(plan.subqueryPlans) > 0 {
return errors.New("running EXPLAIN (DISTSQL) on this query is " +
"unsupported because of the presence of subqueries")
}
return err
}
ob.AddDistribution(distribution.String())
// For regular EXPLAIN, simply skip emitting the "vectorized" information.
} else {
willVectorize = true
for _, flow := range flows {
if err := colflow.IsSupported(ctxSessionData.VectorizeMode, flow); err != nil {
willVectorize = false
break
// There might be an issue making the physical plan, but that should not
// cause an error or panic, so swallow the error. See #40677 for example.
distSQLPlanner.FinalizePlan(planCtx, physicalPlan)
flows := physicalPlan.GenerateFlowSpecs()
flowCtx := newFlowCtxForExplainPurposes(planCtx, params.p)

ctxSessionData := flowCtx.EvalCtx.SessionData()
var willVectorize bool
if ctxSessionData.VectorizeMode == sessiondatapb.VectorizeOff {
willVectorize = false
} else {
willVectorize = true
for _, flow := range flows {
if err := colflow.IsSupported(ctxSessionData.VectorizeMode, flow); err != nil {
willVectorize = false
break
}
}
}
}
ob.AddVectorized(willVectorize)
ob.AddVectorized(willVectorize)

if e.options.Mode == tree.ExplainDistSQL {
flags := execinfrapb.DiagramFlags{
ShowInputTypes: e.options.Flags[tree.ExplainFlagTypes],
}
diagram, err := execinfrapb.GeneratePlanDiagram(params.p.stmt.String(), flows, flags)
if err != nil {
return err
}
if e.options.Mode == tree.ExplainDistSQL {
flags := execinfrapb.DiagramFlags{
ShowInputTypes: e.options.Flags[tree.ExplainFlagTypes],
}
diagram, err := execinfrapb.GeneratePlanDiagram(params.p.stmt.String(), flows, flags)
if err != nil {
return err
}

diagramJSON, diagramURL, err = diagram.ToURL()
if err != nil {
return err
diagramJSON, diagramURL, err = diagram.ToURL()
if err != nil {
return err
}
}
}
}

var rows []string
if e.options.Flags[tree.ExplainFlagJSON] {
// For the JSON flag, we only want to emit the diagram JSON.
rows = []string{diagramJSON}
} else {
if err := emitExplain(ob, params.EvalContext(), params.p.ExecCfg().Codec, e.plan); err != nil {
return err
}
rows = ob.BuildStringRows()
if e.options.Mode == tree.ExplainDistSQL {
rows = append(rows, "", fmt.Sprintf("Diagram: %s", diagramURL.String()))
if e.options.Flags[tree.ExplainFlagJSON] {
// For the JSON flag, we only want to emit the diagram JSON.
rows = []string{diagramJSON}
} else {
if err := emitExplain(ob, params.EvalContext(), params.p.ExecCfg().Codec, e.plan); err != nil {
return err
}
rows = ob.BuildStringRows()
if e.options.Mode == tree.ExplainDistSQL {
rows = append(rows, "", fmt.Sprintf("Diagram: %s", diagramURL.String()))
}
}
}
v := params.p.newContainerValuesNode(colinfo.ExplainPlanColumns, 0)
Expand Down
5 changes: 5 additions & 0 deletions pkg/sql/faketreeeval/evalctx.go
Original file line number Diff line number Diff line change
Expand Up @@ -232,6 +232,11 @@ func (*DummyEvalPlanner) ExternalWriteFile(ctx context.Context, uri string, cont
return errors.WithStack(errEvalPlanner)
}

// DecodeGist is part of the EvalPlanner interface.
func (*DummyEvalPlanner) DecodeGist(gist string) ([]string, error) {
return nil, errors.WithStack(errEvalPlanner)
}

var _ tree.EvalPlanner = &DummyEvalPlanner{}

var errEvalPlanner = pgerror.New(pgcode.ScalarOperationCannotRunWithoutFullSessionContext,
Expand Down
4 changes: 4 additions & 0 deletions pkg/sql/instrumentation.go
Original file line number Diff line number Diff line change
Expand Up @@ -114,6 +114,10 @@ type instrumentationHelper struct {

// regions used only on EXPLAIN ANALYZE to be displayed as top-level stat.
regions []string

// planGist is a compressed version of plan that can be converted (lossily)
// back into a logical plan or be used to get a plan hash.
planGist explain.PlanGist
}

// outputMode indicates how the statement output needs to be populated (for
Expand Down
1 change: 1 addition & 0 deletions pkg/sql/opt/exec/execbuilder/BUILD.bazel
Original file line number Diff line number Diff line change
Expand Up @@ -23,6 +23,7 @@ go_library(
"//pkg/sql/opt/cat",
"//pkg/sql/opt/constraint",
"//pkg/sql/opt/exec",
"//pkg/sql/opt/exec/explain",
"//pkg/sql/opt/memo",
"//pkg/sql/opt/norm",
"//pkg/sql/opt/ordering",
Expand Down
29 changes: 20 additions & 9 deletions pkg/sql/opt/exec/execbuilder/statement.go
Original file line number Diff line number Diff line change
Expand Up @@ -18,6 +18,7 @@ import (
"github.com/cockroachdb/cockroach/pkg/sql/opt"
"github.com/cockroachdb/cockroach/pkg/sql/opt/cat"
"github.com/cockroachdb/cockroach/pkg/sql/opt/exec"
"github.com/cockroachdb/cockroach/pkg/sql/opt/exec/explain"
"github.com/cockroachdb/cockroach/pkg/sql/opt/memo"
"github.com/cockroachdb/cockroach/pkg/sql/opt/xform"
"github.com/cockroachdb/cockroach/pkg/sql/sem/tree"
Expand Down Expand Up @@ -119,27 +120,37 @@ func (b *Builder) buildExplainOpt(explain *memo.ExplainExpr) (execPlan, error) {
return planWithColumns(node, explain.ColList), nil
}

func (b *Builder) buildExplain(explain *memo.ExplainExpr) (execPlan, error) {
if explain.Options.Mode == tree.ExplainOpt {
return b.buildExplainOpt(explain)
func (b *Builder) buildExplain(explainExpr *memo.ExplainExpr) (execPlan, error) {
if explainExpr.Options.Mode == tree.ExplainOpt {
return b.buildExplainOpt(explainExpr)
}

node, err := b.factory.ConstructExplain(
&explain.Options,
explain.StmtType,
func(ef exec.ExplainFactory) (exec.Plan, error) {
&explainExpr.Options,
explainExpr.StmtType,
func(f exec.Factory) (exec.Plan, error) {
// Create a separate builder for the explain query.
gf := explain.NewPlanGistFactory(f)
ef := explain.NewFactory(gf)

explainBld := New(
ef, b.optimizer, b.mem, b.catalog, explain.Input, b.evalCtx, b.initialAllowAutoCommit,
ef, b.optimizer, b.mem, b.catalog, explainExpr.Input, b.evalCtx, b.initialAllowAutoCommit,
)
explainBld.disableTelemetry = true
return explainBld.Build()
plan, err := explainBld.Build()
if err != nil {
return nil, err
}
explainPlan := plan.(*explain.Plan)
explainPlan.Gist = gf.PlanGist()
return plan, nil
},
)
if err != nil {
return execPlan{}, err
}
return planWithColumns(node, explain.ColList), nil

return planWithColumns(node, explainExpr.ColList), nil
}

func (b *Builder) buildShowTrace(show *memo.ShowTraceForSessionExpr) (execPlan, error) {
Expand Down
31 changes: 31 additions & 0 deletions pkg/sql/opt/exec/execbuilder/testdata/explain_gist
Original file line number Diff line number Diff line change
@@ -0,0 +1,31 @@

statement ok
CREATE TABLE t (a INT PRIMARY KEY)

query T
EXPLAIN (GIST) SELECT * FROM t
----
AgFqAgABAAAABgI=

query T
SELECT * FROM crdb_internal.decode_plan_gist('AgFqAgABAAAABgI=')
----
• scan
table: t@primary
spans: FULL SCAN

query T
SELECT crdb_internal.decode_plan_gist('AgFqAgABAAAABgI=')
----
• scan
table: t@primary
spans: FULL SCAN

statement error pq: unknown signature: crdb_internal\.decode_plan_gist\(int\)
SELECT * FROM crdb_internal.decode_plan_gist(10)

statement error pq: unknown signature: crdb_internal\.decode_plan_gist\(int\)
SELECT crdb_internal.decode_plan_gist(10)

statement error pq: illegal base64 data at input byte 0
SELECT crdb_internal.decode_plan_gist('a')
1 change: 1 addition & 0 deletions pkg/sql/opt/exec/explain/.gitignore
Original file line number Diff line number Diff line change
@@ -1 +1,2 @@
explain_factory.og.go
plan_gist_factory.og.go
6 changes: 6 additions & 0 deletions pkg/sql/opt/exec/explain/BUILD.bazel
Original file line number Diff line number Diff line change
Expand Up @@ -7,6 +7,7 @@ go_library(
"explain_factory.go",
"flags.go",
"output.go",
"plan_gist_factory.go",
"result_columns.go",
":gen-explain-factory", # keep
],
Expand Down Expand Up @@ -42,6 +43,7 @@ go_test(
"explain_factory_test.go",
"main_test.go",
"output_test.go",
"plan_gist_test.go",
],
data = glob(["testdata/**"]),
embed = [":explain"],
Expand All @@ -50,7 +52,11 @@ go_test(
"//pkg/server",
"//pkg/sql/catalog/colinfo",
"//pkg/sql/execinfra",
"//pkg/sql/opt/cat",
"//pkg/sql/opt/exec",
"//pkg/sql/opt/memo",
"//pkg/sql/opt/testutils/opttester",
"//pkg/sql/opt/testutils/testcat",
"//pkg/sql/sem/tree",
"//pkg/sql/types",
"//pkg/testutils/serverutils",
Expand Down
Loading

0 comments on commit f7bb012

Please sign in to comment.