Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: add ability to get a hash value from a logical plan #63885

Closed
Azhng opened this issue Apr 19, 2021 · 5 comments
Closed

sql: add ability to get a hash value from a logical plan #63885

Azhng opened this issue Apr 19, 2021 · 5 comments
Assignees
Labels
C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) T-sql-queries SQL Queries Team

Comments

@Azhng
Copy link
Contributor

Azhng commented Apr 19, 2021

Currently, @cockroachdb/sql-observability are working towards adding the ability to compare historical query plans for a given statement.

We want to be able to quickly tell if the plan for a given statement has changed in a specified interval. Ideally, we should be able to do this without comparing the entire plan structure.

Since we already anonymized statements by removing constants in the statement, two plans with the same structure but different scan contraint values should produce the same hash.

Epic: CRDB-8631

@Azhng Azhng added the C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) label Apr 19, 2021
@Azhng
Copy link
Contributor Author

Azhng commented Apr 21, 2021

cc: @RaduBerinde

@rytaft
Copy link
Collaborator

rytaft commented Apr 27, 2021

cc @kevin-v-ngo @awoods187 to decide if needed for 21.2

@kevin-v-ngo
Copy link

This would be helpful once we have persisted stats: #64743. If we surface a hash value, users can compare for a statement fingerprint how many times the plan had changed over time and what the impact was on the execution statistics.

@Azhng
Copy link
Contributor Author

Azhng commented Jul 7, 2021

I think one important thing about this is that the hash should only be derived from the execution plan and should be unaffected by other unrelated attributes in the logical plan.

For example, currently in the crdb_internal.node_statement_statistics we expose a sample_plan column:

root@127.0.0.1:26257/movr> select key, sample_plan from crdb_internal.node_statement_statistics where application_name = 'whack';
                                                key                                               |                                                                                                                      sample_plan
--------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  SELECT * FROM users                                                                             | {"Children": [], "Missing Stats": "", "Name": "scan", "Spans": "FULL SCAN", "Table": "users@primary"}
  SELECT * FROM users WHERE name = _                                                              | {"Children": [{"Children": [], "Estimated Row Count": "50 (100% of the table; stats collected 52 seconds ago)", "Name": "scan", "Spans": "FULL SCAN", "Table": "users@primary"}], "Estimated Row Count": "0", "Filter": "name = _", "Name": "filter"}
  SELECT key, sample_plan FROM crdb_internal.node_statement_statistics WHERE application_name = _ | {"Children": [{"Children": [], "Name": "virtual table", "Table": "node_statement_statistics@primary"}], "Filter": "application_name = _", "Name": "filter"}
  SHOW database                                                                                   | {"Children": [{"Children": [], "Name": "virtual table", "Table": "session_variables@primary"}], "Filter": "variable = _", "Name": "filter"}
(4 rows)

We can see that in the sample_plan we have attributes such as "Missing Stats" and "Estimated Row Count". If we are to trivially hash the JSON representation of the plan, we would have different hashes for the same plan as the result of variation in the table statistics, which is not desirable.

@kevin-v-ngo
Copy link

Agree @Azhng! We'd need a notion of a 'plan fingerprint' (ignoring insignificant plan attributes) on which the hash is based on.

cucaroach added a commit to cucaroach/cockroach that referenced this issue Oct 7, 2021
Implement a plan "gist" serializer piggy backing on the exec gen/explain
factory infrastructure so that we can always know what the logical plan
was and can do historical and statistical tracking. Logically its like
an explain (SHAPE) but is even more stripped down. A gist is a sequence
of bytes representing the flattened tree of operators and various
operator specific metadata.

The goal is to record every logical plan we use for every query to have
historical data on which plans are used possibly linked up to statistics
so we know which stats go with which logical plan.

Also implement a decoder to turn the serialized plan back into a tree of
explain.Node's that can be displayed using existing explain code.

Currently this functionality is only exposed via a new EXPLAIN mode and
via a crdb_internal "decoder" SRF.   EXPLAIN (GIST) takes a query and
returns a single string which is the encoded gist.
crdb_internal.decode_plan_gist() takes an encoded gist string and writes
out the logical plan one row per line.

For performance numbers of the ExecBuild comparing a StubFactory to a
PlanGistFactory wrapped around a StubFactory see the PR.

Release note (sql change): Record compressed plan gist for all queries.
For example, a query like this:

SELECT * FROM abc UNION SELECT * FROM abc ORDER BY b,a

Produces the following plan according to EXPLAIN (SHAPE)

• distinct
│ distinct on: a
│
└── • union all
    │
    ├── • sort
    │   │ order: +b,+a
    │   │
    │   └── • scan
    │         missing stats
    │         table: abc@primary
    │         spans: FULL SCAN
    │
    └── • sort
        │ order: +b,+a
        │
        └── • scan
              missing stats
              table: abc@primary
              spans: FULL SCAN

produces the following "gist":

AgFuAgAHAAAAEQFuAgAHAAAAERANAAYGAA==

The "gist" can be turned back into the following plan:

• distinct
│ distinct on
│
└── • union all
    │
    ├── • sort
    │   │ order
    │   │
    │   └── • scan
    │         table: abc@primary
    │         spans: FULL SCAN
    │
    └── • sort
        │ order
        │
        └── • scan
              table: abc@primary
              spans: FULL SCAN

Fixes: cockroachdb#63885
cucaroach added a commit to cucaroach/cockroach that referenced this issue Oct 12, 2021
Implement a plan "gist" serializer piggy backing on the exec gen/explain
factory infrastructure so that we can always know what the logical plan
was and can do historical and statistical tracking. Logically its like
an explain (SHAPE) but is even more stripped down. A gist is a sequence
of bytes representing the flattened tree of operators and various
operator specific metadata.

The goal is to record every logical plan we use for every query to have
historical data on which plans are used possibly linked up to statistics
so we know which stats go with which logical plan.

Also implement a decoder to turn the serialized plan back into a tree of
explain.Node's that can be displayed using existing explain code.

Currently this functionality is only exposed via a new EXPLAIN mode and
via a crdb_internal "decoder" SRF.   EXPLAIN (GIST) takes a query and
returns a single string which is the encoded gist.
crdb_internal.decode_plan_gist() takes an encoded gist string and writes
out the logical plan one row per line.

For performance numbers of the ExecBuild comparing a StubFactory to a
PlanGistFactory wrapped around a StubFactory see the PR.

Release note (sql change): Record compressed plan gist for all queries.
For example, a query like this:

SELECT * FROM abc UNION SELECT * FROM abc ORDER BY b,a

Produces the following plan according to EXPLAIN (SHAPE)

• distinct
│ distinct on: a
│
└── • union all
    │
    ├── • sort
    │   │ order: +b,+a
    │   │
    │   └── • scan
    │         missing stats
    │         table: abc@primary
    │         spans: FULL SCAN
    │
    └── • sort
        │ order: +b,+a
        │
        └── • scan
              missing stats
              table: abc@primary
              spans: FULL SCAN

produces the following "gist":

AgFuAgAHAAAAEQFuAgAHAAAAERANAAYGAA==

The "gist" can be turned back into the following plan:

• distinct
│ distinct on
│
└── • union all
    │
    ├── • sort
    │   │ order
    │   │
    │   └── • scan
    │         table: abc@primary
    │         spans: FULL SCAN
    │
    └── • sort
        │ order
        │
        └── • scan
              table: abc@primary
              spans: FULL SCAN

Fixes: cockroachdb#63885
cucaroach added a commit to cucaroach/cockroach that referenced this issue Oct 12, 2021
Implement a plan "gist" serializer piggy backing on the exec gen/explain
factory infrastructure so that we can always know what the logical plan
was and can do historical and statistical tracking. Logically its like
an explain (SHAPE) but is even more stripped down. A gist is a sequence
of bytes representing the flattened tree of operators and various
operator specific metadata.

The goal is to record every logical plan we use for every query to have
historical data on which plans are used possibly linked up to statistics
so we know which stats go with which logical plan.

Also implement a decoder to turn the serialized plan back into a tree of
explain.Node's that can be displayed using existing explain code.

Currently this functionality is only exposed via a new EXPLAIN mode and
via a crdb_internal "decoder" SRF.   EXPLAIN (GIST) takes a query and
returns a single string which is the encoded gist.
crdb_internal.decode_plan_gist() takes an encoded gist string and writes
out the logical plan one row per line.

For performance numbers of the ExecBuild comparing a StubFactory to a
PlanGistFactory wrapped around a StubFactory see the PR.

Release note (sql change): Record compressed plan gist for all queries.
For example, a query like this:

SELECT * FROM abc UNION SELECT * FROM abc ORDER BY b,a

Produces the following plan according to EXPLAIN (SHAPE)

• distinct
│ distinct on: a
│
└── • union all
    │
    ├── • sort
    │   │ order: +b,+a
    │   │
    │   └── • scan
    │         missing stats
    │         table: abc@primary
    │         spans: FULL SCAN
    │
    └── • sort
        │ order: +b,+a
        │
        └── • scan
              missing stats
              table: abc@primary
              spans: FULL SCAN

produces the following "gist":

AgFuAgAHAAAAEQFuAgAHAAAAERANAAYGAA==

The "gist" can be turned back into the following plan:

• distinct
│ distinct on
│
└── • union all
    │
    ├── • sort
    │   │ order
    │   │
    │   └── • scan
    │         table: abc@primary
    │         spans: FULL SCAN
    │
    └── • sort
        │ order
        │
        └── • scan
              table: abc@primary
              spans: FULL SCAN

Fixes: cockroachdb#63885
cucaroach added a commit to cucaroach/cockroach that referenced this issue Oct 12, 2021
Implement a plan "gist" serializer piggy backing on the exec gen/explain
factory infrastructure so that we can always know what the logical plan
was and can do historical and statistical tracking. Logically its like
an explain (SHAPE) but is even more stripped down. A gist is a sequence
of bytes representing the flattened tree of operators and various
operator specific metadata.

The goal is to record every logical plan we use for every query to have
historical data on which plans are used possibly linked up to statistics
so we know which stats go with which logical plan.

Also implement a decoder to turn the serialized plan back into a tree of
explain.Node's that can be displayed using existing explain code.

Currently this functionality is only exposed via a new EXPLAIN mode and
via a crdb_internal "decoder" SRF.   EXPLAIN (GIST) takes a query and
returns a single string which is the encoded gist.
crdb_internal.decode_plan_gist() takes an encoded gist string and writes
out the logical plan one row per line.

For performance numbers of the ExecBuild comparing a StubFactory to a
PlanGistFactory wrapped around a StubFactory see the PR.

Release note (sql change): Record compressed plan gist for all queries.
For example, a query like this:

SELECT * FROM abc UNION SELECT * FROM abc ORDER BY b,a

Produces the following plan according to EXPLAIN (SHAPE)

• distinct
│ distinct on: a
│
└── • union all
    │
    ├── • sort
    │   │ order: +b,+a
    │   │
    │   └── • scan
    │         missing stats
    │         table: abc@primary
    │         spans: FULL SCAN
    │
    └── • sort
        │ order: +b,+a
        │
        └── • scan
              missing stats
              table: abc@primary
              spans: FULL SCAN

produces the following "gist":

AgFuAgAHAAAAEQFuAgAHAAAAERANAAYGAA==

The "gist" can be turned back into the following plan:

• distinct
│ distinct on
│
└── • union all
    │
    ├── • sort
    │   │ order
    │   │
    │   └── • scan
    │         table: abc@primary
    │         spans: FULL SCAN
    │
    └── • sort
        │ order
        │
        └── • scan
              table: abc@primary
              spans: FULL SCAN

Fixes: cockroachdb#63885
cucaroach added a commit to cucaroach/cockroach that referenced this issue Oct 13, 2021
Implement a plan "gist" serializer piggy backing on the exec gen/explain
factory infrastructure so that we can always know what the logical plan
was and can do historical and statistical tracking. Logically its like
an explain (SHAPE) but is even more stripped down. A gist is a sequence
of bytes representing the flattened tree of operators and various
operator specific metadata.

The goal is to record every logical plan we use for every query to have
historical data on which plans are used possibly linked up to statistics
so we know which stats go with which logical plan.

Also implement a decoder to turn the serialized plan back into a tree of
explain.Node's that can be displayed using existing explain code.

Currently this functionality is only exposed via a new EXPLAIN mode and
via a crdb_internal "decoder" SRF. EXPLAIN (GIST) takes a query and
returns a single string which is the encoded gist.
crdb_internal.decode_plan_gist() takes an encoded gist string and writes
out the logical plan one row per line.

For performance numbers of the ExecBuild comparing a StubFactory to a
PlanGistFactory wrapped around a StubFactory see the PR.

Fixes: cockroachdb#63885

Release note (sql change): Record compressed plan gist for all queries.
For example, a query like this:

SELECT * FROM abc UNION SELECT * FROM abc ORDER BY b,a

Produces the following plan according to EXPLAIN (SHAPE)

• distinct
│ distinct on: a
│
└── • union all
    │
    ├── • sort
    │   │ order: +b,+a
    │   │
    │   └── • scan
    │         missing stats
    │         table: abc@primary
    │         spans: FULL SCAN
    │
    └── • sort
        │ order: +b,+a
        │
        └── • scan
              missing stats
              table: abc@primary
              spans: FULL SCAN

produces the following "gist":

AgFuAgAHAAAAEQFuAgAHAAAAERANAAYGAA==

The "gist" can be turned back into the following plan:

• distinct
│ distinct on
│
└── • union all
    │
    ├── • sort
    │   │ order
    │   │
    │   └── • scan
    │         table: abc@primary
    │         spans: FULL SCAN
    │
    └── • sort
        │ order
        │
        └── • scan
              table: abc@primary
              spans: FULL SCAN
cucaroach added a commit to cucaroach/cockroach that referenced this issue Oct 13, 2021
Implement a plan "gist" serializer piggy backing on the exec gen/explain
factory infrastructure so that we can always know what the logical plan
was and can do historical and statistical tracking. Logically its like
an explain (SHAPE) but is even more stripped down. A gist is a sequence
of bytes representing the flattened tree of operators and various
operator specific metadata.

The goal is to record every logical plan we use for every query to have
historical data on which plans are used possibly linked up to statistics
so we know which stats go with which logical plan.

Also implement a decoder to turn the serialized plan back into a tree of
explain.Node's that can be displayed using existing explain code.

Currently this functionality is only exposed via a new EXPLAIN mode and
via a crdb_internal "decoder" SRF. EXPLAIN (GIST) takes a query and
returns a single string which is the encoded gist.
crdb_internal.decode_plan_gist() takes an encoded gist string and writes
out the logical plan one row per line.

For performance numbers of the ExecBuild comparing a StubFactory to a
PlanGistFactory wrapped around a StubFactory see the PR.

Fixes: cockroachdb#63885

Release note (sql change): Record compressed plan gist for all queries.
For example, a query like this:

SELECT * FROM abc UNION SELECT * FROM abc ORDER BY b,a

Produces the following plan according to EXPLAIN (SHAPE)

• distinct
│ distinct on: a
│
└── • union all
    │
    ├── • sort
    │   │ order: +b,+a
    │   │
    │   └── • scan
    │         missing stats
    │         table: abc@primary
    │         spans: FULL SCAN
    │
    └── • sort
        │ order: +b,+a
        │
        └── • scan
              missing stats
              table: abc@primary
              spans: FULL SCAN

produces the following "gist":

AgFuAgAHAAAAEQFuAgAHAAAAERANAAYGAA==

The "gist" can be turned back into the following plan:

• distinct
│ distinct on
│
└── • union all
    │
    ├── • sort
    │   │ order
    │   │
    │   └── • scan
    │         table: abc@primary
    │         spans: FULL SCAN
    │
    └── • sort
        │ order
        │
        └── • scan
              table: abc@primary
              spans: FULL SCAN
cucaroach added a commit to cucaroach/cockroach that referenced this issue Oct 13, 2021
Implement a plan "gist" serializer piggy backing on the exec gen/explain
factory infrastructure so that we can always know what the logical plan
was and can do historical and statistical tracking. Logically its like
an explain (SHAPE) but is even more stripped down. A gist is a sequence
of bytes representing the flattened tree of operators and various
operator specific metadata.

The goal is to record every logical plan we use for every query to have
historical data on which plans are used possibly linked up to statistics
so we know which stats go with which logical plan.

Also implement a decoder to turn the serialized plan back into a tree of
explain.Node's that can be displayed using existing explain code.

Currently this functionality is only exposed via a new EXPLAIN mode and
via a crdb_internal "decoder" SRF. EXPLAIN (GIST) takes a query and
returns a single string which is the encoded gist.
crdb_internal.decode_plan_gist() takes an encoded gist string and writes
out the logical plan one row per line.

For performance numbers of the ExecBuild comparing a StubFactory to a
PlanGistFactory wrapped around a StubFactory see the PR.

Fixes: cockroachdb#63885

Release note (sql change): Record compressed plan gist for all queries.
For example, a query like this:

SELECT * FROM abc UNION SELECT * FROM abc ORDER BY b,a

Produces the following plan according to EXPLAIN (SHAPE)

• distinct
│ distinct on: a
│
└── • union all
    │
    ├── • sort
    │   │ order: +b,+a
    │   │
    │   └── • scan
    │         missing stats
    │         table: abc@primary
    │         spans: FULL SCAN
    │
    └── • sort
        │ order: +b,+a
        │
        └── • scan
              missing stats
              table: abc@primary
              spans: FULL SCAN

produces the following "gist":

AgFuAgAHAAAAEQFuAgAHAAAAERANAAYGAA==

The "gist" can be turned back into the following plan:

• distinct
│ distinct on
│
└── • union all
    │
    ├── • sort
    │   │ order
    │   │
    │   └── • scan
    │         table: abc@primary
    │         spans: FULL SCAN
    │
    └── • sort
        │ order
        │
        └── • scan
              table: abc@primary
              spans: FULL SCAN
cucaroach added a commit to cucaroach/cockroach that referenced this issue Oct 13, 2021
Implement a plan "gist" serializer piggy backing on the exec gen/explain
factory infrastructure so that we can always know what the logical plan
was and can do historical and statistical tracking. Logically its like
an explain (SHAPE) but is even more stripped down. A gist is a sequence
of bytes representing the flattened tree of operators and various
operator specific metadata.

The goal is to record every logical plan we use for every query to have
historical data on which plans are used possibly linked up to statistics
so we know which stats go with which logical plan.

Also implement a decoder to turn the serialized plan back into a tree of
explain.Node's that can be displayed using existing explain code.

Currently this functionality is only exposed via a new EXPLAIN mode and
via a crdb_internal "decoder" SRF. EXPLAIN (GIST) takes a query and
returns a single string which is the encoded gist.
crdb_internal.decode_plan_gist() takes an encoded gist string and writes
out the logical plan one row per line.

For performance numbers of the ExecBuild comparing a StubFactory to a
PlanGistFactory wrapped around a StubFactory see the PR.

Fixes: cockroachdb#63885

Release note (sql change): Record compressed plan gist for all queries.
For example, a query like this:

SELECT * FROM abc UNION SELECT * FROM abc ORDER BY b,a

Produces the following plan according to EXPLAIN (SHAPE)

• distinct
│ distinct on: a
│
└── • union all
    │
    ├── • sort
    │   │ order: +b,+a
    │   │
    │   └── • scan
    │         missing stats
    │         table: abc@primary
    │         spans: FULL SCAN
    │
    └── • sort
        │ order: +b,+a
        │
        └── • scan
              missing stats
              table: abc@primary
              spans: FULL SCAN

produces the following "gist":

AgFuAgAHAAAAEQFuAgAHAAAAERANAAYGAA==

The "gist" can be turned back into the following plan:

• distinct
│ distinct on
│
└── • union all
    │
    ├── • sort
    │   │ order
    │   │
    │   └── • scan
    │         table: abc@primary
    │         spans: FULL SCAN
    │
    └── • sort
        │ order
        │
        └── • scan
              table: abc@primary
              spans: FULL SCAN
cucaroach added a commit to cucaroach/cockroach that referenced this issue Oct 15, 2021
Implement a plan "gist" serializer piggy backing on the exec gen/explain
factory infrastructure so that we can always know what the logical plan
was and can do historical and statistical tracking. Logically its like
an explain (SHAPE) but is even more stripped down. A gist is a sequence
of bytes representing the flattened tree of operators and various
operator specific metadata.

The goal is to record every logical plan we use for every query to have
historical data on which plans are used possibly linked up to statistics
so we know which stats go with which logical plan.

Also implement a decoder to turn the serialized plan back into a tree of
explain.Node's that can be displayed using existing explain code.

Currently this functionality is only exposed via a new EXPLAIN mode and
via a crdb_internal "decoder" SRF. EXPLAIN (GIST) takes a query and
returns a single string which is the encoded gist.
crdb_internal.decode_plan_gist() takes an encoded gist string and writes
out the logical plan one row per line.

For performance numbers of the ExecBuild comparing a StubFactory to a
PlanGistFactory wrapped around a StubFactory see the PR.

Fixes: cockroachdb#63885

Release note (sql change): Record compressed plan gist for all queries.
For example, a query like this:

SELECT * FROM abc UNION SELECT * FROM abc ORDER BY b,a

Produces the following plan according to EXPLAIN (SHAPE)

• distinct
│ distinct on: a
│
└── • union all
    │
    ├── • sort
    │   │ order: +b,+a
    │   │
    │   └── • scan
    │         missing stats
    │         table: abc@primary
    │         spans: FULL SCAN
    │
    └── • sort
        │ order: +b,+a
        │
        └── • scan
              missing stats
              table: abc@primary
              spans: FULL SCAN

produces the following "gist":

AgFuAgAHAAAAEQFuAgAHAAAAERANAAYGAA==

The "gist" can be turned back into the following plan:

• distinct
│ distinct on
│
└── • union all
    │
    ├── • sort
    │   │ order
    │   │
    │   └── • scan
    │         table: abc@primary
    │         spans: FULL SCAN
    │
    └── • sort
        │ order
        │
        └── • scan
              table: abc@primary
              spans: FULL SCAN
cucaroach added a commit to cucaroach/cockroach that referenced this issue Oct 19, 2021
Implement a plan "gist" serializer piggy backing on the exec gen/explain
factory infrastructure so that we can always know what the logical plan
was and can do historical and statistical tracking. Logically its like
an explain (SHAPE) but is even more stripped down. A gist is a sequence
of bytes representing the flattened tree of operators and various
operator specific metadata.

The goal is to record every logical plan we use for every query to have
historical data on which plans are used possibly linked up to statistics
so we know which stats go with which logical plan.

Also implement a decoder to turn the serialized plan back into a tree of
explain.Node's that can be displayed using existing explain code.

Currently this functionality is only exposed via a new EXPLAIN mode and
via a crdb_internal "decoder" SRF. EXPLAIN (GIST) takes a query and
returns a single string which is the encoded gist.
crdb_internal.decode_plan_gist() takes an encoded gist string and writes
out the logical plan one row per line.

For performance numbers of the ExecBuild comparing a StubFactory to a
PlanGistFactory wrapped around a StubFactory see the PR.

Fixes: cockroachdb#63885

Release note (sql change): Record compressed plan gist for all queries.
For example, a query like this:

SELECT * FROM abc UNION SELECT * FROM abc ORDER BY b,a

Produces the following plan according to EXPLAIN (SHAPE)

• distinct
│ distinct on: a
│
└── • union all
    │
    ├── • sort
    │   │ order: +b,+a
    │   │
    │   └── • scan
    │         missing stats
    │         table: abc@primary
    │         spans: FULL SCAN
    │
    └── • sort
        │ order: +b,+a
        │
        └── • scan
              missing stats
              table: abc@primary
              spans: FULL SCAN

produces the following "gist":

AgFuAgAHAAAAEQFuAgAHAAAAERANAAYGAA==

The "gist" can be turned back into the following plan:

• distinct
│ distinct on
│
└── • union all
    │
    ├── • sort
    │   │ order
    │   │
    │   └── • scan
    │         table: abc@primary
    │         spans: FULL SCAN
    │
    └── • sort
        │ order
        │
        └── • scan
              table: abc@primary
              spans: FULL SCAN
cucaroach added a commit to cucaroach/cockroach that referenced this issue Oct 19, 2021
Implement a plan "gist" serializer piggy backing on the exec gen/explain
factory infrastructure so that we can always know what the logical plan
was and can do historical and statistical tracking. Logically its like
an explain (SHAPE) but is even more stripped down. A gist is a sequence
of bytes representing the flattened tree of operators and various
operator specific metadata.

The goal is to record every logical plan we use for every query to have
historical data on which plans are used possibly linked up to statistics
so we know which stats go with which logical plan.

Also implement a decoder to turn the serialized plan back into a tree of
explain.Node's that can be displayed using existing explain code.

Currently this functionality is only exposed via a new EXPLAIN mode and
via a crdb_internal "decoder" SRF. EXPLAIN (GIST) takes a query and
returns a single string which is the encoded gist.
crdb_internal.decode_plan_gist() takes an encoded gist string and writes
out the logical plan one row per line.

For performance numbers of the ExecBuild comparing a StubFactory to a
PlanGistFactory wrapped around a StubFactory see the PR.

Fixes: cockroachdb#63885

Release note (sql change): Record compressed plan gist for all queries.
For example, a query like this:

SELECT * FROM abc UNION SELECT * FROM abc ORDER BY b,a

Produces the following plan according to EXPLAIN (SHAPE)

• distinct
│ distinct on: a
│
└── • union all
    │
    ├── • sort
    │   │ order: +b,+a
    │   │
    │   └── • scan
    │         missing stats
    │         table: abc@primary
    │         spans: FULL SCAN
    │
    └── • sort
        │ order: +b,+a
        │
        └── • scan
              missing stats
              table: abc@primary
              spans: FULL SCAN

produces the following "gist":

AgFuAgAHAAAAEQFuAgAHAAAAERANAAYGAA==

The "gist" can be turned back into the following plan:

• distinct
│ distinct on
│
└── • union all
    │
    ├── • sort
    │   │ order
    │   │
    │   └── • scan
    │         table: abc@primary
    │         spans: FULL SCAN
    │
    └── • sort
        │ order
        │
        └── • scan
              table: abc@primary
              spans: FULL SCAN
cucaroach added a commit to cucaroach/cockroach that referenced this issue Oct 22, 2021
Implement a plan "gist" serializer piggy backing on the exec gen/explain
factory infrastructure so that we can always know what the logical plan
was and can do historical and statistical tracking. Logically its like
an explain (SHAPE) but is even more stripped down. A gist is a sequence
of bytes representing the flattened tree of operators and various
operator specific metadata.

The goal is to record every logical plan we use for every query to have
historical data on which plans are used possibly linked up to statistics
so we know which stats go with which logical plan.

Also implement a decoder to turn the serialized plan back into a tree of
explain.Node's that can be displayed using existing explain code.

Currently this functionality is only exposed via a new EXPLAIN mode and
via a crdb_internal "decoder" SRF. EXPLAIN (GIST) takes a query and
returns a single string which is the encoded gist.
crdb_internal.decode_plan_gist() takes an encoded gist string and writes
out the logical plan one row per line.

For performance numbers of the ExecBuild comparing a StubFactory to a
PlanGistFactory wrapped around a StubFactory see the PR.

Fixes: cockroachdb#63885

Release note (sql change): Record compressed plan gist for all queries.
For example, a query like this:

SELECT * FROM abc UNION SELECT * FROM abc ORDER BY b,a

Produces the following plan according to EXPLAIN (SHAPE)

• distinct
│ distinct on: a
│
└── • union all
    │
    ├── • sort
    │   │ order: +b,+a
    │   │
    │   └── • scan
    │         missing stats
    │         table: abc@primary
    │         spans: FULL SCAN
    │
    └── • sort
        │ order: +b,+a
        │
        └── • scan
              missing stats
              table: abc@primary
              spans: FULL SCAN

produces the following "gist":

AgFuAgAHAAAAEQFuAgAHAAAAERANAAYGAA==

The "gist" can be turned back into the following plan:

• distinct
│ distinct on
│
└── • union all
    │
    ├── • sort
    │   │ order
    │   │
    │   └── • scan
    │         table: abc@primary
    │         spans: FULL SCAN
    │
    └── • sort
        │ order
        │
        └── • scan
              table: abc@primary
              spans: FULL SCAN
cucaroach added a commit to cucaroach/cockroach that referenced this issue Oct 25, 2021
Implement a plan "gist" serializer piggy backing on the exec gen/explain
factory infrastructure so that we can always know what the logical plan
was and can do historical and statistical tracking. Logically its like
an explain (SHAPE) but is even more stripped down. A gist is a sequence
of bytes representing the flattened tree of operators and various
operator specific metadata.

The goal is to record every logical plan we use for every query to have
historical data on which plans are used possibly linked up to statistics
so we know which stats go with which logical plan.

Also implement a decoder to turn the serialized plan back into a tree of
explain.Node's that can be displayed using existing explain code.

Currently this functionality is only exposed via a new EXPLAIN mode and
via a crdb_internal "decoder" SRF. EXPLAIN (GIST) takes a query and
returns a single string which is the encoded gist.
crdb_internal.decode_plan_gist() takes an encoded gist string and writes
out the logical plan one row per line.

For performance numbers of the ExecBuild comparing a StubFactory to a
PlanGistFactory wrapped around a StubFactory see the PR.

Fixes: cockroachdb#63885

Release note (sql change): Record compressed plan gist for all queries.
For example, a query like this:

SELECT * FROM abc UNION SELECT * FROM abc ORDER BY b,a

Produces the following plan according to EXPLAIN (SHAPE)

• distinct
│ distinct on: a
│
└── • union all
    │
    ├── • sort
    │   │ order: +b,+a
    │   │
    │   └── • scan
    │         missing stats
    │         table: abc@primary
    │         spans: FULL SCAN
    │
    └── • sort
        │ order: +b,+a
        │
        └── • scan
              missing stats
              table: abc@primary
              spans: FULL SCAN

produces the following "gist":

AgFuAgAHAAAAEQFuAgAHAAAAERANAAYGAA==

The "gist" can be turned back into the following plan:

• distinct
│ distinct on
│
└── • union all
    │
    ├── • sort
    │   │ order
    │   │
    │   └── • scan
    │         table: abc@primary
    │         spans: FULL SCAN
    │
    └── • sort
        │ order
        │
        └── • scan
              table: abc@primary
              spans: FULL SCAN
cucaroach added a commit to cucaroach/cockroach that referenced this issue Oct 26, 2021
Implement a plan "gist" serializer piggy backing on the exec gen/explain
factory infrastructure so that we can always know what the logical plan
was and can do historical and statistical tracking. Logically its like
an explain (SHAPE) but is even more stripped down. A gist is a sequence
of bytes representing the flattened tree of operators and various
operator specific metadata.

The goal is to record every logical plan we use for every query to have
historical data on which plans are used possibly linked up to statistics
so we know which stats go with which logical plan.

Also implement a decoder to turn the serialized plan back into a tree of
explain.Node's that can be displayed using existing explain code.

Currently this functionality is only exposed via a new EXPLAIN mode and
via a crdb_internal "decoder" SRF. EXPLAIN (GIST) takes a query and
returns a single string which is the encoded gist.
crdb_internal.decode_plan_gist() takes an encoded gist string and writes
out the logical plan one row per line.

For performance numbers of the ExecBuild comparing a StubFactory to a
PlanGistFactory wrapped around a StubFactory see the PR.

Fixes: cockroachdb#63885

Release note (sql change): Record compressed plan gist for all queries.
For example, a query like this:

SELECT * FROM abc UNION SELECT * FROM abc ORDER BY b,a

Produces the following plan according to EXPLAIN (SHAPE)

• distinct
│ distinct on: a
│
└── • union all
    │
    ├── • sort
    │   │ order: +b,+a
    │   │
    │   └── • scan
    │         missing stats
    │         table: abc@primary
    │         spans: FULL SCAN
    │
    └── • sort
        │ order: +b,+a
        │
        └── • scan
              missing stats
              table: abc@primary
              spans: FULL SCAN

produces the following "gist":

AgFuAgAHAAAAEQFuAgAHAAAAERANAAYGAA==

The "gist" can be turned back into the following plan:

• distinct
│ distinct on
│
└── • union all
    │
    ├── • sort
    │   │ order
    │   │
    │   └── • scan
    │         table: abc@primary
    │         spans: FULL SCAN
    │
    └── • sort
        │ order
        │
        └── • scan
              table: abc@primary
              spans: FULL SCAN
craig bot pushed a commit that referenced this issue Oct 27, 2021
69293: sql: implement a fast compressed logical plan mechanism r=rtaft,RaduBerinde a=cucaroach


    Implement a plan "gist" serializer piggy backing on the exec gen/explain
    factory infrastructure so that we can always know what the logical plan
    was and can do historical and statistical tracking. Logically its like
    an explain (SHAPE) but is even more stripped down. A gist is a sequence
    of bytes representing the flattened tree of operators and various
    operator specific metadata.

    The goal is to record every logical plan we use for every query to have
    historical data on which plans are used possibly linked up to statistics
    so we know which stats go with which logical plan.

    Also implement a decoder to turn the serialized plan back into a tree of
    explain.Node's that can be displayed using existing explain code.

    Currently this functionality is only exposed via a new EXPLAIN mode and
    via a crdb_internal "decoder" SRF.   EXPLAIN (GIST) takes a query and
    returns a single string which is the encoded gist.
    crdb_internal.decode_plan_gist() takes an encoded gist string and writes
    out the logical plan one row per line.

    For performance numbers of the ExecBuild comparing a StubFactory to a
    PlanGistFactory wrapped around a StubFactory see the PR.

    Release note (sql change): Record compressed plan gist for all queries.
    For example, a query like this:

    SELECT * FROM abc UNION SELECT * FROM abc ORDER BY b,a

    Produces the following plan according to EXPLAIN (SHAPE)

    • distinct
    │ distinct on: a
    │
    └── • union all
        │
        ├── • sort
        │   │ order: +b,+a
        │   │
        │   └── • scan
        │         missing stats
        │         table: abc@primary
        │         spans: FULL SCAN
        │
        └── • sort
            │ order: +b,+a
            │
            └── • scan
                  missing stats
                  table: abc@primary
                  spans: FULL SCAN

    produces the following "gist":

    AgFuAgAHAAAAEQFuAgAHAAAAERANAAYGAA==

    The "gist" can be turned back into the following plan:

    • distinct
    │ distinct on
    │
    └── • union all
        │
        ├── • sort
        │   │ order
        │   │
        │   └── • scan
        │         table: abc@primary
        │         spans: FULL SCAN
        │
        └── • sort
            │ order
            │
            └── • scan
                  table: abc@primary
                  spans: FULL SCAN

    Fixes: #63885


Co-authored-by: Tommy Reilly <treilly@cockroachlabs.com>
@craig craig bot closed this as completed in 8bf5d1c Oct 27, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) T-sql-queries SQL Queries Team
Projects
Archived in project
Development

No branches or pull requests

5 participants