execbuilder: plan directly to distsql specs #47473

asubiotto · 2020-04-14T14:40:21Z

Remaining big items/questions

how do we handle the closure of planNodes that are wrapped into the physical plans?
sql: add support for aggregations and values in the new factory #50560 (review) goes into some details with the difficulties, but the TL;DR is that some planNodes do not have equivalent processor specs, so we need to have a way to "embed" them into the physical plans (this is done via wrapPlan method which probably doesn't need any modifications) and clean up after them once the query is done (this is the part needs figuring out). With the old factory, that cleanup is done by planTop.close method which closes all trees in the query. However, with the new factory we don't have full trees - we might have a collection of random planNodes that might not be connected between each other.
how do we handle subqueries? sql: figure out how to handle subqueries by distSQLSpecExecFactory #51095
how do we connect appropriate planNodes in the physical plan? sql: figure out how to handle subqueries by distSQLSpecExecFactory #51095
what is our testing story?
We probably will rely mostly on existing logic tests but also we need to add a differential testing harness before we're comfortable with removing the old factory. This is tracked in sql: implement differential testing of physical plans produced by the old and new factories #50610.

Here is the list of all unimplemented methods that was added in #49348 that introduced the new factory (the ones that are checked either have been already merged or have a working work-in-progress commits). All methods are divided into two categories - one is methods in which we can construct the spec directly because there is a corresponding processor core, and another are planNodes that don't have an equivalent processor and might need to be wrapped (it is possible that the second category might be actually reduced to just a handful of items that require figuring out the necessary plumbing for handling the wrapped planNodes and all of the methods below will "just work" 🤞 ).

Can construct processor specs directly:

Need to have wrapped planNodes:

Miscellaneous items:

RenameColumns
ConstructPlan
populate index usage statistics in the DistSQL spec factory

Background info

Currently, the last stage in the optimizer is to use the execbuilder.Builder to construct a planNode tree from a memo expression tree. This planNode tree is subsequently converted to a collection of ProcessorSpecs in the distsql physical planner:

cockroach/pkg/sql/distsql_physical_planner.go

Line 2293 in 1e9f570

func (dsp *DistSQLPlanner) createPlanForNode(

We need to get rid of this redundant conversion and create ProcessorSpecs directly.

For 20.2, our focus should be on adding an off-by-default option to remove this redundant planning phase for the kv --read-percent=100 workload. I envision this as creating a new implementation of the exec.Factory implementation that is swapped in when a cluster setting is set.

Supporting kv --read-percent=100 is mostly a question of implementing ConstructScan and nothing else. However, this will probably teach us a lot of what we need to do to move over and how to do it. My hope is that once we get the TableReaderSpecs created directly, we'll have a concrete gameplan on moving over the rest of the processor spec creation.

Epic: CRDB-79

Jira issue: CRDB-4407

The text was updated successfully, but these errors were encountered:

RaduBerinde · 2020-05-02T05:08:19Z

Nice writeup! Note that @rytaft worked on a prototype for this a while back, it is here: #30969

asubiotto · 2020-06-03T09:55:51Z

@yuzefovich as we discussed during the meeting, it would be good if as part of this milestone you could also think about how to make it easier for other engineers/external contributors to add spec creation so that we can maybe distribute and parallelize this work. Maybe this needs to wait until the creation of a couple of the more difficult specs is done.

yuzefovich · 2020-06-23T04:20:45Z

I reran kv100 workload on a 3 node roachprod cluster (with the dataset consisting of writes that occurred in 1 hour of kv0 workload) with the following options --concurrency=192 --ramp=5m --duration=30m.

experimental_distsql_planning=off:

_elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)__total
 1800.0s        0       38695504        21497.5      8.9      7.3     24.1     37.7    159.4  read

_elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)__result
 1800.0s        0       38695504        21497.5      8.9      7.3     24.1     37.7    159.4

experimental_distsql_planning=on:

_elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)__total
 1800.0s        0       39137348        21743.1      8.8      7.3     24.1     37.7    167.8  read

_elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)__result
 1800.0s        0       39137348        21743.1      8.8      7.3     24.1     37.7    167.8

This comes out to 1.14% improvement.

50252: opt: change GeoLookupJoin to InvertedLookupJoin r=rytaft a=rytaft This commit converts the `GeoLookupJoin` operator in the optimizer into a more general operator, `InvertedLookupJoin`. This new operator maps directly to the `invertedJoiner` DistSQL processor. In the future, the `InvertedLookupJoin` operator can also be used for lookup joins on inverted JSON and array indexes in addition to geospatial indexes. This commit also adds a new structure, `geoDatumToInvertedExpr`, which implements the `DatumToInvertedExpr` interface for geospatial data types. This will enable the optimized plan to be easily converted to a DistSQL plan for execution by the `invertedJoiner`. Release note: None 50450: sql: add support for hash and merge joins in the new factory r=yuzefovich a=yuzefovich **sql: minor cleanup of joiner planning** `joinNode.mergeJoinOrdering` is now set to non-zero length by the optimizer only when we can use a merge join (meaning that number of equality columns is non-zero and equals the length of the ordering we have). This allows us to slightly simplify the setup up of the merge joiners. Additionally, this commit switching to using `[]exec.NodeColumnOrdinal` instead of `int` for equality columns in `joinPredicate` which allows us to remove one conversion step when planning hash joiners. Also we introduce a small helper that will be reused by the follow-up work. Release note: None **sql: add support for hash and merge joins in the new factory** This commit adds implementation of `ConstructHashJoin` and `ConstructMergeJoin` in the new factory by mostly refactoring and reusing already existing code in the physical planner. Notably, interleaved joins are not supported yet. Fixes: #50291. Addresses: #47473. Release note: None 50646: build: update instructions for updating dependencies r=jbowens a=otan Release note: None Co-authored-by: Rebecca Taft <becca@cockroachlabs.com> Co-authored-by: Yahor Yuzefovich <yahor@cockroachlabs.com> Co-authored-by: Oliver Tan <otan@cockroachlabs.com>

50457: geo: coerce invalid geography coordinates into geometry r=otan a=Arun4rangan This commit resolves: #50219 Previously geometry did not convert out of range lat-lngs into correct range. This commit mimics the behavior found in PostGIS. It converts out of range lat-lngs into correct range. Release note (sql change): Geometry now coerce invalid geography coordinates into correct geometry. 50560: sql: add support for aggregations and values in the new factory r=yuzefovich a=yuzefovich **sql: remove some unnecessary things around aggregations** Release note: None **sql: add support for aggregations in the new factory** This commit implements `ConstructGroupBy` and `ConstructScalarGroupBy` methods in the new factory by populating the specs upfront and reusing DistSQLPlanner to do the actual planning. Fixes: #50290. Release note: None **sql: minor cleanup of planNodeToRowSource** This commit cleans up `planNodeToRowSource` to be properly closed (similar to other processors). Release note: None **sql: implement ConstructValues in the new factory** This commit adds implementation of `ConstructValues` in the new factory. In some cases, a valuesNode is the only way to "construct values" - when the node must be wrapped (see createPhysPlanForValuesNode for more details). In other cases, a valuesNode is used to construct the values processor spec. The latter usage is refactored to avoid valuesNode creation. This decision also prompts us to add a separate "side" list of `planNode`s that are part of the physical plan and need to be closed when the whole plan is closed. This is done by introducing a utility wrapper around `PhysicalPlan`. This approach was chosen after considering an alternative in which `planNodeToRowSource` adapter would be the one closing the `planNode` it is wrapping because `planNode.Close` contract currently prohibits the execution from closing any of the `planNode`s, and changined that would be quite invasive at this point. We might reevaluate this decision later, once we've made more progress on the distsql spec work. Addresses: #47473. Release note: None **logictest: add spec-planning configs to the default ones** We have now implemented noticeable chunk of methods in the new factory, so I think it makes to run all logic tests with the spec-planning configs by default. The only logic test that is currently skipped is `interleaved_join` because the new factory doesn't plan interleaved joins yet. Release note: None **sql: enhance unsupported error message in the new factory** Release note: None **sql: support simple projection on top of planNode in the new factory** The new factory still constructs some of the `planNode`s (for example, explain variants), and we should be able to handle simple projections on top of those. This is handled by reusing the logic from the old factory. The issue was hidden by the fallback to the old factory since EXPLAIN statements don't have "SELECT" statement tag. Release note: None **sql: implement ConstructLimit in the new factory** This commit additionally removes the requirement that `limitExpr` and `offsetExpr` must be distributable in order for the whole plan to be distributable in the old path because those expressions are evaluated during the physical planning, locally. Release note: None Co-authored-by: Arun Ranganathan <arun.ranga@hotmail.ca> Co-authored-by: Yahor Yuzefovich <yahor@cockroachlabs.com>

50909: sql: add sort and virtual scan support in the new factory r=yuzefovich a=yuzefovich **sql: implement ConstructSort in the new factory** Release note: None **sql: add support of virtual scans in the new factory** The support is added by sharing the same code with the old factory that construct `delayedNode` with the rest of necessary methods already being implemented in the new factory. That delayedNode is wrapped into the physical plan via a callback. Note that with the new factory it is possible to have a distributed plan that contains an execinfra.LocalProcessor. This required to make a change that distsql.LocalState, regardless of the plan distribution, has always LocalProcs set when setting up a flow on the gateway. Addresses: #47473. Release note: None Co-authored-by: Yahor Yuzefovich <yahor@cockroachlabs.com>

51305: geo,geoindex: use bounding box for geography coverings that are bad r=sumeerbhola a=sumeerbhola This change uses a covererInterface implementation for geography that notices when a covering is using top-level cells of all faces and in that case uses the bounding box to compute the covering. Also changed the bounding box calculation for geography shapes to use only the points and not first construct s2.Regions. The latter causes marginally bad shapes to continue to have bad coverings since the bounding box also covers the whole earth. Release note: None 51882: roachpb: panic when comparing a Lease to a non-lease r=andreimatei a=andreimatei Release note: None 52146: sql: remove local execution of projectSetNode and implement ConstructProjectSet in the new factory r=yuzefovich a=yuzefovich Depends on #52108. **sql: remove local execution of projectSetNode** We have project set processor which is always planned for `projectSetNode`, so this commit removes the dead code of its local execution. Additionally, it removes some unused fields and cleans up cancellation check of the processor. Release note: None **sql: implement ConstructProjectSet in the new factory** Addresses: #47473. Release note: None 52320: kvserver: enable merges in kvnemesis r=aayushshah15 a=aayushshah15 We had merges disabled because of the bugs tracked in #44878, but those have since been fixed by #46085 and #50265. Release note: None Co-authored-by: sumeerbhola <sumeer@cockroachlabs.com> Co-authored-by: Andrei Matei <andrei@cockroachlabs.com> Co-authored-by: Yahor Yuzefovich <yahor@cockroachlabs.com> Co-authored-by: Aayush Shah <aayush.shah15@gmail.com>

48842: sql: fix portals after exhausting rows r=yuzefovich a=yuzefovich Previously, we would erroneously restart the execution from the very beginning of empty, unclosed portals after they have been fully exhausted when we should be returning no rows or an error in such scenarios. This is now fixed by tracking whether a portal is exhausted or not and intercepting the calls to `execStmt` when the conn executor state machine is in an open state. Note that the current solution has known deviations from Postgres: - when attempting to execute portals of statements that don't return row sets, on the second and consequent attempt PG returns an error while we are silently doing nothing (meaning we do not run the statement at all and return 0 rows) - we incorrectly populate "command tag" field of pgwire messages of some rows-returning statements after the portal suspension (for example, a suspended UPDATE RETURNING in PG will return the total count of "rows affected" while we will return the count since the last suspension). These deviations are deemed acceptable since this commit fixes a much worse problem - re-executing an exhausted portal (which could be a mutation meaning, previously, we could have executed a mutation multiple times). The reasons for why this commit does not address these deviations are: - Postgres has a concept of "portal strategy" (see https://github.com/postgres/postgres/blob/2f9661311b83dc481fc19f6e3bda015392010a40/src/include/utils/portal.h#L89). - Postgres has a concept of "command" type (these are things like SELECTs, UPDATEs, INSERTs, etc, see https://github.com/postgres/postgres/blob/1aac32df89eb19949050f6f27c268122833ad036/src/include/nodes/nodes.h#L672). CRDB doesn't have these concepts, and without them any attempt to simulate Postgres results in a very error-prone and brittle code. Fixes: #48448. Release note (bug fix): Previously, CockroachDB would erroneously restart the execution of empty, unclosed portals after they have been fully exhausted, and this has been fixed. 52098: sql: better distribute distinct processors r=yuzefovich a=yuzefovich **sql: better distribute distinct processors** The distinct processors are planned in two stages - first, a "local" stage is planned on the same nodes as the previous stage, then, a "global" stage is added. Previously, the global stage would be planned on the gateway, and this commit changes that to make it distributed - by planning "global" distinct processors on the same nodes as the "local" ones and connecting them via a hash router hashing the distinct columns. Release note: None **sql: implement ConstructDistinct in the new factory** Addresses: #47473. Release note: None 52358: engine: set the MVCC timestamp on reads due to historical intents r=ajwerner a=ajwerner Before this commit, we'd return a zero-value MVCC timestamp when reading an intent from the intent history. This was problematic because elsewhere in the code we assume that we will always get a non-zero MVCC timestamp when a read returns a value. This is especially bizarre given that a read of the latest intent will return its write timestamp. The semantics here are such that we'll return the timestamp of the MVCC metadata for the row. I think this is the most reasonable thing to do as that timestamp ought to reflect the timestamp we return when we return the current intent and furthermore is the only timestamp we really have around. We could return the transactions current read or write timestamp but those both seem like worse choices. It's also worth noting that in the case where we need to read a non-zero value, we don't really care what that value is and the fact that we are reading this intent itself is somewhat suspect. That being said, I still think we should make this change in addition to any change made to prevent the code referenced in #49266 from needing it. Fixes #49266. Informs #50102. Release note: None 52384: sql: properly reset extraTxnState in COPY r=ajwerner a=ajwerner Apparently we support some sort of COPY protocol that I know nothing about. We allow operations in that protocol in the open state and in the noTxn state in the connExecutor. In the noTxn state we let the `copyMachine` handle its transaction lifecycle all on its own. In that case, we also hand have the `connExecutor` in a fresh state when calling `execCopyIn()`. During the execution of `COPY`, it seems like sometime we can pick up table descriptor leases. In the noTxn case we'd like to drop those leases and generally leave the executor in the fresh state in which it was handed to us. To deal with that, we call `resetExtraTxnState` before returning in the happy noTxn case. Fixes #52065. Release note (bug fix): Fixed a bug when using the COPY protocol which could prevent schema changes for up to 5 minutes. Co-authored-by: Yahor Yuzefovich <yahor@cockroachlabs.com> Co-authored-by: Andrew Werner <ajwerner@cockroachlabs.com>

52163: sql: implement ConstructOpaque in the new factory r=yuzefovich a=yuzefovich Addresses: #47473. Release note: None 52408: sql/catalog/lease: attempt to fix a flakey test r=lucy-zhang a=ajwerner I think this test was going to fail. Fixes #52385. Release note: None 52486: sqlbase,catalogkv: move more kv interactions to catalogkv r=ajwerner a=ajwerner At this point the only remaining kv interactions in sqlbase are during type and table interactions though there are also usages through the protoGetter. This is a minor change in the work to pick apart sqlbase. Release note: None Co-authored-by: Yahor Yuzefovich <yahor@cockroachlabs.com> Co-authored-by: Andrew Werner <ajwerner@cockroachlabs.com>

Informs cockroachdb#47473 Release note: None

72653: sql: implement ConstructZigzagJoin in distsql_spec_exec_factory r=cucaroach a=cucaroach Informs #47473 Release note: None 72825: sql: handle no cluster_inflight_traces table on tenants r=adityamaru a=aliher1911 Previously query for traces under tenant will fail with NPE. This was happening because trace collector is not available and virtual table was not aware of the case. This patch adds handling of the table on tenant sql pods. Release note: None Fixes #72564 72871: backupccl: don't include tenants in non-cluster backups r=dt a=dt Release note (bug fix): System tenant backups of individual tables and databases no longer include tenants as well. 72887: roachprod: remove ClusterImpl interface r=RaduBerinde a=RaduBerinde The ClusterImpl interface was useful when roachprod had some (very limited) support for Cassandra, but that has been removed. We're left with an unnecessary indirection that lacks documentation. The code doesn't follow any layering, with `Cockroach` working directly with `SyncedCluster` internals and plenty of cockroach-specific code in `SyncedCluster`. This commit removes this interface and improves the documentation of the methods that are simplified. Release note: None 72894: kv: use version-less key in rditer.KeyRange r=nvanbenschoten a=nvanbenschoten This commit replaces the use of `storage.MVCCKey` bounds with `roachpb.Key` bounds in `rditer.KeyRange`. It addresses an existing TODO and helps clarify things in #72121 slightly, as that PR is going to add a new field to `MVCCKey`. 72895: storage: remove decodeMVCCKey r=nvanbenschoten a=nvanbenschoten The function was identical to `DecodeMVCCKey`, which is defined a few lines up in the file. 72902: storage: delete deprecated PutProto function r=nvanbenschoten a=nvanbenschoten This function has been deprecated since 5e5eaa5. It was only used in one test, so this commit removes it. 72907: build/builder: install awscli v2 r=rail a=erikgrinaker `roachprod` requires awscli v2, but the `builder` image included v1. Release note: None Co-authored-by: Tommy Reilly <treilly@cockroachlabs.com> Co-authored-by: Oleg Afanasyev <oleg@cockroachlabs.com> Co-authored-by: David Taylor <tinystatemachine@gmail.com> Co-authored-by: Radu Berinde <radu@cockroachlabs.com> Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com> Co-authored-by: Erik Grinaker <grinaker@cockroachlabs.com>

asubiotto added C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) A-sql-optimizer SQL logical planning and optimizations. meta-issue Contains a list of several other issues. A-sql-execution Relating to SQL execution. labels Apr 14, 2020

asubiotto assigned yuzefovich Apr 14, 2020

asubiotto mentioned this issue Apr 14, 2020

execbuilder: plan TableReaderSpec directly #47474

Closed

asubiotto mentioned this issue Jun 2, 2020

rowexec: investigate and improve lookup join performance #47472

Closed

5 tasks

This was referenced Jun 16, 2020

sql: implement ConstructGroupBy in the new factory #50290

Closed

sql: implement ConstructHashJoin in the new factory #50291

Closed

sql: add support for hash and merge joins in the new factory #50450

Merged

This was referenced Jun 23, 2020

sql: add support for aggregations and values in the new factory #50560

Merged

sql: implement differential testing of physical plans produced by the old and new factories #50610

Open

yuzefovich mentioned this issue Jul 9, 2020

sql: add sort and virtual scan support in the new factory #50909

Merged

This was referenced Jul 30, 2020

sql: better distribute distinct processors #52098

Merged

sql: remove local execution of projectSetNode and implement ConstructProjectSet in the new factory #52146

Merged

yuzefovich mentioned this issue Aug 4, 2020

sql: implement ConstructOpaque in the new factory #52163

Merged

yuzefovich mentioned this issue Mar 18, 2021

colflow: fix premature vectorized stats collection #62179

Closed

jlinder added the T-sql-queries SQL Queries Team label Jun 16, 2021

jordanlewis mentioned this issue Jul 12, 2021

opt: add node and region information to explain #59860

Closed

cucaroach self-assigned this Oct 7, 2021

cucaroach added a commit to cucaroach/cockroach that referenced this issue Nov 12, 2021

sql: implement ConstructZigzagJoin in distsql_spec_exec_factory

ba65955

Informs cockroachdb#47473 Release note: None

cucaroach mentioned this issue Nov 12, 2021

sql: implement ConstructZigzagJoin in distsql_spec_exec_factory #72653

Merged

cucaroach added a commit to cucaroach/cockroach that referenced this issue Nov 12, 2021

sql: implement ConstructZigzagJoin in distsql_spec_exec_factory

2b72a16

Informs cockroachdb#47473 Release note: None

cucaroach added a commit to cucaroach/cockroach that referenced this issue Nov 12, 2021

sql: implement ConstructZigzagJoin in distsql_spec_exec_factory

721c381

Informs cockroachdb#47473 Release note: None

cucaroach added a commit to cucaroach/cockroach that referenced this issue Nov 15, 2021

sql: implement ConstructZigzagJoin in distsql_spec_exec_factory

9319e2f

Informs cockroachdb#47473 Release note: None

cucaroach added a commit to cucaroach/cockroach that referenced this issue Nov 18, 2021

sql: implement ConstructZigzagJoin in distsql_spec_exec_factory

a4af072

Informs cockroachdb#47473 Release note: None

cucaroach mentioned this issue Jan 6, 2022

sql: distsql direct planning of lookup joins #74543

Closed

exalate-issue-sync bot unassigned cucaroach Feb 1, 2022

yuzefovich removed their assignment May 31, 2022

yuzefovich mentioned this issue Sep 18, 2023

sql: populate index usage statistics in the DistSQL spec factory #78429

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

execbuilder: plan directly to distsql specs #47473

execbuilder: plan directly to distsql specs #47473

asubiotto commented Apr 14, 2020 •

edited by yuzefovich

Loading

RaduBerinde commented May 2, 2020

asubiotto commented Jun 3, 2020

yuzefovich commented Jun 23, 2020

execbuilder: plan directly to distsql specs #47473

execbuilder: plan directly to distsql specs #47473

Comments

asubiotto commented Apr 14, 2020 • edited by yuzefovich Loading

RaduBerinde commented May 2, 2020

asubiotto commented Jun 3, 2020

yuzefovich commented Jun 23, 2020

asubiotto commented Apr 14, 2020 •

edited by yuzefovich

Loading