Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

execbuilder: plan directly to distsql specs #47473

Open
21 of 59 tasks
asubiotto opened this issue Apr 14, 2020 · 3 comments
Open
21 of 59 tasks

execbuilder: plan directly to distsql specs #47473

asubiotto opened this issue Apr 14, 2020 · 3 comments
Labels
A-sql-execution Relating to SQL execution. A-sql-optimizer SQL logical planning and optimizations. C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) meta-issue Contains a list of several other issues. T-sql-queries SQL Queries Team

Comments

@asubiotto
Copy link
Contributor

asubiotto commented Apr 14, 2020

Remaining big items/questions


Here is the list of all unimplemented methods that was added in #49348 that introduced the new factory (the ones that are checked either have been already merged or have a working work-in-progress commits). All methods are divided into two categories - one is methods in which we can construct the spec directly because there is a corresponding processor core, and another are planNodes that don't have an equivalent processor and might need to be wrapped (it is possible that the second category might be actually reduced to just a handful of items that require figuring out the necessary plumbing for handling the wrapped planNodes and all of the methods below will "just work" 🤞 ).

Can construct processor specs directly:

  • ConstructValues (this is checked but depends on the answer to the first question above)
  • ConstructScan (non virtual scan)
  • ConstructFilter
  • ConstructInvertedFilter
  • ConstructSimpleProject
  • ConstructRender
  • ConstructHashJoin
  • ConstructMergeJoin
  • ConstructGroupBy
  • ConstructScalarGroupBy
  • ConstructDistinct
  • ConstructSetOp
  • ConstructSort
  • ConstructOrdinality
  • ConstructIndexJoin
  • ConstructLookupJoin (sql: distsql direct planning of lookup joins #74543)
  • ConstructInvertedJoin
  • ConstructZigzagJoin
  • ConstructLimit
  • ConstructProjectSet
  • ConstructWindow

Need to have wrapped planNodes:

  • ConstructValues
  • ConstructScan (virtual scan)
  • ConstructApplyJoin
  • ConstructMax1Row
  • ConstructExplainOpt
  • ConstructExplain
  • ConstructExplain (plan)
  • ConstructShowTrace
  • ConstructInsert
  • ConstructInsertFastPath
  • ConstructUpdate
  • ConstructUpsert
  • ConstructDelete
  • ConstructDeleteRange
  • ConstructCreateTable
  • ConstructCreateView
  • ConstructSequenceSelect
  • ConstructSaveTable
  • ConstructErrorIfRows
  • ConstructOpaque
  • ConstructAlterTableSplit
  • ConstructAlterTableUnsplit
  • ConstructAlterTableUnsplitAll
  • ConstructAlterTableRelocate
  • ConstructBuffer
  • ConstructScanBuffer
  • ConstructRecursiveCTE
  • ConstructControlJobs
  • ConstructCancelQueries
  • ConstructCancelSessions
  • ConstructExport

Miscellaneous items:

  • RenameColumns
  • ConstructPlan
  • populate index usage statistics in the DistSQL spec factory

Background info

Currently, the last stage in the optimizer is to use the execbuilder.Builder to construct a planNode tree from a memo expression tree. This planNode tree is subsequently converted to a collection of ProcessorSpecs in the distsql physical planner:

func (dsp *DistSQLPlanner) createPlanForNode(

We need to get rid of this redundant conversion and create ProcessorSpecs directly.

For 20.2, our focus should be on adding an off-by-default option to remove this redundant planning phase for the kv --read-percent=100 workload. I envision this as creating a new implementation of the exec.Factory implementation that is swapped in when a cluster setting is set.

Supporting kv --read-percent=100 is mostly a question of implementing ConstructScan and nothing else. However, this will probably teach us a lot of what we need to do to move over and how to do it. My hope is that once we get the TableReaderSpecs created directly, we'll have a concrete gameplan on moving over the rest of the processor spec creation.

Epic: CRDB-79

Jira issue: CRDB-4407

@asubiotto asubiotto added C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) A-sql-optimizer SQL logical planning and optimizations. meta-issue Contains a list of several other issues. A-sql-execution Relating to SQL execution. labels Apr 14, 2020
@RaduBerinde
Copy link
Member

Nice writeup! Note that @rytaft worked on a prototype for this a while back, it is here: #30969

@asubiotto
Copy link
Contributor Author

@yuzefovich as we discussed during the meeting, it would be good if as part of this milestone you could also think about how to make it easier for other engineers/external contributors to add spec creation so that we can maybe distribute and parallelize this work. Maybe this needs to wait until the creation of a couple of the more difficult specs is done.

@yuzefovich
Copy link
Member

I reran kv100 workload on a 3 node roachprod cluster (with the dataset consisting of writes that occurred in 1 hour of kv0 workload) with the following options --concurrency=192 --ramp=5m --duration=30m.

  • experimental_distsql_planning=off:
_elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)__total
 1800.0s        0       38695504        21497.5      8.9      7.3     24.1     37.7    159.4  read

_elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)__result
 1800.0s        0       38695504        21497.5      8.9      7.3     24.1     37.7    159.4  
  • experimental_distsql_planning=on:
_elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)__total
 1800.0s        0       39137348        21743.1      8.8      7.3     24.1     37.7    167.8  read

_elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)__result
 1800.0s        0       39137348        21743.1      8.8      7.3     24.1     37.7    167.8  

This comes out to 1.14% improvement.

craig bot pushed a commit that referenced this issue Jun 25, 2020
50252: opt: change GeoLookupJoin to InvertedLookupJoin r=rytaft a=rytaft

This commit converts the `GeoLookupJoin` operator in the optimizer into
a more general operator, `InvertedLookupJoin`. This new operator maps
directly to the `invertedJoiner` DistSQL processor. In the future, the
`InvertedLookupJoin` operator can also be used for lookup joins on inverted
JSON and array indexes in addition to geospatial indexes.

This commit also adds a new structure, `geoDatumToInvertedExpr`, which
implements the `DatumToInvertedExpr` interface for geospatial data types.
This will enable the optimized plan to be easily converted to a DistSQL
plan for execution by the `invertedJoiner`.

Release note: None


50450: sql: add support for hash and merge joins in the new factory r=yuzefovich a=yuzefovich

**sql: minor cleanup of joiner planning**

`joinNode.mergeJoinOrdering` is now set to non-zero length by the
optimizer only when we can use a merge join (meaning that number of
equality columns is non-zero and equals the length of the ordering we
have). This allows us to slightly simplify the setup up of the merge
joiners.

Additionally, this commit switching to using `[]exec.NodeColumnOrdinal`
instead of `int` for equality columns in `joinPredicate` which allows us
to remove one conversion step when planning hash joiners.

Also we introduce a small helper that will be reused by the follow-up
work.

Release note: None

**sql: add support for hash and merge joins in the new factory**

This commit adds implementation of `ConstructHashJoin` and
`ConstructMergeJoin` in the new factory by mostly refactoring and
reusing already existing code in the physical planner. Notably,
interleaved joins are not supported yet.

Fixes: #50291.
Addresses: #47473.

Release note: None

50646: build: update instructions for updating dependencies r=jbowens a=otan

Release note: None

Co-authored-by: Rebecca Taft <becca@cockroachlabs.com>
Co-authored-by: Yahor Yuzefovich <yahor@cockroachlabs.com>
Co-authored-by: Oliver Tan <otan@cockroachlabs.com>
craig bot pushed a commit that referenced this issue Jul 1, 2020
50457: geo: coerce invalid geography coordinates into geometry r=otan a=Arun4rangan

This commit resolves: #50219

Previously geometry did not convert out of range lat-lngs into
correct range. This commit mimics the behavior found in PostGIS.
It converts out of range lat-lngs into correct range.

Release note (sql change): Geometry now coerce invalid
geography coordinates into correct geometry.

50560: sql: add support for aggregations and values in the new factory r=yuzefovich a=yuzefovich

**sql: remove some unnecessary things around aggregations**

Release note: None

**sql: add support for aggregations in the new factory**

This commit implements `ConstructGroupBy` and `ConstructScalarGroupBy`
methods in the new factory by populating the specs upfront and reusing
DistSQLPlanner to do the actual planning.

Fixes: #50290.

Release note: None

**sql: minor cleanup of planNodeToRowSource**

This commit cleans up `planNodeToRowSource` to be properly closed
(similar to other processors).

Release note: None

**sql: implement ConstructValues in the new factory**

This commit adds implementation of `ConstructValues` in the new factory.
In some cases, a valuesNode is the only way to "construct values" - when
the node must be wrapped (see createPhysPlanForValuesNode for more
details). In other cases, a valuesNode is used to construct the values
processor spec. The latter usage is refactored to avoid valuesNode
creation.

This decision also prompts us to add a separate "side" list of
`planNode`s that are part of the physical plan and need to be closed
when the whole plan is closed. This is done by introducing a utility
wrapper around `PhysicalPlan`. This approach was chosen after
considering an alternative in which `planNodeToRowSource` adapter would
be the one closing the `planNode` it is wrapping because
`planNode.Close` contract currently prohibits the execution from closing
any of the `planNode`s, and changined that would be quite invasive at
this point. We might reevaluate this decision later, once we've made
more progress on the distsql spec work.

Addresses: #47473.

Release note: None

**logictest: add spec-planning configs to the default ones**

We have now implemented noticeable chunk of methods in the new factory,
so I think it makes to run all logic tests with the spec-planning
configs by default.

The only logic test that is currently skipped is `interleaved_join`
because the new factory doesn't plan interleaved joins yet.

Release note: None

**sql: enhance unsupported error message in the new factory**

Release note: None

**sql: support simple projection on top of planNode in the new factory**

The new factory still constructs some of the `planNode`s (for example,
explain variants), and we should be able to handle simple projections on
top of those. This is handled by reusing the logic from the old factory.
The issue was hidden by the fallback to the old factory since EXPLAIN
statements don't have "SELECT" statement tag.

Release note: None

**sql: implement ConstructLimit in the new factory**

This commit additionally removes the requirement that `limitExpr` and
`offsetExpr` must be distributable in order for the whole plan to be
distributable in the old path because those expressions are evaluated
during the physical planning, locally.

Release note: None

Co-authored-by: Arun Ranganathan <arun.ranga@hotmail.ca>
Co-authored-by: Yahor Yuzefovich <yahor@cockroachlabs.com>
craig bot pushed a commit that referenced this issue Jul 9, 2020
50909: sql: add sort and virtual scan support in the new factory r=yuzefovich a=yuzefovich

**sql: implement ConstructSort in the new factory**

Release note: None

**sql: add support of virtual scans in the new factory**

The support is added by sharing the same code with the old factory that
construct `delayedNode` with the rest of necessary methods already being
implemented in the new factory. That delayedNode is wrapped into the
physical plan via a callback.

Note that with the new factory it is possible to have a distributed plan
that contains an execinfra.LocalProcessor. This required to make
a change that distsql.LocalState, regardless of the plan distribution,
has always LocalProcs set when setting up a flow on the gateway.

Addresses: #47473.

Release note: None

Co-authored-by: Yahor Yuzefovich <yahor@cockroachlabs.com>
craig bot pushed a commit that referenced this issue Aug 4, 2020
51305: geo,geoindex: use bounding box for geography coverings that are bad r=sumeerbhola a=sumeerbhola

This change uses a covererInterface implementation for geography
that notices when a covering is using top-level cells of all faces
and in that case uses the bounding box to compute the covering.

Also changed the bounding box calculation for geography shapes to
use only the points and not first construct s2.Regions. The latter
causes marginally bad shapes to continue to have bad coverings since
the bounding box also covers the whole earth.

Release note: None

51882: roachpb: panic when comparing a Lease to a non-lease r=andreimatei a=andreimatei

Release note: None

52146: sql: remove local execution of projectSetNode and implement ConstructProjectSet in the new factory r=yuzefovich a=yuzefovich

Depends on #52108.

**sql: remove local execution of projectSetNode**

We have project set processor which is always planned for
`projectSetNode`, so this commit removes the dead code of its local
execution. Additionally, it removes some unused fields and cleans up
cancellation check of the processor.

Release note: None

**sql: implement ConstructProjectSet in the new factory**

Addresses: #47473.

Release note: None

52320: kvserver: enable merges in kvnemesis r=aayushshah15 a=aayushshah15

We had merges disabled because of the bugs tracked in #44878, but those have
since been fixed by #46085 and #50265.

Release note: None

Co-authored-by: sumeerbhola <sumeer@cockroachlabs.com>
Co-authored-by: Andrei Matei <andrei@cockroachlabs.com>
Co-authored-by: Yahor Yuzefovich <yahor@cockroachlabs.com>
Co-authored-by: Aayush Shah <aayush.shah15@gmail.com>
craig bot pushed a commit that referenced this issue Aug 5, 2020
48842: sql: fix portals after exhausting rows r=yuzefovich a=yuzefovich

Previously, we would erroneously restart the execution from the very
beginning of empty, unclosed portals after they have been fully
exhausted when we should be returning no rows or an error in such
scenarios. This is now fixed by tracking whether a portal is exhausted
or not and intercepting the calls to `execStmt` when the conn executor
state machine is in an open state.

Note that the current solution has known deviations from Postgres:
- when attempting to execute portals of statements that don't return row
sets, on the second and consequent attempt PG returns an error while we
are silently doing nothing (meaning we do not run the statement at all
and return 0 rows)
- we incorrectly populate "command tag" field of pgwire messages of some
rows-returning statements after the portal suspension (for example,
a suspended UPDATE RETURNING in PG will return the total count of "rows
affected" while we will return the count since the last suspension).

These deviations are deemed acceptable since this commit fixes a much
worse problem - re-executing an exhausted portal (which could be
a mutation meaning, previously, we could have executed a mutation
multiple times).

The reasons for why this commit does not address these deviations are:
- Postgres has a concept of "portal strategy"
(see https://github.com/postgres/postgres/blob/2f9661311b83dc481fc19f6e3bda015392010a40/src/include/utils/portal.h#L89).
- Postgres has a concept of "command" type (these are things like
SELECTs, UPDATEs, INSERTs, etc,
see https://github.com/postgres/postgres/blob/1aac32df89eb19949050f6f27c268122833ad036/src/include/nodes/nodes.h#L672).

CRDB doesn't have these concepts, and without them any attempt to
simulate Postgres results in a very error-prone and brittle code.

Fixes: #48448.

Release note (bug fix): Previously, CockroachDB would erroneously
restart the execution of empty, unclosed portals after they have been
fully exhausted, and this has been fixed.

52098: sql: better distribute distinct processors r=yuzefovich a=yuzefovich

**sql: better distribute distinct processors**

The distinct processors are planned in two stages - first, a "local"
stage is planned on the same nodes as the previous stage, then,
a "global" stage is added. Previously, the global stage would be planned
on the gateway, and this commit changes that to make it distributed - by
planning "global" distinct processors on the same nodes as the "local"
ones and connecting them via a hash router hashing the distinct columns.

Release note: None

**sql: implement ConstructDistinct in the new factory**

Addresses: #47473.

Release note: None

52358: engine: set the MVCC timestamp on reads due to historical intents r=ajwerner a=ajwerner

Before this commit, we'd return a zero-value MVCC timestamp when reading an
intent from the intent history. This was problematic because elsewhere in the
code we assume that we will always get a non-zero MVCC timestamp when a read
returns a value. This is especially bizarre given that a read of the latest
intent will return its write timestamp.

The semantics here are such that we'll return the timestamp of the MVCC
metadata for the row. I think this is the most reasonable thing to do as
that timestamp ought to reflect the timestamp we return when we return the
current intent and furthermore is the only timestamp we really have around.
We could return the transactions current read or write timestamp but those
both seem like worse choices.

It's also worth noting that in the case where we need to read a non-zero
value, we don't really care what that value is and the fact that we are
reading this intent itself is somewhat suspect. That being said, I still
think we should make this change in addition to any change made to prevent
the code referenced in #49266 from needing it.

Fixes #49266.
Informs #50102.

Release note: None

52384: sql: properly reset extraTxnState in COPY r=ajwerner a=ajwerner

Apparently we support some sort of COPY protocol that I know nothing about.
We allow operations in that protocol in the open state and in the noTxn state
in the connExecutor. In the noTxn state we let the `copyMachine` handle its
transaction lifecycle all on its own. In that case, we also hand have the
`connExecutor` in a fresh state when calling `execCopyIn()`. During the
execution of `COPY`, it seems like sometime we can pick up table descriptor
leases. In the noTxn case we'd like to drop those leases and generally leave
the executor in the fresh state in which it was handed to us. To deal with
that, we call `resetExtraTxnState` before returning in the happy noTxn case.

Fixes #52065.

Release note (bug fix): Fixed a bug when using the COPY protocol which could
prevent schema changes for up to 5 minutes.

Co-authored-by: Yahor Yuzefovich <yahor@cockroachlabs.com>
Co-authored-by: Andrew Werner <ajwerner@cockroachlabs.com>
craig bot pushed a commit that referenced this issue Aug 6, 2020
52163: sql: implement ConstructOpaque in the new factory r=yuzefovich a=yuzefovich

Addresses: #47473.

Release note: None

52408: sql/catalog/lease: attempt to fix a flakey test r=lucy-zhang a=ajwerner

I think this test was going to fail.

Fixes #52385.

Release note: None

52486: sqlbase,catalogkv: move more kv interactions to catalogkv r=ajwerner a=ajwerner

At this point the only remaining kv interactions in sqlbase are during type
and table interactions though there are also usages through the protoGetter.

This is a minor change in the work to pick apart sqlbase.

Release note: None

Co-authored-by: Yahor Yuzefovich <yahor@cockroachlabs.com>
Co-authored-by: Andrew Werner <ajwerner@cockroachlabs.com>
@jlinder jlinder added the T-sql-queries SQL Queries Team label Jun 16, 2021
@cucaroach cucaroach self-assigned this Oct 7, 2021
cucaroach added a commit to cucaroach/cockroach that referenced this issue Nov 12, 2021
cucaroach added a commit to cucaroach/cockroach that referenced this issue Nov 12, 2021
cucaroach added a commit to cucaroach/cockroach that referenced this issue Nov 12, 2021
cucaroach added a commit to cucaroach/cockroach that referenced this issue Nov 15, 2021
cucaroach added a commit to cucaroach/cockroach that referenced this issue Nov 18, 2021
craig bot pushed a commit that referenced this issue Nov 18, 2021
72653: sql: implement ConstructZigzagJoin in distsql_spec_exec_factory r=cucaroach a=cucaroach

Informs #47473

Release note: None


72825: sql: handle no cluster_inflight_traces table on tenants r=adityamaru a=aliher1911

Previously query for traces under tenant will fail with NPE.
This was happening because trace collector is not available
and virtual table was not aware of the case.
This patch adds handling of the table on tenant sql pods.

Release note: None

Fixes #72564

72871: backupccl: don't include tenants in non-cluster backups r=dt a=dt

Release note (bug fix): System tenant backups of individual tables and databases no longer include tenants as well.

72887: roachprod: remove ClusterImpl interface r=RaduBerinde a=RaduBerinde

The ClusterImpl interface was useful when roachprod had some (very
limited) support for Cassandra, but that has been removed. We're left
with an unnecessary indirection that lacks documentation. The code
doesn't follow any layering, with `Cockroach` working directly with
`SyncedCluster` internals and plenty of cockroach-specific code in
`SyncedCluster`.

This commit removes this interface and improves the documentation of
the methods that are simplified.

Release note: None

72894: kv: use version-less key in rditer.KeyRange r=nvanbenschoten a=nvanbenschoten

This commit replaces the use of `storage.MVCCKey` bounds with `roachpb.Key`
bounds in `rditer.KeyRange`. It addresses an existing TODO and helps clarify
things in #72121 slightly, as that PR is going to add a new field to `MVCCKey`.

72895: storage: remove decodeMVCCKey r=nvanbenschoten a=nvanbenschoten

The function was identical to `DecodeMVCCKey`, which is defined a few lines up in the file.

72902: storage: delete deprecated PutProto function r=nvanbenschoten a=nvanbenschoten

This function has been deprecated since 5e5eaa5. It was only used in one test, so this commit removes it.

72907: build/builder: install awscli v2  r=rail a=erikgrinaker

`roachprod` requires awscli v2, but the `builder` image included v1.

Release note: None

Co-authored-by: Tommy Reilly <treilly@cockroachlabs.com>
Co-authored-by: Oleg Afanasyev <oleg@cockroachlabs.com>
Co-authored-by: David Taylor <tinystatemachine@gmail.com>
Co-authored-by: Radu Berinde <radu@cockroachlabs.com>
Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
Co-authored-by: Erik Grinaker <grinaker@cockroachlabs.com>
@yuzefovich yuzefovich removed their assignment May 31, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-sql-execution Relating to SQL execution. A-sql-optimizer SQL logical planning and optimizations. C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) meta-issue Contains a list of several other issues. T-sql-queries SQL Queries Team
Projects
Status: Backlog
Development

No branches or pull requests

5 participants