-
Notifications
You must be signed in to change notification settings - Fork 3.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
sql: enable simple DistSQL distribution for IMPORT in multi-tenant configurations #74548
Labels
A-multitenancy
Related to multi-tenancy
C-enhancement
Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
T-sql-queries
SQL Queries Team
Comments
These should all be changed to SQLInstanceIDs in the protos. The underlying type would be the same so the change would be backward compatible (the SQLInstanceID values are identical to the NodeIDs in all current usages of distsql APIs). |
rharding6373
added a commit
to rharding6373/cockroach
that referenced
this issue
Feb 15, 2022
Previously, distsql did not support distributed queries in multi-tenant environments. Queries were always planned using a single SQL pod, even if using multiple pods could yield better parallelism. This change adds an interface inside distsql that allows different implementations for functions that differ between single- and multi-tenant. Currently, it only has `SetupAllNodesPlanning`, which for single-tenant remains the same, but in multi-tenant returns all available SQL pods at the time it is invoked. This PR also fixes some places where SQL pods need to communicate with other SQL pods using the PodNodeDialer instead of the NodeDialer that handles SQL <-> KV communication in multi-tenant mode. Fixes cockroachdb#74548 Release note (sql change): Adds support for distributed import queries in multi-tenant environments, which allows import queries to have improved parallelism by utilizing all available SQL pods in the tenant.
rharding6373
added a commit
to rharding6373/cockroach
that referenced
this issue
Feb 16, 2022
Previously, distsql did not support distributed queries in multi-tenant environments. Queries were always planned using a single SQL pod, even if using multiple pods could yield better parallelism. This change adds an interface inside distsql that allows different implementations for functions that differ between single- and multi-tenant. Currently, it only has `SetupAllNodesPlanning`, which for single-tenant remains the same, but in multi-tenant returns all available SQL pods at the time it is invoked. This PR also fixes some places where SQL pods need to communicate with other SQL pods using the PodNodeDialer instead of the NodeDialer that handles SQL <-> KV communication in multi-tenant mode. Fixes cockroachdb#74548 References cockroachdb#47900 Release note (sql change): Adds support for distributed import queries in multi-tenant environments, which allows import queries to have improved parallelism by utilizing all available SQL pods in the tenant. review
rharding6373
added a commit
to rharding6373/cockroach
that referenced
this issue
Feb 17, 2022
Previously, distsql did not support distributed queries in multi-tenant environments. Queries were always planned using a single SQL pod, even if using multiple pods could yield better parallelism. This change adds an interface inside distsql that allows different implementations for functions that differ between single- and multi-tenant. Currently, it only has `SetupAllNodesPlanning`, which for single-tenant remains the same, but in multi-tenant returns all available SQL pods at the time it is invoked. This PR also fixes some places where SQL pods need to communicate with other SQL pods using the PodNodeDialer instead of the NodeDialer that handles SQL <-> KV communication in multi-tenant mode. Fixes cockroachdb#74548 References cockroachdb#47900 Release note (sql change): Adds support for distributed import queries in multi-tenant environments, which allows import queries to have improved parallelism by utilizing all available SQL pods in the tenant. review
rharding6373
added a commit
to rharding6373/cockroach
that referenced
this issue
Feb 23, 2022
Previously, distsql did not support distributed queries in multi-tenant environments. Queries were always planned using a single SQL pod, even if using multiple pods could yield better parallelism. This change adds an interface inside distsql that allows different implementations for functions that differ between single- and multi-tenant. Currently, it only has `SetupAllNodesPlanning`, which for single-tenant remains the same, but in multi-tenant returns all available SQL pods at the time it is invoked. This PR also fixes some places where SQL pods need to communicate with other SQL pods using the PodNodeDialer instead of the NodeDialer that handles SQL <-> KV communication in multi-tenant mode. Fixes cockroachdb#74548 References cockroachdb#47900 Release note (sql change): Adds support for distributed import queries in multi-tenant environments, which allows import queries to have improved parallelism by utilizing all available SQL pods in the tenant.
rharding6373
added a commit
to rharding6373/cockroach
that referenced
this issue
Feb 24, 2022
Previously, distsql did not support distributed queries in multi-tenant environments. Queries were always planned using a single SQL pod, even if using multiple pods could yield better parallelism. This change adds an interface inside distsql that allows different implementations for functions that differ between single- and multi-tenant. Currently, it only has `SetupAllNodesPlanning`, which for single-tenant remains the same, but in multi-tenant returns all available SQL pods at the time it is invoked. This PR also fixes some places where SQL pods need to communicate with other SQL pods using the PodNodeDialer instead of the NodeDialer that handles SQL <-> KV communication in multi-tenant mode. Fixes cockroachdb#74548 References cockroachdb#47900 Release note (sql change): Adds support for distributed import queries in multi-tenant environments, which allows import queries to have improved parallelism by utilizing all available SQL pods in the tenant.
rharding6373
added a commit
to rharding6373/cockroach
that referenced
this issue
Feb 24, 2022
Previously, distsql did not support distributed queries in multi-tenant environments. Queries were always planned using a single SQL pod, even if using multiple pods could yield better parallelism. This change adds an interface inside distsql that allows different implementations for functions that differ between single- and multi-tenant. Currently, it only has `SetupAllNodesPlanning`, which for single-tenant remains the same, but in multi-tenant returns all available SQL pods at the time it is invoked. This PR also fixes some places where SQL pods need to communicate with other SQL pods using the PodNodeDialer instead of the NodeDialer that handles SQL <-> KV communication in multi-tenant mode. Fixes cockroachdb#74548 References cockroachdb#47900 Release note (sql change): Adds support for distributed import queries in multi-tenant environments, which allows import queries to have improved parallelism by utilizing all available SQL pods in the tenant.
rharding6373
added a commit
to rharding6373/cockroach
that referenced
this issue
Feb 25, 2022
Previously, distsql did not support distributed queries in multi-tenant environments. Queries were always planned using a single SQL pod, even if using multiple pods could yield better parallelism. This change adds an interface inside distsql that allows different implementations for functions that differ between single- and multi-tenant. Currently, it only has `SetupAllNodesPlanning`, which for single-tenant remains the same, but in multi-tenant returns all available SQL pods at the time it is invoked. This PR also fixes some places where SQL pods need to communicate with other SQL pods using the PodNodeDialer instead of the NodeDialer that handles SQL <-> KV communication in multi-tenant mode. Fixes cockroachdb#74548 References cockroachdb#47900 Release note (sql change): Adds support for distributed import queries in multi-tenant environments, which allows import queries to have improved parallelism by utilizing all available SQL pods in the tenant.
craig bot
pushed a commit
that referenced
this issue
Feb 25, 2022
76566: distsql: adds support for distributed bulk ops in multi-tenant r=rharding6373 a=rharding6373 distsql: adds support for distributed bulk ops in multi-tenant Previously, distsql did not support distributed queries in multi-tenant environments. Queries were always planned using a single SQL pod, even if using multiple pods could yield better parallelism. This change adds an interface inside distsql that allows different implementations for functions that differ between single- and multi-tenant. Currently, it only has `SetupAllNodesPlanning`, which for single-tenant remains the same, but in multi-tenant returns all available SQL pods at the time it is invoked. This PR also fixes some places where SQL pods need to communicate with other SQL pods using the PodNodeDialer instead of the NodeDialer that handles SQL <-> KV communication in multi-tenant mode. Fixes #74548 References #47900 Release note (sql change): Adds support for distributed import queries in multi-tenant environments, which allows import queries to have improved parallelism by utilizing all available SQL pods in the tenant. importccl, backupccl: add import and restore tenant tests with multiple SQL pods These tests validate that imports and restores can be distributed across multiple SQL pods in a single tenant. Release note: None 76958: ci: a couple tweaks to bazel CI r=rail a=rickystewart * Run `go mod tidy` in Bazel CI. * When running unit tests, write the profile file to `/artifacts` so it can be inspected later if necessary. Release note: None 77025: sql: fix error handling for AOST clause r=RichardJCai a=rafiss fixes #76979 fixes #76978 fixes #76977 fixes #76975 fixes #76787 The AOST clause logic would previously error for any bounded staleness query that was used in a prepared statement. This is because prepared statements use an implicit txn, so the same AOST clause has to be checked multiple times in the txn. A recent commit refactored the error handling and caused this bug. Release note: None 77044: schemachanger: bump size of test r=rail a=rickystewart Release note: None Co-authored-by: rharding6373 <rharding6373@users.noreply.github.com> Co-authored-by: Ricky Stewart <ricky@cockroachlabs.com> Co-authored-by: Rafi Shamim <rafi@cockroachlabs.com>
RajivTS
pushed a commit
to RajivTS/cockroach
that referenced
this issue
Mar 6, 2022
Previously, distsql did not support distributed queries in multi-tenant environments. Queries were always planned using a single SQL pod, even if using multiple pods could yield better parallelism. This change adds an interface inside distsql that allows different implementations for functions that differ between single- and multi-tenant. Currently, it only has `SetupAllNodesPlanning`, which for single-tenant remains the same, but in multi-tenant returns all available SQL pods at the time it is invoked. This PR also fixes some places where SQL pods need to communicate with other SQL pods using the PodNodeDialer instead of the NodeDialer that handles SQL <-> KV communication in multi-tenant mode. Fixes cockroachdb#74548 References cockroachdb#47900 Release note (sql change): Adds support for distributed import queries in multi-tenant environments, which allows import queries to have improved parallelism by utilizing all available SQL pods in the tenant.
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
A-multitenancy
Related to multi-tenancy
C-enhancement
Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
T-sql-queries
SQL Queries Team
Currently, there is no DistSQL functionality in multi-tenant clusters. This issue represents the scope of work necessary to enable the simple type of DistSQL flows that are used by
RESTORE
andIMPORT
, that simply distribute a processor to all nodes in a cluster.IMPORT
in particular is a good case study, since it doesn't use any information about the ranges on a node - it simply round robins its work across all nodes, with no heuristics about what data the nodes might already have as leaseholders.The general outline of this work is as follows:
iteratePods
inpkg/server/tenant_status.go
, which uses the tenant pod ID directory to perform a fan-out operation over all of the pods in a cluster.SetupAllNodesPlanning
inpkg/sql/distsql_plan_bulk.go
which collects a list of nodes from a different, non multi-tenant aware node directorySetupAllNodesPlanning
code that uses the pod directory instead of the node directoryimport_processor_planning.go
that useSetupAllNodesPlanning
.There are definitely several unknowns here. For one, all of the DistSQL code expects a NodeID (e.g. the
Node
field in processor specs, the arguments toOutbox.Run
, etc etc), but we now will have SQLInstanceIds. This may not matter - they're both wrappers around integers, and in an MT configuration I don't think SQL has access to KV nodes anyway - but it's something to worry about. Another thing that I'm not sure of is whether SQL pods in an MT configuration even spawn a distsql server that's properly hooked up.The completion criteria for this work is that
IMPORT
should be runnable in a way that utilizes all currently active (at the time ofIMPORT
invocation) SQL pods on a multi-tenant cluster.Epic: CRDB-11540
The text was updated successfully, but these errors were encountered: