sql: session-scoped temporary objects do not work with tenant clusters #67401

andy-kimball · 2021-07-09T00:32:13Z

Repro:

export COCKROACH=./cockroach

# Create certificates.
export CERTSDIR=$HOME/.cockroach-certs
export COCKROACH_CA_KEY=$CERTSDIR/ca.key
$COCKROACH cert create-ca
$COCKROACH cert create-client root
$COCKROACH cert create-node 127.0.0.1 localhost
$COCKROACH mt cert create-tenant-client 123 --certs-dir=${HOME}/.cockroach-certs

# Start KV process.
$COCKROACH start-single-node --host 127.0.0.1

# Create tenant
$COCKROACH sql --host 127.0.0.1 -e 'select crdb_internal.create_tenant(123);'

# Start SQL tenant process.
COCKROACH_TRUST_CLIENT_PROVIDED_SQL_REMOTE_ADDR=true $COCKROACH mt start-sql --tenant-id 123 --kv-addrs 127.0.0.1:26257 --http-addr 127.0.0.1:8081 --sql-addr 127.0.0.1:36257

Notice that already there's a warning in the log, despite not having created any temporary objects:

W210709 00:19:08.472581 195 sql/temporary_schema.go:601  [-] 2  failed to clean temp objects: isMeta1Leaseholder is not available to secondary tenants

Now create a temporary table in the SQL CLI:

$COCKROACH sql --port 36257

SET experimental_enable_temp_tables = 'on';
CREATE TEMP TABLE t (x INT PRIMARY KEY, y INT);

While still in the CLI, kill the SQL tenant process with ctrl-C. Then start it up again. Back in the SQL CLI:

SHOW TABLES;

EXPECTED: The temp table should be gone, since it was tied to the previous session, which got terminated.
ACTUAL: The temp table is still showing:

root@:36257/defaultdb> show tables;
             schema_name            | table_name | type  | owner | estimated_row_count | locality
------------------------------------+------------+-------+-------+---------------------+-----------
  pg_temp_1625790426654356000_10001 | t          | table | root  |                   0 | NULL

Even if we decide not to fix this right now, we need to at least disable temporary object creation in tenant clusters and stop spamming the logs (logs are filled with the warning message even when temp objects are not ever used). That change needs to be back-ported to 21.1.x.

Also, here's @ajwerner's commentary from Slack:

Oh, that’s not good. Interesting that it’s new.
This needs some tracking. Seems like we didn’t think through temporary objects on tenants.
If a node crashes or is shut down with an open session that created temporary objects, nothing will ever clean them up.
There’s different levels of solutions. I think what I’d do is just get rid of the leasing component. It’s idemponent work. That being said, the code also makes RPCs to the other instances to find out which sessions are live which is also busted on tenants.
We’re working on the sessions listing stuff (I think). At that point, it’s just about deciding whether we care about the leasing. For that I think there are two simple solutions:
No leasing, just jitter a long loop
Use the fact that you have the lowest instance ID you know about like being the leaseholder of meta1.
All in all, it is non-trivial, but it is small.

The text was updated successfully, but these errors were encountered:

ajwerner · 2021-08-25T18:08:20Z

Alright, so this should proceed in two phases. Both of which should only affect secondary tenants, making it okay to be working on it at this point.

Just have the tenant pod always try the cleanup. This will not work if there is more than one tenant but it's something we can backport and it should be tiny. Skip this check if you're not the system tenant and put a big comment saying that it's fine because this instance is the only instance as of writing. You'll probably also need to plumb around some other dependencies

cockroach/pkg/sql/temporary_schema.go

Lines 477 to 479 in dbd810e

 // We only want to perform the cleanup if we are holding the meta1 lease. 

 // This ensures only one server can perform the job at a time. 

 isLeaseholder, err := c.isMeta1LeaseholderFunc(ctx, c.db.Clock().NowAsClockTimestamp())

Make it work with multiple sql pods. We're going to need to wire up the RPCs to check for session liveness, which I think should be doable now.

Then we need to orchestrate which node is going to do the work. There's a few options here. One is to not coordinate but just run the loop rarely and with a lot of jitter. I think that that would be totally totally fine. I think it's my preferred approach. Any contention should sort itself out.

If we did want to coordinate we could use a singleton job. See

The second one went further and created a scheduled job to do the cleanup. That could be cool. I don't think it's important.

ajwerner · 2021-08-25T18:11:49Z

So, in conclusion, the work item of 1. is to hook up the temporary cleaner for secondary tenants and make sure it works with an assumption that it's the only tenant. Then, for 2. we hook up the RPCs to ensure that we don't trash data that a live session is using. For hooking up those RPCs we're going to want to be rather careful that we have a complete view of the set of nodes. We may need to add some new uncached accessor for the set of instances. That set of instances is exposed by

cockroach/pkg/sql/sqlinstance/sqlinstance.go

Lines 34 to 39 in dbd810e

 // AddressResolver exposes API for retrieving the instance address and all live instances for a tenant. 

 type AddressResolver interface { 

 // GetInstance returns the InstanceInfo for a pod given the instance ID. 

 // Returns a NonExistentInstanceError if an instance with the given ID 

 // does not exist. 

 GetInstance(context.Context, base.SQLInstanceID) (InstanceInfo, error)

.

Fixes: cockroachdb#67401 Previously, the temporary table clean up did not execute for tenants. This was inadequate that temporary tables would last longer then the life span of user sessions. To address this, this patch adds support for cleaning up on a single tenant pod. Specifically removing checks for meta1 lease when under a tenant, and support for listing sessions. Release justification: low risk and fixes a tenant related bug Release note (bug fix): Temporary tables were not properly cleaned up for tenants.

69486: sql: support for temporary table clean up for tenants r=fqazi a=fqazi Fixes: #67401 Previously, the temporary table clean-up did not execute for tenants. This was inadequate that temporary tables would last longer than the life span of user sessions. To address this, this patch adds support for cleaning up on a single tenant pod. Specifically removing checks for meta1 lease when under a tenant, and support for listing sessions. Release justification: low risk and fixes a tenant-related bug Release note (bug fix): Temporary tables were not properly cleaned up for tenants. 69789: sql: drop database cascade can fail resolving schemas r=fqazi a=fqazi Fixes: #69713 Previously, drop database cascade would drop the schema first, and then drop any objects under those schemas after. This was inadequate because as we look up any objects under the schemas, we may need to resolve types, which will may lead to a look up on the schema. If the schema is dropped then we will fail while resolving any types. To address this, this patch drops the objects under the schema first followed by the database. Release justification: Low risk bug fix for drop database cascade Release note (bug fix): Drop database cascade can fail while resolving a schema in a certain scenarios with the following error: "ERROR: error resolving referenced table ID <ID>: descriptor is being dropped" 69975: sql: fix interaction between stmt bundles and tracing r=yuzefovich a=yuzefovich Previously, we wouldn't generate the bundle if the verbose tracing was already enabled on the cluster because we wouldn't call `instrumentationHelper.Finish` where we actually generate the bundle. This would result in empty responses for `EXPLAIN ANALYZE (DEBUG)` as well as the requests for stmt diagnostics being stuck in "waiting" state. Fixes: #69398. Release note (bug fix): Previously, if the tracing (`sql.trace.txn.enable_threshold` cluster setting) was enabled on the cluster, the statement diagnostics collection (`EXPLAIN ANALYZE (DEBUG)`) wouldn't work. This is now fixed. Release justification: low-risk fix to a long-standing bug. Co-authored-by: Faizan Qazi <faizan@cockroachlabs.com> Co-authored-by: Yahor Yuzefovich <yahor@cockroachlabs.com>

Fixes: cockroachdb#67401 Previously, the temporary table clean up did not execute for tenants. This was inadequate that temporary tables would last longer then the life span of user sessions. To address this, this patch adds support for cleaning up on a single tenant pod. Specifically removing checks for meta1 lease when under a tenant, and support for listing sessions. Release justification: low risk and fixes a tenant related bug Release note (bug fix): Temporary tables were not properly cleaned up for tenants.

ajwerner · 2021-09-13T16:19:00Z

Reopening for backport.

andy-kimball added the C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. label Jul 9, 2021

andy-kimball assigned ajwerner and postamar Jul 9, 2021

blathers-crl bot added the T-sql-schema-deprecated Use T-sql-foundations instead label Jul 9, 2021

andy-kimball added this to the 21.1 milestone Jul 9, 2021

ajwerner unassigned ajwerner and postamar Jul 13, 2021

ajwerner assigned postamar Aug 3, 2021

fqazi self-assigned this Aug 25, 2021

fqazi mentioned this issue Aug 27, 2021

sql: support for temporary table clean up for tenants #69486

Merged

RaduBerinde added the A-multitenancy Related to multi-tenancy label Aug 31, 2021

craig bot closed this as completed in 4e76a9f Sep 10, 2021

fqazi mentioned this issue Sep 13, 2021

release-21.2: sql: support for temporary table clean up for tenants #70129

Merged

ajwerner added branch-release-21.2 release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. labels Sep 13, 2021

ajwerner reopened this Sep 13, 2021

ajwerner linked a pull request Sep 13, 2021 that will close this issue

release-21.2: sql: support for temporary table clean up for tenants #70129

Merged

ajwerner closed this as completed Sep 14, 2021

fqazi mentioned this issue Sep 27, 2021

release-21.2: sql: support for temporary table clean up for tenants #70762

Merged

rafiss mentioned this issue Jan 20, 2022

sql: TemporaryObjectCleaner needs work for multi-tenancy #47894

Closed

exalate-issue-sync bot unassigned postamar May 10, 2023

exalate-issue-sync bot added T-sql-foundations SQL Foundations Team (formerly SQL Schema + SQL Sessions) and removed T-sql-schema-deprecated Use T-sql-foundations instead labels May 10, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

sql: session-scoped temporary objects do not work with tenant clusters #67401

sql: session-scoped temporary objects do not work with tenant clusters #67401

andy-kimball commented Jul 9, 2021

ajwerner commented Aug 25, 2021

ajwerner commented Aug 25, 2021

ajwerner commented Sep 13, 2021

sql: session-scoped temporary objects do not work with tenant clusters #67401

sql: session-scoped temporary objects do not work with tenant clusters #67401

Comments

andy-kimball commented Jul 9, 2021

ajwerner commented Aug 25, 2021

ajwerner commented Aug 25, 2021

ajwerner commented Sep 13, 2021