sql: add option to keep all table descriptor leases active on all nodes #23510

nvanbenschoten · 2018-03-06T21:47:23Z

#19005 made progress towards preventing table lease acquisition from impacting tail latency. We now renew table leases when they are in use and begin getting too close (< 1 minute) to their expiration.

However, if a table is not used for more than a minute (DefaultTableDescriptorLeaseRenewalTimeout), the next request that needs it may find that its old lease has expired. Further, if a table is not used for more than five minutes (DefaultTableDescriptorLeaseDuration), the next request is guaranteed to find that its old lease has expired. In these cases, the next request to use the table will first need to pay the latency cost of acquiring a table lease before it can do anything else. This is both surprising and potentially concerning for users of Cockroach who have strict latency requirements. Further, the effect of this issue is more pronounced in a geo-distributed cluster where a lease acquisition requires multiple round-trips that may take a significant amount of time.

Poorly remembered quote from @bdarnell:

"Back when using BigTable, I used to run a script to continually renew all leases. I never want to pay the latency cost of renewing a table lease synchronously."

We should add an option to automatically renew table leases continuously in the background even if they are not being used actively. This option could apply to all tables or to only a subset of tables. It would probably take the form of a SESSION SETTING, but it may make sense as a CLUSTER SETTING as well.

The downsides to this are:

lease acquisitions for a table may be wasted work if the table is never used

cc. @andreimatei @vivekmenezes

Jira issue: CRDB-5813

The text was updated successfully, but these errors were encountered:

andreimatei · 2018-03-06T22:48:32Z

But they why "keep all table descs" rather than acquire all leases on server startup?

nvanbenschoten · 2018-03-06T22:57:58Z

But then why "keep all table descs" rather than acquire all leases on server startup?

That's another option that's mostly equivalent. Either way, the important part is that we'd indefinitely renew table leases.

petermattis · 2018-04-03T16:54:28Z

I'm moving this to 2.1 as it doesn't seem very hard and the effect is being seen in the real world.

awoods187 · 2018-06-26T14:23:49Z

@robert-s-lee says he keeps encountering this as a problem in pocs. He has workarounds for customers that talk to him but many people will take the perf hit here without ever knowing it.

We should figure out if we can do it in 2.1

nstewart · 2018-06-26T14:55:40Z

@mjibson can you take a look at this from a cost perspective? We can see how it compares to our other priorities for this milestone or the next after you've reviewed it.

nstewart · 2018-06-26T18:48:12Z

Spoke with the team. We are targeting this for 2.1

dt · 2018-06-26T19:00:00Z

An all/nothing setting seems like it'd be risky to default to on, as it could be a problem for a cluster with thousands of tables, but would be unlikely to benefit to majority of users if defaulted to off.

I'm thinking a simple option might be instead to just pre-lease the first X tables, with X controlled via a setting and has a default s.t. we'd expect it to be "all" for most clusters, but still avoid issues for clusters with very large numbers of tables. Since we expect it to be "all tables" most of the time, this could be something simple, like just the first X in the descriptor table?

28725: sql: periodically refresh table leases r=vivekmenezes a=vivekmenezes This change periodically refreshes some of the table leases. The current limit is 50 tables and can be configured using sql.tablecache.lease.refresh_limit This change will eventually be replaced by epoch based table leases related to #23510 Release note (sql change): Fix problem with needing to run a periodic sql query on the outside to get good initial latency on a dormant cluster. Co-authored-by: Vivek Menezes <vivek@cockroachlabs.com>

This will prevent initial requests hitting a server from blocking on lease acquisition. related to cockroachdb#23510 Release note: Fixed slowness caused by table lease acquisition at startup.

asubiotto · 2020-04-02T15:48:17Z

@lucy-zhang to triage/close

postamar · 2022-03-08T20:31:15Z

@ajwerner had explained to me that he wanted to replace the expiration column in system.lease with a session_id reference to sql liveness. I'm going to close this issue because whatever we end up doing, it's almost certainly not going to be what's described here. Please correct me and reopen if I'm wrong and/or completely off base here.

postamar · 2022-03-08T21:34:50Z

Here's the issue I was referring to above: #61419

nvanbenschoten added this to the Later milestone Mar 6, 2018

bdarnell mentioned this issue Mar 7, 2018

kv: connect transport lazily #23521

Merged

nvanbenschoten mentioned this issue Mar 13, 2018

perf: excessive not lease holder errors running tpcc #23543

Closed

andreimatei mentioned this issue Mar 21, 2018

sql: synchronous renewal of lease when close to expiration #24107

Closed

petermattis modified the milestones: Later, 2.1 Apr 3, 2018

jordanlewis added C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) A-bulkio-schema-changes labels Apr 27, 2018

robert-s-lee added the O-support Would prevent or help troubleshoot a customer escalation - bugs, missing observability/tooling, docs label May 6, 2018

nstewart assigned awoods187 Jun 26, 2018

nstewart assigned dt and unassigned awoods187 Jun 26, 2018

vivekmenezes removed this from the 2.1 milestone Aug 22, 2018

vivekmenezes mentioned this issue Aug 27, 2018

sql: initialize with a limited number of table leases #29081

Closed

asubiotto assigned thoszhang and unassigned vivekmenezes Apr 2, 2020

thoszhang removed their assignment Feb 26, 2021

jlinder added the T-sql-schema-deprecated Use T-sql-foundations instead label Jun 16, 2021

postamar closed this as completed Mar 8, 2022

exalate-issue-sync bot added T-sql-foundations SQL Foundations Team (formerly SQL Schema + SQL Sessions) and removed T-sql-schema-deprecated Use T-sql-foundations instead labels May 10, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

sql: add option to keep all table descriptor leases active on all nodes #23510

sql: add option to keep all table descriptor leases active on all nodes #23510

nvanbenschoten commented Mar 6, 2018 •

edited by cockroach-jira-scripts

Loading

andreimatei commented Mar 6, 2018

nvanbenschoten commented Mar 6, 2018

petermattis commented Apr 3, 2018

awoods187 commented Jun 26, 2018 •

edited

Loading

nstewart commented Jun 26, 2018

nstewart commented Jun 26, 2018

dt commented Jun 26, 2018

asubiotto commented Apr 2, 2020

postamar commented Mar 8, 2022

postamar commented Mar 8, 2022

sql: add option to keep all table descriptor leases active on all nodes #23510

sql: add option to keep all table descriptor leases active on all nodes #23510

Comments

nvanbenschoten commented Mar 6, 2018 • edited by cockroach-jira-scripts Loading

andreimatei commented Mar 6, 2018

nvanbenschoten commented Mar 6, 2018

petermattis commented Apr 3, 2018

awoods187 commented Jun 26, 2018 • edited Loading

nstewart commented Jun 26, 2018

nstewart commented Jun 26, 2018

dt commented Jun 26, 2018

asubiotto commented Apr 2, 2020

postamar commented Mar 8, 2022

postamar commented Mar 8, 2022

nvanbenschoten commented Mar 6, 2018 •

edited by cockroach-jira-scripts

Loading

awoods187 commented Jun 26, 2018 •

edited

Loading