Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: add option to keep all table descriptor leases active on all nodes #23510

Closed
nvanbenschoten opened this issue Mar 6, 2018 · 21 comments
Closed
Labels
A-schema-descriptors Relating to SQL table/db descriptor handling. C-performance Perf of queries or internals. Solution not expected to change functional behavior. O-support Would prevent or help troubleshoot a customer escalation - bugs, missing observability/tooling, docs T-sql-foundations SQL Foundations Team (formerly SQL Schema + SQL Sessions)

Comments

@nvanbenschoten
Copy link
Member

nvanbenschoten commented Mar 6, 2018

#19005 made progress towards preventing table lease acquisition from impacting tail latency. We now renew table leases when they are in use and begin getting too close (< 1 minute) to their expiration.

However, if a table is not used for more than a minute (DefaultTableDescriptorLeaseRenewalTimeout), the next request that needs it may find that its old lease has expired. Further, if a table is not used for more than five minutes (DefaultTableDescriptorLeaseDuration), the next request is guaranteed to find that its old lease has expired. In these cases, the next request to use the table will first need to pay the latency cost of acquiring a table lease before it can do anything else. This is both surprising and potentially concerning for users of Cockroach who have strict latency requirements. Further, the effect of this issue is more pronounced in a geo-distributed cluster where a lease acquisition requires multiple round-trips that may take a significant amount of time.

Poorly remembered quote from @bdarnell:

"Back when using BigTable, I used to run a script to continually renew all leases. I never want to pay the latency cost of renewing a table lease synchronously."

We should add an option to automatically renew table leases continuously in the background even if they are not being used actively. This option could apply to all tables or to only a subset of tables. It would probably take the form of a SESSION SETTING, but it may make sense as a CLUSTER SETTING as well.

The downsides to this are:

  • lease acquisitions for a table may be wasted work if the table is never used

cc. @andreimatei @vivekmenezes

Jira issue: CRDB-5813

@nvanbenschoten nvanbenschoten added this to the Later milestone Mar 6, 2018
@andreimatei
Copy link
Contributor

But they why "keep all table descs" rather than acquire all leases on server startup?

@nvanbenschoten
Copy link
Member Author

But then why "keep all table descs" rather than acquire all leases on server startup?

That's another option that's mostly equivalent. Either way, the important part is that we'd indefinitely renew table leases.

@petermattis
Copy link
Collaborator

I'm moving this to 2.1 as it doesn't seem very hard and the effect is being seen in the real world.

@jordanlewis jordanlewis added C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) A-bulkio-schema-changes labels Apr 27, 2018
@knz knz added A-schema-descriptors Relating to SQL table/db descriptor handling. C-performance Perf of queries or internals. Solution not expected to change functional behavior. and removed A-bulkio-schema-changes C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) labels Apr 27, 2018
@robert-s-lee robert-s-lee added the O-support Would prevent or help troubleshoot a customer escalation - bugs, missing observability/tooling, docs label May 6, 2018
@awoods187
Copy link
Contributor

awoods187 commented Jun 26, 2018

@robert-s-lee says he keeps encountering this as a problem in pocs. He has workarounds for customers that talk to him but many people will take the perf hit here without ever knowing it.

We should figure out if we can do it in 2.1

@nstewart
Copy link
Contributor

@mjibson can you take a look at this from a cost perspective? We can see how it compares to our other priorities for this milestone or the next after you've reviewed it.

@nstewart nstewart assigned dt and unassigned awoods187 Jun 26, 2018
@nstewart
Copy link
Contributor

Spoke with the team. We are targeting this for 2.1

@dt
Copy link
Member

dt commented Jun 26, 2018

An all/nothing setting seems like it'd be risky to default to on, as it could be a problem for a cluster with thousands of tables, but would be unlikely to benefit to majority of users if defaulted to off.

I'm thinking a simple option might be instead to just pre-lease the first X tables, with X controlled via a setting and has a default s.t. we'd expect it to be "all" for most clusters, but still avoid issues for clusters with very large numbers of tables. Since we expect it to be "all tables" most of the time, this could be something simple, like just the first X in the descriptor table?

craig bot pushed a commit that referenced this issue Aug 17, 2018
28725: sql: periodically refresh table leases r=vivekmenezes a=vivekmenezes

This change periodically refreshes some of the table leases.
The current limit is 50 tables and can be configured using
sql.tablecache.lease.refresh_limit

This change will eventually be replaced by epoch based
table leases

related to #23510

Release note (sql change): Fix problem with needing to run
a periodic sql query on the outside to get good initial
latency on a dormant cluster.

Co-authored-by: Vivek Menezes <vivek@cockroachlabs.com>
@vivekmenezes vivekmenezes removed this from the 2.1 milestone Aug 22, 2018
vivekmenezes added a commit to vivekmenezes/cockroach that referenced this issue Aug 27, 2018
This will prevent initial requests hitting a server from
blocking on lease acquisition.

related to cockroachdb#23510

Release note: Fixed slowness caused by table lease acquisition
at startup.
@asubiotto asubiotto assigned thoszhang and unassigned vivekmenezes Apr 2, 2020
@asubiotto
Copy link
Contributor

@lucy-zhang to triage/close

@thoszhang thoszhang removed their assignment Feb 26, 2021
@jlinder jlinder added the T-sql-schema-deprecated Use T-sql-foundations instead label Jun 16, 2021
@postamar
Copy link
Contributor

postamar commented Mar 8, 2022

@ajwerner had explained to me that he wanted to replace the expiration column in system.lease with a session_id reference to sql liveness. I'm going to close this issue because whatever we end up doing, it's almost certainly not going to be what's described here. Please correct me and reopen if I'm wrong and/or completely off base here.

@postamar postamar closed this as completed Mar 8, 2022
@postamar
Copy link
Contributor

postamar commented Mar 8, 2022

Here's the issue I was referring to above: #61419

@exalate-issue-sync exalate-issue-sync bot added T-sql-foundations SQL Foundations Team (formerly SQL Schema + SQL Sessions) and removed T-sql-schema-deprecated Use T-sql-foundations instead labels May 10, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-schema-descriptors Relating to SQL table/db descriptor handling. C-performance Perf of queries or internals. Solution not expected to change functional behavior. O-support Would prevent or help troubleshoot a customer escalation - bugs, missing observability/tooling, docs T-sql-foundations SQL Foundations Team (formerly SQL Schema + SQL Sessions)
Projects
None yet
Development

No branches or pull requests