sql: add option to keep all table descriptor leases active on all nodes #23510
Comments
But then why "keep all table descs" rather than acquire all leases on server startup?
That's another option that's mostly equivalent. Either way, the important part is that we'd indefinitely renew table leases.
I'm moving this to 2.1 as it doesn't seem very hard and the effect is being seen in the real world.
@robert-s-lee says he keeps encountering this as a problem in POCs. He has workarounds for customers that talk to him, but many people will take the perf hit here without ever knowing it. We should figure out if we can do it in 2.1.
@mjibson can you take a look at this from a cost perspective? We can see how it compares to our other priorities for this milestone or the next after you've reviewed it.
Spoke with the team. We are targeting this for 2.1.
An all/nothing setting seems like it'd be risky to default to on, as it could be a problem for a cluster with thousands of tables, but it would be unlikely to benefit the majority of users if defaulted to off. I'm thinking a simple option might instead be to just pre-lease the first X tables, with X controlled via a setting whose default is chosen such that we'd expect it to cover "all" tables for most clusters, while still avoiding issues for clusters with very large numbers of tables. Since we expect it to be "all tables" most of the time, this could be something simple, like just the first X in the descriptor table?
28725: sql: periodically refresh table leases r=vivekmenezes a=vivekmenezes
This change periodically refreshes some of the table leases. The current limit is 50 tables and can be configured using sql.tablecache.lease.refresh_limit.
This change will eventually be replaced by epoch based table leases.
related to #23510
Release note (sql change): Fix problem with needing to run a periodic sql query on the outside to get good initial latency on a dormant cluster.
Co-authored-by: Vivek Menezes <vivek@cockroachlabs.com>
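A rough sketch of what a capped periodic refresh like the one described in this PR might look like. This is not the code from the PR: the names refreshSome, runRefresher, and refreshLimit are hypothetical, the refresh callback stands in for whatever actually renews a lease, and taking the first entries of the list is an assumption about which tables get picked. Only the default of 50 comes from the PR description.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// refreshLimit mirrors the default of 50 tables mentioned in the PR
// description. The constant name is hypothetical.
const refreshLimit = 50

// refreshSome calls refresh on at most limit table IDs, in list order,
// and returns how many refreshes succeeded. Choosing the first `limit`
// entries is an assumption made for this sketch.
func refreshSome(ids []int, limit int, refresh func(id int) error) int {
	if limit < 0 {
		limit = 0
	}
	if len(ids) > limit {
		ids = ids[:limit]
	}
	refreshed := 0
	for _, id := range ids {
		if err := refresh(id); err == nil {
			refreshed++
		}
	}
	return refreshed
}

// runRefresher refreshes up to refreshLimit leases on every tick until
// ctx is cancelled. list returns the candidate table IDs.
func runRefresher(ctx context.Context, interval time.Duration, list func() []int, refresh func(id int) error) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			refreshSome(list(), refreshLimit, refresh)
		}
	}
}

func main() {
	// 120 fake table IDs; the no-op callback always "succeeds".
	ids := make([]int, 120)
	for i := range ids {
		ids[i] = i + 1
	}
	n := refreshSome(ids, refreshLimit, func(int) error { return nil })
	fmt.Println("refreshed", n, "of", len(ids), "tables")
}
```

The cap keeps the background work bounded on clusters with very many tables, at the cost of leaving tables beyond the limit exposed to the cold-lease latency.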
This will prevent initial requests hitting a server from blocking on lease acquisition. related to cockroachdb#23510 Release note: Fixed slowness caused by table lease acquisition at startup.
@lucy-zhang to triage/close
@ajwerner had explained to me that he wanted to replace the expiration column in
Here's the issue I was referring to above: #61419
#19005 made progress towards preventing table lease acquisition from impacting tail latency. We now renew table leases when they are in use and begin getting too close (< 1 minute) to their expiration.
However, if a table is not used for more than a minute (DefaultTableDescriptorLeaseRenewalTimeout), the next request that needs it may find that its old lease has expired. Further, if a table is not used for more than five minutes (DefaultTableDescriptorLeaseDuration), the next request is guaranteed to find that its old lease has expired. In these cases, the next request to use the table will first need to pay the latency cost of acquiring a table lease before it can do anything else. This is both surprising and potentially concerning for users of Cockroach who have strict latency requirements. Further, the effect of this issue is more pronounced in a geo-distributed cluster where a lease acquisition requires multiple round-trips that may take a significant amount of time.
Poorly remembered quote from @bdarnell:
We should add an option to automatically renew table leases continuously in the background even if they are not being used actively. This option could apply to all tables or to only a subset of tables. It would probably take the form of a SESSION SETTING, but it may make sense as a CLUSTER SETTING as well.
The downsides to this are:
cc. @andreimatei @vivekmenezes
Jira issue: CRDB-5813