Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature Request: Support for new API TryLock at Topo level. #11271

Closed
rsajwani opened this issue Sep 21, 2022 · 0 comments
Closed

Feature Request: Support for new API TryLock at Topo level. #11271

rsajwani opened this issue Sep 21, 2022 · 0 comments

Comments

@rsajwani
Copy link
Contributor

rsajwani commented Sep 21, 2022

Feature Description

Topo.LockShard is a blocking call. When concurrent calls to lock same shard happens, the client gets block until the lock releases itself or context times out. But this create an issue where a lot of clients try to acquire lock all together and result in server getting overwhelmed.

In order to avoid server getting overwhelmed we need to introduce TryLock which is non-blocking and return immediately if it cannot acquire lock instead of waiting at server.

Context

Originally we were trying to add optimization for LockShard under go/vt/orchestrator/logic/tablet_discovery.go. We wanted to call topo.CheckShardLocked before we call ts.LockShard at https://github.com/vitessio/vitess/blob/main/go/vt/orchestrator/logic/tablet_discovery.go#L260, to avoid excessive calls to lock.
But here are some observation

  1. Our lockshard is context bases. Which means it hold lock per context. https://github.com/vitessio/vitess/blob/main/go/vt/topo/locks.go#L296
  2. What this means is even calling CheckShardLocked will not prevent us to avoid calls for LockShard across multiple process. for e.g replManager and vtorc try to acquire lock for same shard.
  3. We already check for lock presence within ts.lockshard https://github.com/vitessio/vitess/blob/main/go/vt/topo/locks.go#L307, which means within same context we don't need an extra checkshardlocked call and across process anyways we don't handle concurrency (as it is context base)
  4. This does raise an interesting point that Lockshard does actually result in creating multiple Key/value under same path e.g /vitess/global/keyspaces/ks/shards/0/locks, but the way we avoid it is once we create the latest key/value we wait for older key/values to disappear before we give control back to client. That actually result in 10 second (TTL) block for client if it tries to acquire lock for already acquired path. We do this using revision id.
    https://github.com/vitessio/vitess/blob/main/go/vt/topo/etcd2topo/lock.go#L165
  5. We were able to repro state mentioned in step#4 , where for the same file path we can have multiple lock files. One is blocked on other to get expire hence client is blocked in second lock. See attached image in comments for call to get path /vitess/global/keyspaces/ks/shards/0.

Off course this state is ephemeral and last for 10 seconds. therefore we felt adding checkshardLock before lockshard won't help.

Optimization we can do here is make client lock call non-blocking. We should come up with semantics for tryLockAcquire. This will help us to un-block client in case of concurrency for up to TTL which is 10 seconds right now.

Use Case(s)

This can be called anywhere where we are doing LockShard call.

@rsajwani rsajwani added Type: Feature Needs Triage This issue needs to be correctly labelled and triaged labels Sep 21, 2022
@rsajwani rsajwani changed the title Feature Request: Support for new API TryShard at Topo level. Feature Request: Support for new API TryLock at Topo level. Sep 21, 2022
@mattlord mattlord added Component: Cluster management and removed Needs Triage This issue needs to be correctly labelled and triaged labels Sep 23, 2022
@rsajwani rsajwani self-assigned this Dec 1, 2022
@rsajwani rsajwani closed this as completed Dec 1, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants