Support or disallow named locks #3631

dweitzman · 2018-02-06T04:45:04Z

Named locks acquired using the mysql get_lock() function are released explicitly by a call to release_lock() or implicitly when the session terminates. The release_lock() call must be from the very same mysql thread/session that called get_lock().

The problem: vtgate doesn't ensure that locks are freed when your vttablet session ends (naturally or abnormally). To make matters worse, committing a transaction will swap out the underlying mysql session so that calling release_lock() after a commit simple won't work.

The simple thing to do in the short term: disallow calls to get_lock() in vtgate for now for safety. That might look something like this:

diff --git a/go/vt/vtgate/planbuilder/expr.go b/go/vt/vtgate/planbuilder/expr.go
index 1e6151ee3..30aa7ce12 100644
--- a/go/vt/vtgate/planbuilder/expr.go
+++ b/go/vt/vtgate/planbuilder/expr.go
@@ -113,6 +113,9 @@ func findOrigin(expr sqlparser.Expr, bldr builder) (origin columnOriginator, err
                        subroutes = append(subroutes, subroute)
                        return false, nil
                case *sqlparser.FuncExpr:
+                       if node.Name.EqualString("get_lock") {
+                               return false, errors.New("unsupported: GET_LOCK() is disallowed until we can ensure RELEASE_LOCK() safety")
+                       }
                        // If it's last_insert_id, ensure it's a single unsharded route.
                        if !node.Name.EqualString("last_insert_id") {
                                return true, nil

Test case:

"select get_lock('lock1', 10) from user"
"unsupported: GET_LOCK() is disallowed until we can ensure RELEASE_LOCK() safety"

It seems like it should be possible to support named locks, though. A possible approach:

Recognize when get_lock() has been called and pin the current mysql session to the vitess session so they it can't be recycled after events like a commit or terminating the vttablet session
When the vtgate session terminates, do one of the following:
- Terminate the mysql connection. This will have the correct behavior. Downside: the connection can't be returned to the connection pool
- Keep a record of the locks that have been acquired in vttablet session state and explicitly free them all by name. Downside: more bookkeeping, more understanding of the innards of sql queries
- In mysql 5.7 or higher, call RELEASE_ALL_LOCKS(). Downside: requires mysql 5.7 or higher

My vote is that killing the underlying mysql connection is the right short-term choice. It's simple and it'll behave correctly across all mysql versions

The text was updated successfully, but these errors were encountered:

dweitzman · 2018-02-15T00:45:06Z

Regarding connection pinning at the vtgate level, getting correct behavior would requiring sometimes establishing establishing as well as maintaining vttablet session without necessarily starting or being inside a transaction.

I'm not sure if vtgate or vttablet make any assumptions that would make it hard to separate the concept of a session from the concept of a transaction (aside from the current behavior of recycling your session at the vtgate level on every commit or between executing statements outside of a transaction).

sougou · 2018-02-17T22:09:42Z

For reference, the functionality for mysql'sGET_LOCK is documented here: https://dev.mysql.com/doc/refman/5.7/en/miscellaneous-functions.html#function_get-lock.

There are a few problems with this functionality that we need to address:

The behavior is version dependent.
The lock is owned by the client connection, which is difficult to implement in vitess. There are two derivative behaviors for this:
- The lock is auto-released if the client disconnects.
- The client is allowed to re-obtain a lock it has previously obtained. Also, this behavior is version dependent.
It's not clear if release_lock can be called from a different session. Probably not.

In light of the above behaviors, I think it will be better to implement this function in vitess from the ground up instead of relying on mysql. Here's the proposal:

The core functionality of get_lock will be implemented by vttablet. The functionality will be more akin to a traditional mutex. It will have the following properties:

A lock is not tied to a session.
A lock can be obtained only once until it is released. This also applies to the session that just obtained the lock.
A lock can be released by an explicit call to release_lock, or if the timeout is reached.
Anybody can release a lock. It's not required that the lock be released by the session that obtained it.
The release of a previously released lock is a no-op.
Because a lock is not tied to a session, it is mandatory to provide a timeout for it.
Only a master vttablte will honor this statement.

To obtain a vitess-wide lock, you can issue a select get_lock('name', 10), which is equivalent to select get_lock('name', 10) from dual. This will route to the first keyspace and first shard of vitess. If you're connected to a keyspace already, then the statement will be sent to the first shard of that keyspace. If you're connected to a specific shard, then it's sent to that shard.

Any other usage of get_lock beyond the above will either be an error, or will lead to undefined behavior.

Implementation

VTGate does not need to change. Here are the changes for vttablet:

Planbuilder

If the table name is dual, check the select list to see if it's a get_lock function call. Ensure it's the only expression on that list. If all conditions are met, build a new plan PlanGetLock.
Do the same for release_lock.
Should we verify that get_lock is not called under other circumstances? Probably not worth it.

Execution

Build a locker package (similar to messager) that has the following:

A map[string]struct {ctx context.Context; cancel context.CancelFunc; released chan struct{}}.
On GetLock, look for an entry in the map. If there is no entry, create one with a new context that has the specified timeout. Start a goroutine that waits on that context. Once the context is done, remove the entry from the map, and close the released channel.
On GetLock, if an entry is already present, wait on that context and the request context. If the lock context fires, the lock was obtained. Otherwised, we timed out.
On ReleaseLock, find the entry. Cancel the context. This will cause the goroutine to release the lock. Wait on the released channel. This guarantees that the entry has been removed from the map. Return success.
On ReleaseLock, if the entry is absent, just return success.

The state changes of the Locker module should be the same as the messager.

sougou · 2018-02-18T17:43:53Z

While running some tests, I realized that I misunderstood the timeout parameter: it only specifies how long to wait for a lock. It does not mean that the lock is held for that long.

So, I'm going to make some changes to the implementation:

Since a connection close cannot be detected by vttablet, we'll hold the lock for a maximum of the transaction timeout. After that, it will be auto-released. The default transaction timeout is 30s.
The timeout specified in get_lock() will be used only to decide how long to wait. Note that this cannot exceed the timeout set in the request context. So, it will be the minimum of the two.

dweitzman · 2018-02-18T23:46:11Z

It seems important that locks not be taken away from code that believes it holds the lock without being explicitly released by the owner or by the entire db session being terminated and no longer accepting new sql commands (or commits).

Maybe for an alternative to detecting a connection close it would be enough to notice an idle period when no request is active in the transaction? You'd be able to hold a lock indefinitely as long as you keep using it. If you stop using it for a few seconds then the session could be killed as a precaution. With the binary protocol, vtgate could explicitly kill the session sooner if the TCP connection dies.

sougou · 2018-02-19T00:35:17Z

The lock is a system-wide guarantee. It's held by one vttablet (chosen as described above), and that vttablet cannot know activity in other vttablets where transactions are happening.

However, the timeout for the lock is the same as the transaction. So, if the transaction doesn't complete within the timeout, it will be rolled back, and the lock will also be released due the same timeout.

Issue #3631 get_lock and release_lock cannot be passed through to MySQL because their behavior is connection dependent, which doesn't work well for distributed systems, and for pooled connections. So, we implement our own version of these functions that are more distributed system friendly, with timeouts as well as session independent behaviors. More details on the issue.

dweitzman · 2018-02-20T15:35:31Z

Restricting named locks to master seems reasonable.

Here are some of expectations that client code may have for the behavior of named locks within a client "session". I put "session" in quotes because that concept doesn't exist in vitess (none of these examples use transactions):

Locks don't survive reparents:

1> use keyspace:id@master;
1> select get_lock('lock_name', 10);
{reparent}
1> select 1; <-- should return a permanent error to the client so it knows to abandon the session

Locks can't be released by the wrong person, or if they can they should kill the original session:

1> use keyspace:id@master;
1> select get_lock('lock_name', 10);
2> use keyspace:id@master;
2> select release_lock('lock_name') <-- should return 0, or it might be also ok for this to kill session 1 as long as we know that get_lock() really holds a lock since nobody would be able to call release_lock() by accident without a coding error like failing to check the return value of get_lock
1> select 1; <-- either works if release_lock was blocked, or return a permanent error if we allow random folks to kill locks

Disconnecting with the mysql binary protocol should release all named locks:

1> use keyspace:id@master;
1> select get_lock('lock_name', 10);
1> [explicitly close a session by disconnecting from mysql binary protocol or calling close in grpc, were a close function created]
^-- should release the lock
2> use keyspace:id@master;
2> select get_lock('lock_name', 10); <-- succeeds

What happens if people try to hold a lock forever?... (maybe a client gets a lock with grpc and then crashes):

1> use keyspace:id@master;
1> select get_lock('lock_name', 10);
1> select sleep(99999);
??? desired behavior: maybe kill the existing lock-holder if more than 'timeout' seconds have passed and someone new calls get_lock(), based on the guess that if you're only willing to wait N seconds for the lock than you expect not to be holding the lock longer than that?... ????
1> select 1; <-- if the lock was timed out somehow, then this should return a permanent error

Satisfying these expectations seems to want some kind of vtgate awareness of a "session" which persists across transactions.

sougou · 2018-02-20T16:16:23Z

Could you describe the use cases in more detail?

We've already established that full mysql compatibility is not going to be practical. So, I think we're looking at a collaboration where vitess tries to meet the app half way, and some changes will have to be made in the app itself.

Named locks are unsafe with server-side connection pooling. See vitessio#3631 for background. Signed-off-by: David Weitzman <dweitzman@pinterest.com>

sougou added Type: Feature Request P2 labels Feb 7, 2018

sougou mentioned this issue Feb 19, 2018

vttablet: get_lock and release_lock #3663

Closed

dweitzman mentioned this issue May 6, 2018

Disallow GET_LOCK() calls in vttablet. #3900

Merged

dweitzman added a commit to dweitzman/vitess that referenced this issue May 31, 2018

Disallow GET_LOCK() calls in vttablet.

771863d

Named locks are unsafe with server-side connection pooling. See vitessio#3631 for background. Signed-off-by: David Weitzman <dweitzman@pinterest.com>

dweitzman closed this as completed Jun 2, 2018

dweitzman mentioned this issue Jun 2, 2018

Command line flag to explicitly allow GET_LOCK() in vttablet #4000

Closed

dweitzman mentioned this issue Apr 27, 2020

RFC: SET Statements #6107

Closed

vkovacik mentioned this issue Apr 22, 2022

Bug Report: Unexpected transaction rollback on client session that previously executed GET_LOCK #10130

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Support or disallow named locks #3631

Support or disallow named locks #3631

dweitzman commented Feb 6, 2018 •

edited

Loading

dweitzman commented Feb 15, 2018

sougou commented Feb 17, 2018

sougou commented Feb 18, 2018

dweitzman commented Feb 18, 2018 •

edited

Loading

sougou commented Feb 19, 2018

dweitzman commented Feb 20, 2018

sougou commented Feb 20, 2018 •

edited

Loading

Support or disallow named locks #3631

Support or disallow named locks #3631

Comments

dweitzman commented Feb 6, 2018 • edited Loading

dweitzman commented Feb 15, 2018

sougou commented Feb 17, 2018

Implementation

Planbuilder

Execution

sougou commented Feb 18, 2018

dweitzman commented Feb 18, 2018 • edited Loading

sougou commented Feb 19, 2018

dweitzman commented Feb 20, 2018

sougou commented Feb 20, 2018 • edited Loading

dweitzman commented Feb 6, 2018 •

edited

Loading

dweitzman commented Feb 18, 2018 •

edited

Loading

sougou commented Feb 20, 2018 •

edited

Loading