Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support or disallow named locks #3631

Closed
dweitzman opened this issue Feb 6, 2018 · 7 comments
Closed

Support or disallow named locks #3631

dweitzman opened this issue Feb 6, 2018 · 7 comments

Comments

@dweitzman
Copy link
Member

dweitzman commented Feb 6, 2018

Named locks acquired using the mysql get_lock() function are released explicitly by a call to release_lock() or implicitly when the session terminates. The release_lock() call must be from the very same mysql thread/session that called get_lock().

The problem: vtgate doesn't ensure that locks are freed when your vttablet session ends (naturally or abnormally). To make matters worse, committing a transaction will swap out the underlying mysql session so that calling release_lock() after a commit simple won't work.

The simple thing to do in the short term: disallow calls to get_lock() in vtgate for now for safety. That might look something like this:

diff --git a/go/vt/vtgate/planbuilder/expr.go b/go/vt/vtgate/planbuilder/expr.go
index 1e6151ee3..30aa7ce12 100644
--- a/go/vt/vtgate/planbuilder/expr.go
+++ b/go/vt/vtgate/planbuilder/expr.go
@@ -113,6 +113,9 @@ func findOrigin(expr sqlparser.Expr, bldr builder) (origin columnOriginator, err
                        subroutes = append(subroutes, subroute)
                        return false, nil
                case *sqlparser.FuncExpr:
+                       if node.Name.EqualString("get_lock") {
+                               return false, errors.New("unsupported: GET_LOCK() is disallowed until we can ensure RELEASE_LOCK() safety")
+                       }
                        // If it's last_insert_id, ensure it's a single unsharded route.
                        if !node.Name.EqualString("last_insert_id") {
                                return true, nil

Test case:

"select get_lock('lock1', 10) from user"
"unsupported: GET_LOCK() is disallowed until we can ensure RELEASE_LOCK() safety"

It seems like it should be possible to support named locks, though. A possible approach:

  • Recognize when get_lock() has been called and pin the current mysql session to the vitess session so they it can't be recycled after events like a commit or terminating the vttablet session
  • When the vtgate session terminates, do one of the following:
    • Terminate the mysql connection. This will have the correct behavior. Downside: the connection can't be returned to the connection pool
    • Keep a record of the locks that have been acquired in vttablet session state and explicitly free them all by name. Downside: more bookkeeping, more understanding of the innards of sql queries
    • In mysql 5.7 or higher, call RELEASE_ALL_LOCKS(). Downside: requires mysql 5.7 or higher

My vote is that killing the underlying mysql connection is the right short-term choice. It's simple and it'll behave correctly across all mysql versions

@dweitzman
Copy link
Member Author

Regarding connection pinning at the vtgate level, getting correct behavior would requiring sometimes establishing establishing as well as maintaining vttablet session without necessarily starting or being inside a transaction.

I'm not sure if vtgate or vttablet make any assumptions that would make it hard to separate the concept of a session from the concept of a transaction (aside from the current behavior of recycling your session at the vtgate level on every commit or between executing statements outside of a transaction).

@sougou
Copy link
Contributor

sougou commented Feb 17, 2018

For reference, the functionality for mysql'sGET_LOCK is documented here: https://dev.mysql.com/doc/refman/5.7/en/miscellaneous-functions.html#function_get-lock.

There are a few problems with this functionality that we need to address:

  • The behavior is version dependent.
  • The lock is owned by the client connection, which is difficult to implement in vitess. There are two derivative behaviors for this:
    • The lock is auto-released if the client disconnects.
    • The client is allowed to re-obtain a lock it has previously obtained. Also, this behavior is version dependent.
  • It's not clear if release_lock can be called from a different session. Probably not.

In light of the above behaviors, I think it will be better to implement this function in vitess from the ground up instead of relying on mysql. Here's the proposal:

The core functionality of get_lock will be implemented by vttablet. The functionality will be more akin to a traditional mutex. It will have the following properties:

  • A lock is not tied to a session.
  • A lock can be obtained only once until it is released. This also applies to the session that just obtained the lock.
  • A lock can be released by an explicit call to release_lock, or if the timeout is reached.
  • Anybody can release a lock. It's not required that the lock be released by the session that obtained it.
  • The release of a previously released lock is a no-op.
  • Because a lock is not tied to a session, it is mandatory to provide a timeout for it.
  • Only a master vttablte will honor this statement.

To obtain a vitess-wide lock, you can issue a select get_lock('name', 10), which is equivalent to select get_lock('name', 10) from dual. This will route to the first keyspace and first shard of vitess. If you're connected to a keyspace already, then the statement will be sent to the first shard of that keyspace. If you're connected to a specific shard, then it's sent to that shard.

Any other usage of get_lock beyond the above will either be an error, or will lead to undefined behavior.

Implementation

VTGate does not need to change. Here are the changes for vttablet:

Planbuilder

  • If the table name is dual, check the select list to see if it's a get_lock function call. Ensure it's the only expression on that list. If all conditions are met, build a new plan PlanGetLock.
  • Do the same for release_lock.
  • Should we verify that get_lock is not called under other circumstances? Probably not worth it.

Execution

Build a locker package (similar to messager) that has the following:

  • A map[string]struct {ctx context.Context; cancel context.CancelFunc; released chan struct{}}.
  • On GetLock, look for an entry in the map. If there is no entry, create one with a new context that has the specified timeout. Start a goroutine that waits on that context. Once the context is done, remove the entry from the map, and close the released channel.
  • On GetLock, if an entry is already present, wait on that context and the request context. If the lock context fires, the lock was obtained. Otherwised, we timed out.
  • On ReleaseLock, find the entry. Cancel the context. This will cause the goroutine to release the lock. Wait on the released channel. This guarantees that the entry has been removed from the map. Return success.
  • On ReleaseLock, if the entry is absent, just return success.

The state changes of the Locker module should be the same as the messager.

@sougou
Copy link
Contributor

sougou commented Feb 18, 2018

While running some tests, I realized that I misunderstood the timeout parameter: it only specifies how long to wait for a lock. It does not mean that the lock is held for that long.

So, I'm going to make some changes to the implementation:

  • Since a connection close cannot be detected by vttablet, we'll hold the lock for a maximum of the transaction timeout. After that, it will be auto-released. The default transaction timeout is 30s.
  • The timeout specified in get_lock() will be used only to decide how long to wait. Note that this cannot exceed the timeout set in the request context. So, it will be the minimum of the two.

@dweitzman
Copy link
Member Author

dweitzman commented Feb 18, 2018

It seems important that locks not be taken away from code that believes it holds the lock without being explicitly released by the owner or by the entire db session being terminated and no longer accepting new sql commands (or commits).

Maybe for an alternative to detecting a connection close it would be enough to notice an idle period when no request is active in the transaction? You'd be able to hold a lock indefinitely as long as you keep using it. If you stop using it for a few seconds then the session could be killed as a precaution. With the binary protocol, vtgate could explicitly kill the session sooner if the TCP connection dies.

@sougou
Copy link
Contributor

sougou commented Feb 19, 2018

The lock is a system-wide guarantee. It's held by one vttablet (chosen as described above), and that vttablet cannot know activity in other vttablets where transactions are happening.

However, the timeout for the lock is the same as the transaction. So, if the transaction doesn't complete within the timeout, it will be rolled back, and the lock will also be released due the same timeout.

sougou added a commit that referenced this issue Feb 19, 2018
Issue #3631
get_lock and release_lock cannot be passed through to MySQL because
their behavior is connection dependent, which doesn't work well
for distributed systems, and for pooled connections.

So, we implement our own version of these functions that are more
distributed system friendly, with timeouts as well as session
independent behaviors. More details on the issue.
@dweitzman
Copy link
Member Author

Restricting named locks to master seems reasonable.

Here are some of expectations that client code may have for the behavior of named locks within a client "session". I put "session" in quotes because that concept doesn't exist in vitess (none of these examples use transactions):

Locks don't survive reparents:

1> use keyspace:id@master;
1> select get_lock('lock_name', 10);
{reparent}
1> select 1; <-- should return a permanent error to the client so it knows to abandon the session

Locks can't be released by the wrong person, or if they can they should kill the original session:

1> use keyspace:id@master;
1> select get_lock('lock_name', 10);
2> use keyspace:id@master;
2> select release_lock('lock_name') <-- should return 0, or it might be also ok for this to kill session 1 as long as we know that get_lock() really holds a lock since nobody would be able to call release_lock() by accident without a coding error like failing to check the return value of get_lock
1> select 1; <-- either works if release_lock was blocked, or return a permanent error if we allow random folks to kill locks

Disconnecting with the mysql binary protocol should release all named locks:

1> use keyspace:id@master;
1> select get_lock('lock_name', 10);
1> [explicitly close a session by disconnecting from mysql binary protocol or calling close in grpc, were a close function created]
^-- should release the lock
2> use keyspace:id@master;
2> select get_lock('lock_name', 10); <-- succeeds

What happens if people try to hold a lock forever?... (maybe a client gets a lock with grpc and then crashes):

1> use keyspace:id@master;
1> select get_lock('lock_name', 10);
1> select sleep(99999);
??? desired behavior: maybe kill the existing lock-holder if more than 'timeout' seconds have passed and someone new calls get_lock(), based on the guess that if you're only willing to wait N seconds for the lock than you expect not to be holding the lock longer than that?... ????
1> select 1; <-- if the lock was timed out somehow, then this should return a permanent error

Satisfying these expectations seems to want some kind of vtgate awareness of a "session" which persists across transactions.

@sougou
Copy link
Contributor

sougou commented Feb 20, 2018

Could you describe the use cases in more detail?

We've already established that full mysql compatibility is not going to be practical. So, I think we're looking at a collaboration where vitess tries to meet the app half way, and some changes will have to be made in the app itself.

dweitzman added a commit to dweitzman/vitess that referenced this issue May 31, 2018
Named locks are unsafe with server-side connection pooling.

See vitessio#3631 for background.

Signed-off-by: David Weitzman <dweitzman@pinterest.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants