RFC: More granular locking #199
Caveat: I am not the owner of targetcli-fb, just a contributor.

I wonder what parts of the actual code in LIO/targetcli-fb need to be mutually exclusive, i.e. what parts of the code screw up if we have multiple stacks running at the same time. But perhaps that is a much harder problem to solve? (I haven't looked at why a lock is needed, myself.)

With this approach, what is your plan for a user running the CLI, where things are going fine and then the next thing they type fails because some other process got the lock and is taking a while? Would the first user see an error like "sorry, busy, please retry", or would the request hang until they got the lock?

And what about batch mode, i.e. feeding a bunch of commands to targetcli-fb from stdin via a script? What if half the commands succeed, but then one of them fails? Should the script fail? Or would all scripts have to be rewritten to handle the lock being stolen out from under us? Maybe you'd need a new global configuration variable that says "wait_if_lock_stolen"?

Also, what if we are waiting for a lock in the middle of our script, and the other process dies or core dumps and leaves the lock hanging? Normally we check for this condition at startup (I believe), but now we'd have to check each time the lock is held by somebody else?
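To make the "fail fast vs. wait" distinction concrete, here is a rough sketch of the two acquisition behaviors being discussed, using a plain flock on the lock file. The function name, the `timeout` parameter, and the polling loop are illustrative only and are not taken from targetcli-fb's actual implementation:

```python
import fcntl
import time

LOCKFILE = "/var/lock/targetcli.lock"

def acquire_lock(path=LOCKFILE, timeout=None):
    """Take an exclusive flock on `path`.

    timeout=None -> fail immediately ("sorry, busy, please retry")
    timeout=N    -> poll for up to N seconds, then give up

    Returns the open file object; keep it open to hold the lock.
    """
    f = open(path, "w")
    deadline = None if timeout is None else time.monotonic() + timeout
    while True:
        try:
            fcntl.flock(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
            return f
        except BlockingIOError:
            if deadline is None:
                f.close()
                raise  # fail fast: another process holds the lock
            if time.monotonic() >= deadline:
                f.close()
                raise TimeoutError("still locked after %s seconds" % timeout)
            time.sleep(0.1)
```

One nice property of flock-style locks is that the kernel drops them when the holding process exits, which would sidestep the stale-lock concern; whether targetcli-fb's lock behaves that way is something I have not verified.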
The pull request I linked contains some details about why the lock is supposedly necessary (including a reproducer for a genuine bug), so I do believe a lock of some sort makes sense.
I think some inspiration can be drawn from prior art here. The first example that comes to mind is …

In general, I would intuitively expect every command in "interactive mode" to behave exactly as it would if it were executed from the shell directly.
That is a more interesting problem. I guess for scripts I would expect the current behavior (i.e. take the lock when …).

Summing up, I would expect the following: the only thing I am really proposing is that the lock should not be acquired while an interactive shell is idle.
Currently (as of #140), targetcli takes a global lock on `/var/lock/targetcli.lock`. This poses a problem when combined with interactive mode, because the lock is held the entire time the targetcli shell is open.

When an administrator leaves a targetcli shell open (perhaps even unknowingly, in some SSH session or `screen` in the background), this effectively means that no targetcli operations can be completed, leading to disastrous consequences when a cluster manager such as Pacemaker is involved: in the event of a failover, the iSCSI target cannot start on the new primary node. We have observed this behavior leading to user confusion and – in one instance – even an actual production outage.
My proposal is: when running in interactive mode, only take the global lock when some action is actually running.
When the shell is just sitting there idly, there is no need to take the lock.
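As a very rough illustration of what "take the lock only while a command runs" could look like (the names `command_lock` and `run_interactive`, and the shell object, are purely hypothetical and not existing targetcli-fb code):

```python
import fcntl
from contextlib import contextmanager

LOCKFILE = "/var/lock/targetcli.lock"  # path used by targetcli (see #140)

@contextmanager
def command_lock(path=LOCKFILE):
    """Hold the global lock only while a single command executes."""
    with open(path, "w") as f:
        fcntl.flock(f, fcntl.LOCK_EX)   # block until the lock is free
        try:
            yield
        finally:
            fcntl.flock(f, fcntl.LOCK_UN)

def run_interactive(shell):
    # Hypothetical interactive loop: acquire the lock around each command
    # and release it afterwards, so an idle shell blocks nobody.
    for line in shell.read_commands():
        with command_lock():
            shell.execute(line)
```

An idle shell would then hold no lock at all, while each individual command (interactive or scripted) would still serialize against other targetcli invocations.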
Of course, this implies a potentially unwanted behavioral change (hence the request for comments from the maintainers): in the current version, it is guaranteed¹ that the state won't change while an interactive shell is open.
Please let me know your thoughts about this issue.
¹ As guaranteed as it can be, considering you could still manually mess around in configfs.