Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix race conditions in backend lock #6007

Open
jandubois opened this issue Nov 21, 2023 · 0 comments
Open

Fix race conditions in backend lock #6007

jandubois opened this issue Nov 21, 2023 · 0 comments
Labels
area/snapshots kind/story Work item that is linked from a kind/epic

Comments

@jandubois
Copy link
Member

jandubois commented Nov 21, 2023

The current implementation of the backend lock can still lead to race conditions, lockups, and crashes (e.g. #5849, #5846, #5695). This change is supposed to fix that.

The lock file

We will use the existence of the lockfile $APPHOME/backend.lock as the only source of truth about the desired backend state:

  • The file exists: backend should shut down and the VM is considered locked (no backend operations are allowed until the lock is removed).
  • The file doesn't exist: backend should start and the VM is considered unlocked.

Note that the locked state is no longer independently settable, but is derived from the desired backend state.

Application startup

  • The application starts the command server API.

  • When the command server is running, the app sends itself a PUT check_backend_lock command.

  • The check_backend_lock will check if the lock file exists

  • The UI must not be shown/enabled before check_backend_lock is executed. There must be no way for the user to start the backend before the lock state is being checked.

  • If the lock file doesn't exist, it will dismiss any modal locking dialog and start up the backend if it isn't already running.

  • If the lock file exists, it will shutdown the backend if it is already running. Then it will display a modal locking dialog explaining that the backend is locked. It will also disable any menu commands that might alter the backend state.

rdctl locking/unlocking the backend

rdctl will first create/delete the lock file, and then call the PUT check_backend_lock API. This guarantees that the backend either doesn't start up, or will be shut down.

This way there should be no scenario in which the app could miss the notification that the lock state has changed.

Lock reason

In order to communicate the reason why the backend is locked we'll change the backend.lock file from an empty file to a JSON file1. For now the file will include a single reason field that can be displayed in the modal lock dialog.

rdctl changes

The changes to rdctl are minimal:

  • PUT backend_state will be renamed to PUT check_backend_lock and the parameters will be removed.

  • The reason code will be written to the backend.lock file.

This means the new algorithm is already testable with the current rdctl implementation if PUT backend_state is treated the same as PUT check_backend_lock, and an empty lock file is treated as a generic reason.

Footnotes

  1. It is unclear if we want to rename the file to backend-lock.json.

@jandubois jandubois added kind/story Work item that is linked from a kind/epic area/snapshots labels Nov 21, 2023
@jandubois jandubois added this to the 1.12 milestone Nov 21, 2023
@gaktive gaktive removed this from the 1.12 milestone Dec 4, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/snapshots kind/story Work item that is linked from a kind/epic
Projects
None yet
Development

No branches or pull requests

3 participants