Fix scan deadlock #178

Open
wants to merge 4 commits into
base: main


timoML
Contributor

@timoML timoML commented Dec 16, 2024

Solves a deadlock in the scanning toolchain.

Description

The deadlock can occur when scanning_optimize_logic calls scanning_probe_logic.start_scan().
In this case, the scanning_optimize_logic thread holds a lock (scanning_probe_logic._thread_lock) when entering start_scan().
This lock can prevent __start_timer() from invoking the timer on scanning_probe_logic via a BlockingQueuedConnection, since that thread is waiting for the lock to be released.
Rather surprisingly, the deadlock condition seems to be timing-related and does not always occur.
It might be the reason why the optimizer freezes after hours of operation on some setups.

The solution is to call __start_timer() outside the thread lock. This is safe because the preceding locked block is guaranteed to have finished executing, and __start_timer() does not use any shared resources.
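
For illustration, here is a minimal pure-Python sketch of the pattern, not the actual module code. It emulates the BlockingQueuedConnection with a task queue served by a worker thread; `ProbeLogicSketch`, `post`, `invoke_blocking`, `slot_needing_lock` and the free function `start_scan` are hypothetical names, and a timeout stands in for the real hang so the script terminates.

```python
import queue
import threading


class ProbeLogicSketch:
    """Toy stand-in for a logic module that lives in its own thread."""

    def __init__(self):
        self._thread_lock = threading.Lock()
        self._tasks = queue.Queue()
        self.timer_started = threading.Event()
        threading.Thread(target=self._event_loop, daemon=True).start()

    def _event_loop(self):
        # Emulates the event loop of the object's own thread.
        while True:
            func, done = self._tasks.get()
            func()
            done.set()

    def post(self, func):
        # Queue a call on the object's thread without waiting for it.
        done = threading.Event()
        self._tasks.put((func, done))
        return done

    def invoke_blocking(self, func, timeout):
        # Emulates a blocking queued call: queue it, then wait for completion.
        # Returns False if the call did not finish within `timeout` seconds.
        return self.post(func).wait(timeout)

    def slot_needing_lock(self):
        # Any slot on the object's thread that takes the same lock.
        with self._thread_lock:
            pass


def start_scan(logic, timer_outside_lock, timeout):
    with logic._thread_lock:
        # Timing-dependent part, forced here: while the caller holds the
        # lock, the object's thread picks up a slot that needs that lock.
        logic.post(logic.slot_needing_lock)
        if not timer_outside_lock:
            # Old behaviour: blocking call while still holding the lock.
            return logic.invoke_blocking(logic.timer_started.set, timeout)
    # Fixed behaviour: the lock is released before the blocking call.
    return logic.invoke_blocking(logic.timer_started.set, timeout)


if __name__ == "__main__":
    print(start_scan(ProbeLogicSketch(), timer_outside_lock=False, timeout=0.5))  # False
    print(start_scan(ProbeLogicSketch(), timer_outside_lock=True, timeout=2.0))   # True
```

In the first call the worker is stuck waiting for the lock, so the blocking call cannot complete until the caller gives up; without the timeout this would be the permanent freeze. In the second call the lock is already free when the timer call is queued, so it completes.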

Motivation and Context

With my dummy config, this deadlock can be reproduced by freshly starting the scanning toolchain and then starting an optimization without any prior action.
The GUI hangs like this, with no way to recover:
image

How Has This Been Tested?

Dummy scanning toolchain, optimizing continuously via POIManager for > 24 h.

Types of changes

  • Bug fix
  • New feature
  • Breaking change (Causes existing functionality to not work as expected)

Checklist:

  • My code follows the code style of this project.
  • I have documented my changes in /docs/changelog.md.
  • My change requires additional/updated documentation.
  • I have updated the documentation accordingly.
  • I have added/updated the config example for any module docstrings as necessary.
  • I have checked that the change does not contain obvious errors
    (syntax, indentation, mutable default values, etc.).
  • I have tested my changes using 'Load all modules' on the default dummy configuration.
  • All changed Jupyter notebooks have been stripped of their output cells.

@timoML
Contributor Author

timoML commented Dec 16, 2024

We should add a warning to similar code constructs so that __start_timer() is not called while a thread lock is held.
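
One possible shape for such a warning, as a rough sketch only: it assumes a lock wrapper that remembers its owning thread, which the existing code may not have. `OwnerTrackingLock` and `start_timer_guarded` are hypothetical names.

```python
import threading
import warnings


class OwnerTrackingLock:
    """Hypothetical lock wrapper that remembers which thread holds it."""

    def __init__(self):
        self._lock = threading.Lock()
        self._owner = None

    def __enter__(self):
        self._lock.acquire()
        self._owner = threading.get_ident()
        return self

    def __exit__(self, exc_type, exc, tb):
        self._owner = None
        self._lock.release()
        return False

    def held_by_current_thread(self):
        return self._owner == threading.get_ident()


def start_timer_guarded(lock, start_timer):
    """Warn if the calling thread still holds `lock`, then start the timer."""
    if lock.held_by_current_thread():
        warnings.warn(
            "timer start requested while holding the thread lock; "
            "a blocking cross-thread call may deadlock",
            RuntimeWarning,
            stacklevel=2,
        )
    start_timer()
```

The guard only warns and still calls `start_timer`, so existing behaviour is unchanged; it just makes the risky call sites visible.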

@timoML timoML requested a review from TobiasSpohn December 16, 2024 11:33