Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix concurrency issue in timeout timer value processing #337

Merged
merged 1 commit into from
Feb 11, 2019

Conversation

janneronkko
Copy link
Contributor

According to multiprocessing documentation for Value
(https://docs.python.org/3/library/multiprocessing.html#multiprocessing.Value)
reads and writes are protected with lock when the lock argument is True
(the default) or the lock argument is an instance of Lock or RLock. The
documentation states that operations like += are not atomic as that
involves reading and writing.

On the worker side the critical section includes also storing finished task
result because the timeout could happen after the task function has
finished but before the result has been stored and timer.value has been
updated to tell the guard process that the task has been finished.

On the guard side the critical section includes all checks done to see
if the worker has timed out or died and the actual reincarnation function
because the worker could update timer value to -1 (idle) or -2 (recycle)
after the guard has seen timer value 0 (timeout) and is going to terminate
the worker.

According to multiprocessing documentation for Value
(https://docs.python.org/3/library/multiprocessing.html#multiprocessing.Value)
reads and writes are protected with lock when the lock argument is True
(the default) or the lock argument is an instance of Lock or RLock. The
documentation states that operations like += are not atomic as that
involves reading and writing.

On the worker side the critical section includes also storing finished task
result because the timeout could happen after the task function has
finished but before the result has been stored and timer.value has been
updated to tell the guard process that the task has been finished.

On the guard side the critical section includes all checks done to see
if the worker has timed out or died and the actual reincarnation function
because the worker could update timer value to -1 (idle) or -2 (recycle)
after the guard has seen timer value 0 (timeout) and is going to terminate
the worker.
@codecov-io
Copy link

codecov-io commented Jan 28, 2019

Codecov Report

Merging #337 into master will increase coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff            @@
##           master    #337      +/-   ##
=========================================
+ Coverage   90.39%   90.4%   +<.01%     
=========================================
  Files          43      43              
  Lines        2843    2845       +2     
=========================================
+ Hits         2570    2572       +2     
  Misses        273     273
Impacted Files Coverage Δ
django_q/cluster.py 91.68% <100%> (+0.03%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 44d963b...bca2205. Read the comment docs.

@janneronkko
Copy link
Contributor Author

I also spotted issue with the timeout and guard_cycle handling.

If guard_cycle is not something that a float can present precisely (like 1.3) timeouts won't work as the current code makes an assumption that timeout is multiple of guard_cycle (this requirement is not documented at all).

The following combinations do not work:

  • guard_cycle=2, timeout=5
  • guard_cycle=5, timeout=4
  • guad_cycle=0.4, timeout=1

@Koed00 Would you accept a PR that would refactor timer handling in such way that state of worker is put into a separate variable and the timer.value is only used for timeouts?
This way the timer.value == 0 comparison could be changed to timer.value <= 0.0 and the above issues are fixed.

@janneronkko
Copy link
Contributor Author

@Koed00 Is there something that needs to be changed or fixed?

@Koed00 Koed00 merged commit c84fd11 into Koed00:master Feb 11, 2019
@Koed00
Copy link
Owner

Koed00 commented Feb 11, 2019

Looks good. Thanks for the effort on this one.
I'll look into doing a release soon.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants