fix logic error and race condition #8

zerbina · 2024-04-13T16:32:39Z

There were both a race condition and logic error with the implementation, which could lead to spurious signals, crashes, and other undefined behaviour.

Race Condition

Thread 1:
...

suspend: acquires lock for actor A and pool
changes the state to Suspended
suspend: releases the locks
pauses

Thread 2:
....

resume: acquires lock for actor A and pool
actor state is Suspended, so the actor is queued
resume: releases the locks
...
actor A is popped from the work queue and resumes execution

Both thread 1 and 2 now have unguarded concurrent ownership of actor A, which constitutes a race condition.

Logic Error

Note: this is only one example of what could happen.

Thread 1:
...

suspend: acquires lock for actor A and pool
changes the state to Suspended
releases the lock for actor A and and pool
pauses

Thread 2:
...

resume: acquires lock for actor A and pool
actor state is Suspend, so the actor is queued
...
actor A resumes execution
jield is called, and the state of actor A changes to Jielding
...
workerThread: fetches the state of actor A (Jielding)
pauses

Thread 1:

workerThread: fetches the state of actor A (Jielding)
queues actor A for execution and changes the state to Queued
pauses

Thread 2:

the local state variable still contains Jielding, so the actor is queued again

The same actor is now in the work queue two times!

The Solution

In both cases, the problem is that another thread can assume ownership of an actor while the worker thread responsible for the actor hasn't finished suspending it, due to the actor state being eagerly set to Suspended by suspend.

suspend now sets the state to the new Suspending state first. When handling the Suspending state, if new signals were received in the meantime (sendSig cannot resume a suspending actor), the actor is queued again right away. Otherwise, it's suspended.

There were both a race condition and logic error with the implementation, which could lead to spurious signals, crashes, and other undefined behaviour. Race Condition -------------- Thread 1: ... 1. `suspend`: acquires lock for actor `A` and pool 2. changes the state to `Suspended` 3. `suspend`: releases the locks 4. pauses Thread 2: .... 1. `resume`: acquires lock for actor `A` and pool 2. actor state is `Suspended`, so the actor is queued 3. `resume`: releases the locks 4. ... 5. actor `A` is popped from the work queue and resumes execution Both thread 1 and 2 now have unguarded concurrent ownership of actor `A`, which constitutes a race condition. Logic Error ----------- Note: this is only *one* example of what could happen. Thread 1: ... 1. `suspend`: acquires lock for actor `A` and pool 2. changes the state to `Suspended` 3. releases the lock for actor `A` and and pool 4. pauses Thread 2: ... 1. `resume`: acquires lock for actor `A` and pool 2. actor state is `Suspend`, so the actor is queued 3. ... 4. actor `A` resumes execution 5. `jield` is called, and the state of actor `A` changes to `Jielding` 6. ... 7. `workerThread`: fetches the state of actor `A` (`Jielding`) 8. pauses Thread 1: 1. `workerThread`: fetches the state of actor `A` (`Jielding`) 2. queues actor `A` for execution and changes the state to `Queued` 3. pauses Thread 2: 1. the local `state` variable still contains `Jielding`, so the actor is queued again The same actor is now in the work queue two times! The Solution ------------ In both cases, the problem is that another thread can assume ownership of an actor while the worker thread responsible for the actor hasn't finished suspending it, due to the actor state being eagerly set to `Suspended` by `suspend`. `suspend` now sets the state to the new `Suspending` state first. When handling the `Suspending` state, if new signals were received in the meantime (`sendSig` cannot resume a *suspending* actor), the actor is queued again right away. Otherwise, it's suspended.

zevv · 2024-04-14T14:57:40Z

Thanks for this!

zevv merged commit 94e9831 into zevv:master Apr 14, 2024
1 of 3 checks passed

zerbina deleted the fix-pool-bug branch April 15, 2024 00:33

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix logic error and race condition #8

fix logic error and race condition #8

zerbina commented Apr 13, 2024

zevv commented Apr 14, 2024

fix logic error and race condition #8

fix logic error and race condition #8

Conversation

zerbina commented Apr 13, 2024

Race Condition

Logic Error

The Solution

zevv commented Apr 14, 2024