Skip to content

Conversation

mpage
Copy link
Contributor

@mpage mpage commented Feb 3, 2024

There is a potential race when _PyParkingLot_UnparkAll is executing in one thread and another thread is unblocked because of an interrupt in _PyParkingLot_Park. Consider the following scenario:

  1. Thread T0 is blocked1 in _PyParkingLot_Park on address A.
  2. Thread T1 executes _PyParkingLot_UnparkAll on address A. It finds the wait_entry for T0 and unlinks2 its list node.
  3. Immediately after (2), T0 is woken up due to an interrupt. It then segfaults trying to unlink3 the node that was previously unlinked in (2).

To fix this we mark each waiter as unparking before releasing the bucket lock. _PyParkingLot_Park will wait to handle the coming wakeup, and not attempt to unlink the node, when this field is set. _PyParkingLot_Unpark does this already, presumably to handle this case.

Footnotes

  1. https://github.com/python/cpython/blob/f35c7c070ca6b49c5d6a97be34e6c02f828f5873/Python/parking_lot.c#L303

  2. https://github.com/python/cpython/blob/f35c7c070ca6b49c5d6a97be34e6c02f828f5873/Python/parking_lot.c#L369

  3. https://github.com/python/cpython/blob/f35c7c070ca6b49c5d6a97be34e6c02f828f5873/Python/parking_lot.c#L320

…hen handling interrupts

There is a potential race when `_PyParkingLot_UnparkAll` is executing in
one thread and another thread is unblocked because of an interrupt in
`_PyParkingLot_Park`. Consider the following scenario:

1. Thread T0 is blocked[^1] in `_PyParkingLot_Park` on address `A`.
2. Thread T1 executes `_PyParkingLot_UnparkAll` on address `A`. It
   finds the `wait_entry` for `T0` and unlinks[^2] its list node.
3. Immediately after (2), T0 is woken up due to an interrupt. It
   then segfaults trying to unlink[^3] the node that was previously
   unlinked in (2).

To fix this we mark each waiter as unparking before releasing the bucket
lock. `_PyParkingLot_Park` will wait to handle the coming wakeup, and not
attempt to unlink the node, when this field is set. `_PyParkingLot_Unpark`
does this already, presumably to handle this case.

[^1]: https://github.com/python/cpython/blob/f35c7c070ca6b49c5d6a97be34e6c02f828f5873/Python/parking_lot.c#L303
[^2]: https://github.com/python/cpython/blob/f35c7c070ca6b49c5d6a97be34e6c02f828f5873/Python/parking_lot.c#L369
[^3]: https://github.com/python/cpython/blob/f35c7c070ca6b49c5d6a97be34e6c02f828f5873/Python/parking_lot.c#L320
dequeue_all(bucket, addr, &head);
llist_for_each(node, &head) {
struct wait_entry *waiter = llist_data(node, struct wait_entry, node);
waiter->is_unparking = true;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks correct, but let's push the waiter->is_unparking assignment into dequeue_all and dequeue to avoid the extra loop here.

@mpage mpage marked this pull request as ready for review February 3, 2024 03:14
@mpage
Copy link
Contributor Author

mpage commented Feb 3, 2024

@colesbury - Can you take another look?

@mpage
Copy link
Contributor Author

mpage commented Feb 3, 2024

@DinoV or @carljm - Any chance either of you can take a look / merge this?

@colesbury colesbury requested a review from DinoV February 5, 2024 19:52
@DinoV DinoV merged commit c32bae5 into python:main Feb 5, 2024
fsc-eriker pushed a commit to fsc-eriker/cpython that referenced this pull request Feb 14, 2024
…gLot_UnparkAll` when handling interrupts (python#114945)

Fix race between `_PyParkingLot_Park` and `_PyParkingLot_UnparkAll` when handling interrupts

There is a potential race when `_PyParkingLot_UnparkAll` is executing in
one thread and another thread is unblocked because of an interrupt in
`_PyParkingLot_Park`. Consider the following scenario:

1. Thread T0 is blocked[^1] in `_PyParkingLot_Park` on address `A`.
2. Thread T1 executes `_PyParkingLot_UnparkAll` on address `A`. It
   finds the `wait_entry` for `T0` and unlinks[^2] its list node.
3. Immediately after (2), T0 is woken up due to an interrupt. It
   then segfaults trying to unlink[^3] the node that was previously
   unlinked in (2).

To fix this we mark each waiter as unparking before releasing the bucket
lock. `_PyParkingLot_Park` will wait to handle the coming wakeup, and not
attempt to unlink the node, when this field is set. `_PyParkingLot_Unpark`
does this already, presumably to handle this case.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants