@mayeut
Contributor

@mayeut mayeut commented Nov 29, 2025

Most functions use exec_blas, but dgetrf calls exec_blas_async directly, and the two paths do not behave the same way after fork. The exec_blas_async path currently deadlocks.

This PR adds a test case for this code path and reverts the lock introduced in #5170.

All Linux make builds deadlock in the continuous build CI for 7750d50. The CMake builds do not deadlock because they do not build that specific test.

@martin-frbg
Collaborator

Does "currently" refer to current develop, i.e. is this addressing a different fault than the one resolved in #5520?

@mayeut
Contributor Author

mayeut commented Nov 30, 2025

> Does "currently" refer to current develop, i.e. is this addressing a different fault than the one resolved in #5520?

Yes, "currently" refers to current develop. I did not understand what #5479 fixed in the context of #5520 while investigating a deadlock with QEMU in MacPython/openblas-libs#238 (which turned out to be unrelated).

#5479 removed the locks around the server shutdown, but those were not the locks introduced in #5170, which #5520 names as the starting point of the regression.

The current pseudo stack after fork (i.e. blas_server_avail == 0) is:

dgetrf
  exec_blas_async
    lock_server
      blas_thread_init
        lock_server => deadlock

All other functions use exec_blas, which does not lock the server when checking blas_server_avail and calling blas_thread_init.
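
The difference between the two call paths can be sketched outside of OpenBLAS as follows. This is only a model under stated assumptions: `model_exec_async`, `model_exec`, `model_thread_init`, `server_lock` and `server_avail` are hypothetical stand-ins, not the real OpenBLAS symbols or locking primitives. An error-checking pthread mutex is used so that the nested lock returns `EDEADLK` instead of hanging; with a plain non-recursive lock the same nesting would block forever.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-ins for the server lock and blas_server_avail. */
static pthread_mutex_t server_lock;
static int server_avail = 0; /* 0 models the state right after fork */

/* Use an error-checking mutex so a re-lock by the owner is reported
 * as EDEADLK rather than blocking the thread forever. */
static void init_lock(void) {
    pthread_mutexattr_t attr;
    pthread_mutexattr_init(&attr);
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    pthread_mutex_init(&server_lock, &attr);
    pthread_mutexattr_destroy(&attr);
}

/* Models blas_thread_init: takes the server lock itself. */
static int model_thread_init(void) {
    int rc = pthread_mutex_lock(&server_lock);
    if (rc != 0)
        return rc; /* EDEADLK when the caller already holds the lock */
    server_avail = 1;
    pthread_mutex_unlock(&server_lock);
    return 0;
}

/* Models the exec_blas_async path: init is called with the lock held. */
static int model_exec_async(void) {
    int rc = 0;
    pthread_mutex_lock(&server_lock);
    if (!server_avail)
        rc = model_thread_init(); /* nested lock attempt */
    pthread_mutex_unlock(&server_lock);
    return rc;
}

/* Models the exec_blas path: init is called without holding the lock. */
static int model_exec(void) {
    return server_avail ? 0 : model_thread_init();
}
```

After calling `init_lock()`, `model_exec_async()` reports the nested lock attempt while `model_exec()` initializes the server normally, which mirrors why only the dgetrf path hangs in the stack above.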

CI has now timed out with the newly added test (at least for some combinations, given that the test depends on the build configuration).
I pushed the revert of the locks from #5170, so CI should be green again.

@mayeut mayeut marked this pull request as ready for review November 30, 2025 07:14
@mayeut mayeut changed the title from "chore: add test case for exec_blas_async after fork" to "fix: deadlock in exec_blas_async after fork" Nov 30, 2025