Deadlock in saxpy_ #883
Comments
What if you add a breakpoint in gdb at saxpy_ to see the call parameters?
That would be hard to do; plenty of calls succeed before it hangs. Should I try compiling OpenBLAS/LuaJIT with debug symbols?
You already have enough symbols. Can you run a backtrace on all threads?
The current backtrace shows that the last BLAS call on that particular thread was saxpy, and it completed fine.
Actually it looks as if the thread that last called saxpy is spinning in sched_yield, waiting for someone else to release a lock on something. There was some fundamental criticism of (still) using sched_yield instead of spinlocks on modern systems in #731, but only a less invasive solution (not spawning additional threads for small matrix sizes) was implemented there. If the Torch framework is multithreaded itself, it may make more sense to use a single-threaded OpenBLAS, as stated in the FAQ.
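For illustration, here is a minimal C sketch of such a yield-based spin-wait, the pattern criticized in #731. This is an assumed simplification, not the actual OpenBLAS locking code; it only shows why a thread stuck waiting on a lock appears in a backtrace as endless sched_yield calls.

```c
/* Hypothetical simplification of a yield-based spin lock; not the actual
 * OpenBLAS source, just the pattern discussed in #731. */
#include <sched.h>
#include <stdatomic.h>

static atomic_flag lock_flag = ATOMIC_FLAG_INIT;

static void spin_lock(void) {
    /* If the holder never clears the flag, this loop spins forever and the
     * waiting thread shows up in a backtrace inside sched_yield. */
    while (atomic_flag_test_and_set_explicit(&lock_flag, memory_order_acquire))
        sched_yield();
}

static void spin_unlock(void) {
    atomic_flag_clear_explicit(&lock_flag, memory_order_release);
}

int main(void) {
    spin_lock();
    /* ... critical section ... */
    spin_unlock();
    return 0;
}
```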
@martin-frbg I think for matrix operations, Torch relies on the underlying BLAS to do things in parallel. But regardless of the inefficiencies of sched_yield discussed in #731, a deadlock is something even worse :) If you suggest things to check, I could try to investigate this case more deeply.
It seems that building with NO_AFFINITY=1 solves the issue, so it could be related |
@brada4 here's the full one:
After reviewing the code carefully, yes, it is possible that my Torch script does some BLAS operations in parallel from two threads. Using NO_AFFINITY=1 seems to solve the issue, but it's still strange.
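For reference, here is a minimal stand-alone sketch of that calling pattern (a hypothetical example, not the actual Torch code): two application threads issuing cblas_saxpy calls into the same OpenBLAS concurrently, which is the situation where a non-thread-safe default build can misbehave.

```c
/* Hypothetical reproduction of the calling pattern: two application threads
 * calling into OpenBLAS at the same time. Build with e.g.
 *   gcc repro.c -lopenblas -lpthread
 * (file name and flags are illustrative, not from the original report). */
#include <pthread.h>
#include <cblas.h>

#define N 1000000

static float x[N];
static float y1[N], y2[N];

static void *worker(void *arg) {
    float *y = arg;                          /* each thread has its own output */
    for (int i = 0; i < 100; ++i)
        cblas_saxpy(N, 2.0f, x, 1, y, 1);    /* y := 2*x + y */
    return NULL;
}

int main(void) {
    pthread_t t1, t2;
    pthread_create(&t1, NULL, worker, y1);
    pthread_create(&t2, NULL, worker, y2);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return 0;
}
```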
It looks like a duplicate of #660. This is not really a deadlock; it's a race condition. OpenBLAS is not thread safe when built with default parameters. When built with
I do not understand why
Perhaps just subtle differences in timing if threads are allowed to schedule on any available core?
CUDA juggles affinity too.
Setting pthread affinity on the main program thread makes the threads subsequently created from it inherit its affinity mask, essentially piling the whole zoo onto a single CPU core.
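A small Linux-specific sketch of that inheritance effect (a hypothetical example; core 0 is chosen arbitrarily): pinning the main thread before spawning workers leaves every worker confined to the same single core.

```c
/* Hypothetical demonstration of affinity-mask inheritance on Linux. */
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdio.h>

static void *worker(void *arg) {
    (void)arg;
    cpu_set_t set;
    CPU_ZERO(&set);
    pthread_getaffinity_np(pthread_self(), sizeof(set), &set);
    /* Prints 1: the worker inherited the single-core mask set in main(). */
    printf("worker may run on %d core(s)\n", CPU_COUNT(&set));
    return NULL;
}

int main(void) {
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(0, &set);                                  /* pin main to core 0 */
    pthread_setaffinity_np(pthread_self(), sizeof(set), &set);

    pthread_t t;
    pthread_create(&t, NULL, worker, NULL);            /* inherits main's mask */
    pthread_join(t, NULL);
    return 0;
}
```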
Hi,
I am using Torch7 to train conv nets, and in some regimes I get infinite hangs from OpenBLAS with this stack trace (the hang is almost always reproducible, but requires a heavy setup; even minor changes to the code, like adding more debug prints around neural-net modules, make the bug disappear):
OpenBLAS 0.2.18 was built with:
make -j32 FC=gfortran