[BUG] ggml spawns threads even when BLAS is used #578
Comments
Possible workaround:
The thread management in
Mutex + condition variables sounds like the solution, but my experiments show that they are in fact slower compared to busy waits. Yes, the power consumption is much lower, but the performance is lower too.
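For readers unfamiliar with the trade-off described above: a busy-waiting worker polls a shared flag and reacts almost immediately at the cost of keeping a core at 100%, while a condition-variable worker sleeps in the kernel until signalled, which saves power but adds wake-up latency on every hand-off. A minimal standalone sketch (not ggml's actual code; all names are illustrative):

```c
// Busy-wait vs. condition-variable hand-off to a worker thread.
// Illustrative only; compile with: cc demo.c -pthread
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

static atomic_bool busy_ready = false;          // polled by the busy-wait worker

static pthread_mutex_t mtx = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  cv  = PTHREAD_COND_INITIALIZER;
static bool cv_ready = false;                   // protected by mtx

// Busy-wait: lowest wake-up latency, but the core spins while idle.
static void * busy_worker(void * arg) {
    (void) arg;
    while (!atomic_load_explicit(&busy_ready, memory_order_acquire)) {
        // spin
    }
    puts("busy-wait worker: got work");
    return NULL;
}

// Condition variable: the thread sleeps until signalled, saving power,
// but each wake-up goes through the kernel, adding latency.
static void * cv_worker(void * arg) {
    (void) arg;
    pthread_mutex_lock(&mtx);
    while (!cv_ready) {
        pthread_cond_wait(&cv, &mtx);
    }
    pthread_mutex_unlock(&mtx);
    puts("condvar worker: got work");
    return NULL;
}

int main(void) {
    pthread_t t1, t2;
    pthread_create(&t1, NULL, busy_worker, NULL);
    pthread_create(&t2, NULL, cv_worker,   NULL);

    // Hand work to the busy-wait worker: a single atomic store.
    atomic_store_explicit(&busy_ready, true, memory_order_release);

    // Hand work to the condvar worker: lock, set the flag, signal.
    pthread_mutex_lock(&mtx);
    cv_ready = true;
    pthread_cond_signal(&cv);
    pthread_mutex_unlock(&mtx);

    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return 0;
}
```

The comment above reports that in practice the condition-variable version was measurably slower for ggml's fine-grained synchronization, which is why the busy-wait approach was kept despite the higher power draw.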
This issue was closed because it has been inactive for 14 days since being marked as stale.
Prerequisites
Please answer the following questions for yourself before submitting an issue.
Expected Behavior
ggml should not spawn threads for the initial prompt ingestion when using BLAS.
Current Behavior
ggml does spawn threads even when using BLAS.
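As a rough illustration of the expected behavior (a sketch under assumptions, not ggml's implementation; the predicate and function names below are made up for the example), the idea is that when a node's matrix multiplication will be offloaded to BLAS, the compute step should run with a single ggml thread instead of waking the full worker pool:

```c
// Toy sketch of the expected gating logic; not ggml's actual API.
#include <stdbool.h>
#include <stdio.h>

enum op { OP_MUL_MAT, OP_ADD };

struct node {
    enum op op;
    int     rows, cols;   // enough shape info for the heuristic below
};

// Hypothetical stand-in for a "will BLAS handle this mul_mat?" check:
// only large matrix multiplications are worth dispatching to BLAS.
static bool node_uses_blas(const struct node * n) {
    return n->op == OP_MUL_MAT && n->rows >= 32 && n->cols >= 32;
}

// Expected behavior: choose the thread count per node, so nodes handled
// by BLAS do not cause extra ggml worker threads to spin.
static int threads_for_node(const struct node * n, int n_threads) {
    if (node_uses_blas(n)) {
        return 1;   // BLAS manages its own threading (e.g. via its env variables)
    }
    return n_threads;
}

int main(void) {
    struct node big_matmul = { OP_MUL_MAT, 4096, 4096 };
    struct node small_add  = { OP_ADD,       32,    1 };

    printf("big mul_mat -> %d ggml thread(s)\n", threads_for_node(&big_matmul, 6));
    printf("small add   -> %d ggml thread(s)\n", threads_for_node(&small_add,  6));
    return 0;
}
```

The current behavior reported below corresponds to the `return n_threads;` path being taken unconditionally, so the ggml workers spin alongside the BLAS threads during prompt ingestion.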
Environment and Context
Reproducible using latest OpenBLAS with PR OpenMathLib/OpenBLAS#3970 (for Intel 13th gen support) and Intel MKL's BLAS implementation.
Ubuntu 22.04 with custom Kernel
Linux XXX 6.1.6-060106-generic #202301141035 SMP PREEMPT_DYNAMIC Sat Jan 14 11:15:19 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Failure Information (for bugs)
Read this discussion for full context #229 (reply in thread)
@slaren mentioned that the issue is:
Steps to Reproduce
I tried using -b 256 and -b 512, and ggml still spawns its 6 threads (from -t 6) alongside the BLAS threads during initial prompt ingestion:
llama -m /opt/models/llama-30B/ggml-model-q4_0.bin -n -1 --color -i -r "User:" -f /opt/prompts/chat-with-bob.txt -t 6 -b 256 -c 2048
Using -t 1 yields the expected behavior (only 1 thread for ggml, plus the threads I set via the environment variable for BLAS).
Failure Logs
htop shows more cores in use than expected.