Use Threadpool to schedule the work #851
base: master
Additional Ideas to improve this PR:
@ggerganov for comments.
@howard0su: I think you may have added some commits accidentally? There's already a PR for 997c749, I believe: #809
It helps with my local debugging. I will revert it from this PR once this PR is in a better state. In the meantime, please help review that PR and merge it if it looks good.
Calling for some testing. The current change shows improvements in both performance and energy use. However, my devbox is a 10-core, 20-thread E5 without AVX2, which is not a very typical configuration. I need some help validating the performance.
I'll look into the threading work in more detail after I finish with the quantization improvement efforts and some other pending stuff. But at the moment, I can immediately say that these extra
I ran the benchmark script on an M1 Mac with up to 10 threads. Orange is threadpool, blue is master (at the commit where the branch was forked, eeaa7b0492fc79baab8bb1fe195d6c87159f2bd3). I can't explain why we don't see the Windows improvements, except at 10 threads, where the thread pool is better.
[Benchmark plots: threadpool, master]
It may be related to the pthread functions being different. Do you mind checking whether switching to a lock-free queue helps? When using up all threads, my testing also shows that the threadpool is significantly better, but the overall time is still not lower than with max-2 threads.
@howard0su, @besnardjb any updates?
I cannot use this code to fully utilize all CPUs. Based on PR #710: