Threadpool: take 2 #8672
Commits on Aug 27, 2024
-
Introduce ggml_compute_threadpool
- OpenMP functional: Check
- Vanilla ggml functional: Check
- ggml w/threadpool functional: Check
- OpenMP no regression: No glaring problems
- Vanilla ggml no regression: No glaring problems
- ggml w/threadpool no regression: No glaring problems
Commit 130adf8
-
Commit a0aae52
-
Commit d5c9c14
-
Commit 82224f8
-
Commit 817eaf0
-
Commit 5763732
-
fix deadlock for cases where cgraph.n_nodes == 1
and fix --poll case
Commit 3008b31
-
threadpool: use cpu_get_num_math to set the default number of threadpool threads
This way we avoid using E-Cores and Hyperthreaded siblings.
Commit 96d6603
-
bench: create fresh threadpool for each test
For benchmarking it's better to start a fresh pool for each test with the exact number of threads needed for that test. Having larger pools is suboptimal (causes more load, etc).
Commit 2953441
-
atomics: always use stdatomics with clang and use relaxed memory order when polling in ggml_barrier
This also removes sched_yield() calls from ggml_barrier() to match OpenMP behavior.
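As a rough illustration of the polling change in this commit, a spin barrier can poll its "passed" counter with relaxed loads and issue a single fence once the wait is over. This is a minimal sketch, not the ggml_barrier() code itself: the struct, field, and function names below are hypothetical.

```c
#include <stdatomic.h>

// Hypothetical barrier state; ggml's real structures may differ.
typedef struct {
    atomic_int n_arrived; // threads that have reached the current barrier
    atomic_int n_passed;  // number of barriers completed so far
} spin_barrier;

static void spin_barrier_wait(spin_barrier * b, int n_threads) {
    if (n_threads == 1) {
        return; // nothing to synchronize with
    }

    // Remember which barrier generation we are waiting on.
    const int passed_old = atomic_load_explicit(&b->n_passed, memory_order_relaxed);

    if (atomic_fetch_add_explicit(&b->n_arrived, 1, memory_order_seq_cst) == n_threads - 1) {
        // Last thread to arrive: reset the arrival count, then release everyone.
        atomic_store_explicit(&b->n_arrived, 0, memory_order_seq_cst);
        atomic_fetch_add_explicit(&b->n_passed, 1, memory_order_seq_cst);
    } else {
        // Poll with relaxed loads and without sched_yield(); ordering is
        // restored by the fence once the generation counter has changed.
        while (atomic_load_explicit(&b->n_passed, memory_order_relaxed) == passed_old) {
        }
        atomic_thread_fence(memory_order_seq_cst);
    }
}
```

The relaxed loads keep the spin loop cheap, and the fence after the loop is what makes the writes of the other threads visible before the caller continues.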
Commit 6fcc780
-
threadpool: make polling the default to match openmp behavior
All command line args now allow for setting poll to 0 (false).
Commit 3b62f7c
-
Commit dfa6377
-
fix potential race condition in check_for_work
fmz committed Aug 27, 2024
Commit 2e18f0d
-
Commit 48aa8ee
-
threadpool: reduce pause/resume/wakeup overhead in common cases
We now start a threadpool in the paused state only if we have two threadpools. The resume is now implicit (i.e. triggered by new work), which allows for reduced locking and context-switch overhead.
Commit 494e27c
-
threadpool: add support for hybrid polling
poll params (--poll, ...) now specify the "polling level", i.e. how aggressively we poll before waiting on the condition variable. poll=0 means no polling, 1 means poll for 128K rounds then wait, 2 means 256K rounds, and so on. The default value of 50 (i.e. 50x128K rounds) seems like a decent default across modern platforms. We can tune this further as things evolve.
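The hybrid scheme described in this commit message can be sketched roughly as follows: spin for poll x 128K rounds, and only fall back to the condition variable if no work showed up. This is an illustrative sketch under assumptions, not the code from the commit; pool_sketch, poll_rounds, poll_for_work, wait_for_work, and kick are hypothetical names.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

// Hypothetical names throughout; not the ggml implementation.
#define ROUNDS_PER_LEVEL (128 * 1024)

typedef struct {
    atomic_int      n_graph; // bumped once for every new graph
    pthread_mutex_t mutex;
    pthread_cond_t  cond;
    int             poll;    // polling level (0 = never poll)
} pool_sketch;

// Number of polling rounds for a given polling level.
static uint64_t poll_rounds(int poll) {
    return (uint64_t) poll * ROUNDS_PER_LEVEL;
}

// Spin for the configured number of rounds; true if new work showed up.
static bool poll_for_work(pool_sketch * p, int last_seen) {
    const uint64_t n = poll_rounds(p->poll);
    for (uint64_t i = 0; i < n; i++) {
        if (atomic_load_explicit(&p->n_graph, memory_order_acquire) != last_seen) {
            return true;
        }
    }
    return false;
}

// Poll first; block on the condition variable only if polling found nothing.
static void wait_for_work(pool_sketch * p, int last_seen) {
    if (poll_for_work(p, last_seen)) {
        return;
    }
    pthread_mutex_lock(&p->mutex);
    while (atomic_load_explicit(&p->n_graph, memory_order_acquire) == last_seen) {
        pthread_cond_wait(&p->cond, &p->mutex);
    }
    pthread_mutex_unlock(&p->mutex);
}

// Publish new work and wake any thread that already went to sleep.
static void kick(pool_sketch * p) {
    pthread_mutex_lock(&p->mutex);
    atomic_fetch_add_explicit(&p->n_graph, 1, memory_order_seq_cst);
    pthread_cond_broadcast(&p->cond);
    pthread_mutex_unlock(&p->mutex);
}
```

With poll set to 0 the loop body never runs and the worker goes straight to the condition variable, which matches the "no polling" setting described above.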
Commit b630acd
-
threadpool: reduce the number of barriers required
New work is now indicated with an atomic counter that is incremented for each new graph that needs to be computed. This removes the need for an extra barrier for clearing the "new_work" flag and removes the special case for trivial graphs.
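To show why a counter removes the need to clear a flag, here is a small sketch in which each worker compares the shared counter against the last value it saw. The type and function names (work_signal, worker_view, signal_new_graph, worker_has_new_work) are hypothetical and are not taken from the commit.

```c
#include <stdatomic.h>
#include <stdbool.h>

// Shared state: incremented once per graph submitted for compute.
typedef struct {
    atomic_int n_graph;
} work_signal;

// Per-worker state: the last counter value this worker acted on.
typedef struct {
    int last_graph;
} worker_view;

// Main thread: announce a new graph. Nothing is ever reset afterwards.
static void signal_new_graph(work_signal * s) {
    atomic_fetch_add_explicit(&s->n_graph, 1, memory_order_release);
}

// Worker: new work exists if the counter moved since we last looked.
static bool worker_has_new_work(work_signal * s, worker_view * w) {
    const int cur = atomic_load_explicit(&s->n_graph, memory_order_acquire);
    if (cur == w->last_graph) {
        return false;
    }
    w->last_graph = cur;
    return true;
}
```

Because every worker tracks its own last-seen value, no thread has to wait for the others before a shared "new_work" flag can be cleared.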
Commit 9d3e78c
-
threadpool: remove special-casing for disposable threadpools
With the efficient hybrid polling there is no need to make disposable pools any different. This simplifies the overall logic and reduces branching.
Include n_threads in debug print for disposable threadpool.
Declare pause and stop flags as atomic_bool. This doesn't actually generate any memory barriers and simply informs the thread sanitizer that these flags can be written & read by different threads without locking.
Commit 538bd9f
-
threadpool: do not clear barrier counters between graph computes (fixes race with small graphs)
This fixes the race condition with very small graphs where the main thread happens to start a new graph while the workers are just about to exit from barriers.
Commit db45b6d
-
threadpool: use relaxed order for chunk sync
A full memory barrier is overkill here since each thread works on a different chunk.
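A minimal sketch of the idea, assuming chunks are handed out through a shared atomic index: the only thing the threads need from the counter is a unique chunk number, so a relaxed fetch-and-add is enough. chunk_queue and claim_chunk are hypothetical names, not identifiers from the commit.

```c
#include <stdatomic.h>

// Hypothetical work queue: chunks are identified by index 0..n_chunks-1.
typedef struct {
    atomic_int next_chunk; // next chunk index to hand out
    int        n_chunks;   // total number of chunks
} chunk_queue;

// Claim the next unprocessed chunk, or return -1 when none are left.
// Relaxed ordering suffices: atomicity alone guarantees that no two
// threads receive the same index, and the chunks do not overlap.
static int claim_chunk(chunk_queue * q) {
    const int c = atomic_fetch_add_explicit(&q->next_chunk, 1, memory_order_relaxed);
    return c < q->n_chunks ? c : -1;
}
```

Any ordering the results need is still provided by whatever barrier the threads hit after the chunked work is finished.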
Commit 307fece
-
Commit 63a0dad
-
Commit 2358bb3
-
Commit 4a4d715
-
threadpool: add support for ggml_threadpool_params_default/init
Also removes the need for an explicit mask_specified param. An all-zero cpumask means use the default (usually inherited) CPU affinity mask.
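The all-zero convention can be illustrated with a short check. This sketch assumes the mask is stored as one bool per CPU; SKETCH_MAX_CPUS and cpumask_is_default are hypothetical names introduced only for this example.

```c
#include <stdbool.h>
#include <stddef.h>

// Hypothetical mask size, chosen small for the example.
#define SKETCH_MAX_CPUS 16

// True when no CPU is selected, which by the convention above means
// "keep the default (usually inherited) affinity".
static bool cpumask_is_default(const bool mask[SKETCH_MAX_CPUS]) {
    for (size_t i = 0; i < SKETCH_MAX_CPUS; i++) {
        if (mask[i]) {
            return false;
        }
    }
    return true;
}
```

Treating the zero-initialized mask as "unspecified" means a zeroed params struct already carries the default behavior, so no separate flag is needed.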
Commit c4452ed
-
Commit 31541d7
-
Commit 4064860
-
Commit f64c975
-
threadpool: enable --cpu-mask and other threadpool related options only if threadpool is enabled
Commit c506d7f
-
Commit 8008463
-
threadpool: simplify threadpool init logic and fix main thread affinity application
Most of the init code is now exactly the same between threadpool and openmp.
Commit 49ac51f
-
Commit 204377a
-
Commit 93f170d
-
Commit a7496bf
-
threadpool: avoid updating process priority on the platforms that do not require it
On Windows we need to change the overall process priority class in order to set thread priorities, but on Linux, Mac, etc. we do not need to touch the overall process settings.
Commit 8186e96
-
threadpool: update calling thread prio and affinity only at start/resume
This avoids extra syscalls for each graph_compute()
Commit 658f16c
-
Commit 8d5ab9a
-
llama-bench: add support for cool off between tests --delay
This helps with long-running tests on platforms that are thermally limited (phones, laptops, etc.). --delay (disabled by default) introduces a sleep of N seconds before starting each test.
Commit 3bcc4de
Commits on Aug 28, 2024
-
threadpool: move process priority setting into the apps (bench and cli)
This avoids changing the overall process priority on Windows for the apps that use ggml/llama.cpp directly.
Commit 5d4c0a1
-
Commit e3c2202
-
threadpool: further api cleanup and prep for future refactoring
All threadpool related functions and structs use ggml_threadpool prefix.
Commit c6328bc
-
Commit bead7d4
-
Commit 8e8f8ce
Commits on Aug 29, 2024
-
Update examples/llama-bench/llama-bench.cpp
Co-authored-by: slaren <slarengh@gmail.com>
Commit c6c27b1
-
Commit b97bd67
-
Commit cae35b9
-
Commit c49d634
-
Commit 3b5f7c2
-
Commit 52aa677