Threadpool: take 2 #8672

Merged
merged 48 commits into from
Aug 29, 2024

Commits on Aug 27, 2024

  1. Introduce ggml_compute_threadpool

    - OpenMP functional: Check
    - Vanilla ggml functional: Check
    - ggml w/threadpool functional: Check
    - OpenMP no regression: No glaring problems
    - Vanilla ggml no regression: No glaring problems
    - ggml w/threadpool no regression: No glaring problems
    fmz authored and fmz committed Aug 27, 2024
    130adf8
  2. Minor fixes

    fmz authored and fmz committed Aug 27, 2024
    a0aae52
  3. fixed use after release bug

    fmz authored and fmz committed Aug 27, 2024
    d5c9c14
  4. fixed a harmless race condition

    fmz authored and fmz committed Aug 27, 2024
    82224f8
  5. Fix Android build issue

    fmz authored and fmz committed Aug 27, 2024
    817eaf0
  6. fix more race conditions

    fmz authored and fmz committed Aug 27, 2024
    5763732
  7. fix deadlock for cases where cgraph.n_nodes == 1

    and fix --poll case
    fmz authored and fmz committed Aug 27, 2024
    3008b31
  8. threadpool: use cpu_get_num_math to set the default number of threadpool threads

    This way we avoid using E-Cores and Hyperthreaded siblings.
    max-krasnyansky authored and fmz committed Aug 27, 2024
    96d6603
  9. bench: create fresh threadpool for each test

    For benchmarking it's better to start a fresh pool for each test with the exact number of threads
    needed for that test. Having larger pools is suboptimal (causes more load, etc).
    max-krasnyansky authored and fmz committed Aug 27, 2024
    2953441
  10. atomics: always use stdatomics with clang and use relaxed memory order when polling in ggml_barrier

    This also removes sched_yield() calls from ggml_barrier() to match OpenMP behavior.
    max-krasnyansky authored and fmz committed Aug 27, 2024
    6fcc780
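
The relaxed-order polling described in this commit can be illustrated with a small spin barrier. This is only a sketch under assumptions: `spin_barrier`, its fields, and `spin_barrier_wait` are hypothetical names, not the actual `ggml_barrier` code. Waiters spin on a relaxed load without yielding, and a full fence after the loop makes writes from before the barrier visible.

```c
#include <assert.h>
#include <stdatomic.h>

// Hypothetical barrier state; names are illustrative, not from ggml.
struct spin_barrier {
    atomic_int n_arrived;  // threads that have reached the current barrier
    atomic_int n_passed;   // number of barriers completed so far
    int        n_threads;  // number of participating threads
};

static void spin_barrier_wait(struct spin_barrier * b) {
    assert(b->n_threads > 0);
    if (b->n_threads == 1) {
        return; // nothing to synchronize with
    }

    // Snapshot the completed-barrier count before announcing arrival.
    int passed_old = atomic_load_explicit(&b->n_passed, memory_order_relaxed);

    if (atomic_fetch_add_explicit(&b->n_arrived, 1, memory_order_seq_cst) == b->n_threads - 1) {
        // Last thread to arrive: reset the arrival count and release the waiters.
        atomic_store_explicit(&b->n_arrived, 0, memory_order_relaxed);
        atomic_fetch_add_explicit(&b->n_passed, 1, memory_order_seq_cst);
    } else {
        // Busy-wait with relaxed loads; no sched_yield() in the loop.
        while (atomic_load_explicit(&b->n_passed, memory_order_relaxed) == passed_old) {
            // spin
        }
        // One full fence on exit instead of an ordered load on every iteration.
        atomic_thread_fence(memory_order_seq_cst);
    }
}
```

The design choice here is to pay for ordering once, on exit, rather than on every poll iteration.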
  11. threadpool: make polling the default to match openmp behavior

    All command line args now allow for setting poll to 0 (false).
    max-krasnyansky authored and fmz committed Aug 27, 2024
    3b62f7c
  12. dfa6377
  13. 2e18f0d
  14. 48aa8ee
  15. threadpool: reduce pause/resume/wakeup overhead in common cases

    We now start the threadpool in a paused state only if we have two threadpools.
    The resume is now implicit (i.e. triggered by new work), which reduces locking and context-switch overhead.
    max-krasnyansky authored and fmz committed Aug 27, 2024
    494e27c
  16. threadpool: add support for hybrid polling

    Poll params (--poll, ...) now specify the "polling level", i.e. how aggressively we poll before waiting on the cond. var.
    poll=0 means no polling, 1 means poll for 128K rounds then wait, 2 means 256K rounds, ...
    
    The default value of 50 (i.e. 50x128K rounds) seems like a decent default across modern platforms.
    We can tune this further as things evolve.
    max-krasnyansky authored and fmz committed Aug 27, 2024
    b630acd
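
As a rough illustration of the hybrid scheme in this commit (spin for `level x 128K` rounds, then fall back to a condition variable), here is a minimal sketch. All names (`work_signal`, `poll_rounds`, `poll_for_work`, `wait_for_work`, `post_work`) are hypothetical and the real threadpool code may be structured differently.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

// Assumption: one polling level equals 128K spin rounds, as the commit message states.
#define POLL_ROUNDS_PER_LEVEL (128u * 1024u)

// Hypothetical signalling state shared by the main thread and the workers.
struct work_signal {
    atomic_int      n_graph; // bumped once per submitted graph
    pthread_mutex_t mutex;
    pthread_cond_t  cond;
};

// Number of spin rounds for a polling level (0 means no polling at all).
static uint64_t poll_rounds(uint32_t poll_level) {
    return (uint64_t) poll_level * POLL_ROUNDS_PER_LEVEL;
}

// Spin phase: returns true if new work showed up before the rounds ran out.
static bool poll_for_work(struct work_signal * s, int last_seen, uint32_t poll_level) {
    const uint64_t n_rounds = poll_rounds(poll_level);
    for (uint64_t i = 0; i < n_rounds; i++) {
        if (atomic_load_explicit(&s->n_graph, memory_order_relaxed) != last_seen) {
            return true;
        }
    }
    return false;
}

// Hybrid wait: spin first, then block on the condition variable.
// Returns the counter value the caller should remember as "last seen".
static int wait_for_work(struct work_signal * s, int last_seen, uint32_t poll_level) {
    assert(s != NULL);
    if (!poll_for_work(s, last_seen, poll_level)) {
        pthread_mutex_lock(&s->mutex);
        while (atomic_load_explicit(&s->n_graph, memory_order_relaxed) == last_seen) {
            pthread_cond_wait(&s->cond, &s->mutex);
        }
        pthread_mutex_unlock(&s->mutex);
    }
    // Acquire load pairs with the release increment in post_work().
    return atomic_load_explicit(&s->n_graph, memory_order_acquire);
}

// Producer side: publish new work and wake any sleeping workers.
static void post_work(struct work_signal * s) {
    pthread_mutex_lock(&s->mutex);
    atomic_fetch_add_explicit(&s->n_graph, 1, memory_order_release);
    pthread_cond_broadcast(&s->cond);
    pthread_mutex_unlock(&s->mutex);
}
```

With the default level of 50 mentioned above, `poll_rounds(50)` is 50 x 131072 = 6,553,600 rounds before a worker goes to sleep.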
  17. threadpool: reduce the number of barriers required

    New work is now indicated with an atomic counter that is incremented for
    each new graph that needs to be computed.
    This removes the need for an extra barrier to clear the "new_work" flag and
    removes the special case for trivial graphs.
    max-krasnyansky authored and fmz committed Aug 27, 2024
    9d3e78c
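
The counter-based signalling described in this commit might look roughly like the sketch below. The names (`pool_state`, `worker_state`, `pool_kickoff`, `worker_has_new_work`) are hypothetical. The point is that each worker compares the shared counter with the last value it acted on, so nothing has to be cleared after a graph and no extra barrier is needed for that.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

// Hypothetical shared state: one counter, incremented once per submitted graph.
struct pool_state {
    atomic_int n_graph;
};

// Hypothetical per-worker state: the last counter value this worker acted on.
struct worker_state {
    int last_graph;
};

// Main thread: announce a new graph. There is no flag to reset afterwards.
static void pool_kickoff(struct pool_state * p) {
    assert(p != NULL);
    atomic_fetch_add_explicit(&p->n_graph, 1, memory_order_seq_cst);
}

// Worker: returns true once per observed change of the counter.
static bool worker_has_new_work(struct pool_state * p, struct worker_state * w) {
    assert(p != NULL && w != NULL);
    int cur = atomic_load_explicit(&p->n_graph, memory_order_acquire);
    if (cur == w->last_graph) {
        return false; // same graph as before, nothing new
    }
    w->last_graph = cur; // remember it locally instead of clearing shared state
    return true;
}
```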
  18. threadpool: remove special-casing for disposable threadpools

    With the efficient hybrid polling there is no need to make disposable pools any different.
    This simplifies the overall logic and reduces branching.
    
    Include n_threads in debug print for disposable threadpool.
    
    Declare pause and stop flags as atomic_bool.
    This doesn't actually generate any memory barriers and simply informs
    the thread sanitizer that these flags can be written & read by different
    threads without locking.
    max-krasnyansky authored and fmz committed Aug 27, 2024
    538bd9f
  19. threadpool: do not clear barrier counters between graph computes (fixes race with small graphs)

    This fixes the race condition with very small graphs where the main thread happens to
    start a new graph while the workers are just about to exit from barriers.
    max-krasnyansky authored and fmz committed Aug 27, 2024
    db45b6d
  20. threadpool: use relaxed order for chunk sync

    A full memory barrier is overkill here, since each thread works on a different chunk.
    max-krasnyansky authored and fmz committed Aug 27, 2024
    307fece
  21. 63a0dad
  22. 2358bb3
  23. 4a4d715
  24. threadpool: add support for ggml_threadpool_params_default/init

    This also removes the need for an explicit mask_specified param:
    an all-zero cpumask means use the default (usually inherited) CPU affinity mask.
    max-krasnyansky authored and fmz committed Aug 27, 2024
    c4452ed
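
The "all-zero mask means default" convention from this commit can be sketched as follows. The struct, the mask size, and both function names are assumptions made for illustration and do not reproduce the real `ggml_threadpool_params` layout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

// Hypothetical mask size, chosen only for this sketch.
#define SKETCH_MAX_CPUS 16

// Hypothetical parameter struct: one flag per CPU, all false means "not specified".
struct sketch_params {
    bool cpumask[SKETCH_MAX_CPUS];
    int  n_threads;
};

// Defaults leave the mask zeroed, so no separate "mask_specified" field is needed.
static struct sketch_params sketch_params_default(int n_threads) {
    assert(n_threads > 0);
    struct sketch_params p = { .cpumask = { false }, .n_threads = n_threads };
    return p;
}

// True only when the caller set at least one CPU, i.e. asked for explicit affinity.
static bool sketch_mask_specified(const struct sketch_params * p) {
    for (size_t i = 0; i < SKETCH_MAX_CPUS; i++) {
        if (p->cpumask[i]) {
            return true;
        }
    }
    return false;
}
```

A caller would apply affinity only when `sketch_mask_specified()` returns true and otherwise leave the inherited mask untouched.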
  25. threadpool: move typedef into ggml.h

    max-krasnyansky authored and fmz committed Aug 27, 2024
    31541d7
  26. threadpool: fix apply_priority() function name

    max-krasnyansky authored and fmz committed Aug 27, 2024
    4064860
  27. f64c975
  28. threadpool: enable --cpu-mask and other threadpool related options only if threadpool is enabled

    max-krasnyansky authored and fmz committed Aug 27, 2024
    c506d7f
  29. threadpool: replace checks for compute_thread ret code with proper status check

    max-krasnyansky authored and fmz committed Aug 27, 2024
    8008463
  30. threadpool: simplify threadpool init logic and fix main thread affinity application

    Most of the init code is now exactly the same between threadpool and openmp.
    max-krasnyansky authored and fmz committed Aug 27, 2024
    49ac51f
  31. 204377a
  32. threadpool: enable openmp by default for now

    max-krasnyansky authored and fmz committed Aug 27, 2024
    93f170d
  33. a7496bf
  34. threadpool: avoid updating process priority on the platforms that do not require it

    On Windows we need to change overall process priority class in order to set thread priorities,
    but on Linux, Mac, etc we do not need to touch the overall process settings.
    max-krasnyansky authored and fmz committed Aug 27, 2024
    8186e96
  35. threadpool: update calling thread prio and affinity only at start/resume

    This avoids extra syscalls for each graph_compute()
    max-krasnyansky authored and fmz committed Aug 27, 2024
    658f16c
  36. 8d5ab9a
  37. llama-bench: add support for cool off between tests --delay

    This helps with long-running tests on thermally limited platforms (phones, laptops, etc.).
    --delay (disabled by default) sleeps for N seconds before starting each test.
    max-krasnyansky committed Aug 27, 2024
    3bcc4de

Commits on Aug 28, 2024

  1. threadpool: move process priority setting into the apps (bench and cli)

    This avoids changing the overall process priority on Windows for the apps
    that use ggml/llama.cpp directly.
    max-krasnyansky committed Aug 28, 2024
    5d4c0a1
  2. e3c2202
  3. threadpool: further api cleanup and prep for future refactoring

    All threadpool related functions and structs use ggml_threadpool prefix.
    max-krasnyansky committed Aug 28, 2024
    c6328bc
  4. bead7d4
  5. 8e8f8ce

Commits on Aug 29, 2024

  1. Update examples/llama-bench/llama-bench.cpp

    Co-authored-by: slaren <slarengh@gmail.com>
    max-krasnyansky and slaren committed Aug 29, 2024
    c6c27b1
  2. b97bd67
  3. cae35b9
  4. c49d634
  5. 3b5f7c2
  6. 52aa677