
Conversation


@rageshh-fj rageshh-fj commented Oct 31, 2025

This PR disables the NUMA-specific chunking logic in ggml when running on large-core systems.

Background: The previous implementation applied special chunking for NUMA machines:

if (nchunk0 * nchunk1 < nth * 4 || ggml_is_numa()) {
    nchunk0 = nr0 > nr1 ? nth : 1;   // one chunk per thread along the larger dim...
    nchunk1 = nr0 > nr1 ? 1 : nth;   // ...and a single chunk along the other
}

The intention was to improve thread-level parallelism on NUMA machines by re-chunking the work per thread, based on findings in PR #6915.

However, empirical results on high-core-count NUMA machines indicate that this override can hurt performance as the thread count grows, plausibly because forcing exactly nth chunks removes the finer-grained chunking that lets faster threads pick up leftover work.
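
To make the failure mode concrete, here is a small standalone sketch. The matmul shape and the chunk granularity of 16 are assumptions for illustration; the real granularity in ggml depends on the tensor shapes.

#include <stdio.h>

int main(void) {
    const long nr0 = 4096, nr1 = 512;  // hypothetical matmul row counts
    const long chunk_size = 16;        // assumed granularity (shape-dependent in ggml)
    const int  nth = 172;              // thread count, matching the benchmark below

    // Uniform plan: many small chunks that threads pull as they finish.
    const long nchunk0 = (nr0 + chunk_size - 1) / chunk_size;
    const long nchunk1 = (nr1 + chunk_size - 1) / chunk_size;
    printf("uniform:       %ld x %ld = %ld chunks for %d threads\n",
           nchunk0, nchunk1, nchunk0 * nchunk1, nth);

    // NUMA override: exactly one chunk per thread along the larger dim,
    // so the slowest thread sets the pace for the whole matmul.
    const long numa0 = nr0 > nr1 ? (long) nth : 1;
    const long numa1 = nr0 > nr1 ? 1 : (long) nth;
    printf("NUMA override: %ld x %ld = %ld chunks\n",
           numa0, numa1, numa0 * numa1);
    return 0;
}

With the uniform plan each thread has many chunks to pull as it finishes, so a straggler costs little; with the override each thread owns exactly one chunk, and the slowest thread gates the whole matmul.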

Changes Proposed

  • Comment out the NUMA-specific chunking logic in ggml so that chunking stays uniform across threads (see the sketch after this list).
  • Preserve the rest of the parallelization logic.
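
A minimal sketch of the resulting condition, assuming the change lands in the snippet quoted above (the actual patch may differ):

if (nchunk0 * nchunk1 < nth * 4 /* || ggml_is_numa() */) {
    // NUMA override commented out: re-chunk per thread only when the
    // uniform plan yields too few chunks for the available threads.
    nchunk0 = nr0 > nr1 ? nth : 1;
    nchunk1 = nr0 > nr1 ? 1 : nth;
}

The fallback re-chunking is kept for genuinely small workloads; only the unconditional NUMA path is disabled.
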
Benchmark results (lower is better):

Threads | Runs | Baseline Build | NUMA Chunking Disabled | Improvement (%)
--------|------|----------------|------------------------|----------------
     24 |   50 |          41.72 |                  41.77 |           -0.12
     48 |   50 |          22.03 |                  22.38 |           -1.59
     64 |   50 |          17.61 |                  17.48 |            0.74
     96 |   50 |          15.25 |                  15.49 |            1.57
    129 |   50 |          20.38 |                  16.34 |           19.82
    160 |   50 |          26.61 |                  20.08 |           24.50
    170 |   50 |          38.47 |                  17.97 |           53.29
    172 |   50 |          64.91 |                  18.79 |           71.05
    192 |   50 |          44.46 |                  22.39 |           49.64
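
For reference, the improvement column appears to be computed as (baseline − disabled) / baseline × 100; checking the 172-thread row:

#include <stdio.h>

int main(void) {
    const double baseline = 64.91, disabled = 18.79;  // 172-thread row from the table
    printf("improvement: %.2f%%\n", (baseline - disabled) / baseline * 100.0);  // prints 71.05%
    return 0;
}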

Observations:

  • At lower thread counts (≤64), performance is largely unchanged, with minor fluctuations (within ±2%).
  • Minor improvements start appearing around 64 threads.
  • Significant, consistent improvements begin at 129 threads, with speedups of up to 71% at the highest thread counts.

This PR proposes an improvement and is intended to spark discussion toward a more holistic, hardware-agnostic solution. It also highlights that the current algorithm may not suit all hardware architectures.

cc: @shivammonaka

github-actions bot added the ggml label (changes relating to the ggml tensor library for machine learning) on Oct 31, 2025
@ggerganov (Member) commented:

> Minor improvements start appearing around 64 threads.

I don't see why this change could lead to any difference in the range of [64, 128] threads. How do you explain this observation?

