Disable NUMA-specific chunking for high-core-count HPC systems #16882

rageshh-fj · 2025-10-31T04:45:46Z

This PR disables the NUMA-specific chunking logic in ggml, targeting high-core-count systems.

Background: The previous implementation applied special chunking for NUMA machines:

if (nchunk0 * nchunk1 < nth * 4 || ggml_is_numa()) {
    nchunk0 = nr0 > nr1 ? nth : 1;
    nchunk1 = nr0 > nr1 ? 1 : nth;
}

The intention was to optimize parallelization across threads by re-chunking work for NUMA nodes, based on findings in PR #6915.

However, empirical results on high-core-count NUMA machines indicate that this optimization can hurt performance, especially when the number of threads increases.

Changes Proposed

Comment out the NUMA-specific chunking logic in ggml to allow uniform chunking across threads.
Preserve the rest of the parallelization logic.
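
For discussion, the quoted condition and the proposed variant can be compared side by side in a small standalone sketch. This is not the ggml source: `chunk_plan`, `plan_chunks`, and the `numa_rechunk` switch are hypothetical names, `is_numa` stands in for `ggml_is_numa()`, and the ceiling-division setup with a `chunk_size` parameter is an assumption about how `nchunk0` and `nchunk1` are initialized before the condition runs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container for the chunk grid; not a ggml type. */
typedef struct {
    int64_t nchunk0;
    int64_t nchunk1;
} chunk_plan;

/*
 * Sketch of the chunk-count decision.
 *   nr0, nr1     : rows along each dimension of the result
 *   nth          : number of threads
 *   chunk_size   : rows per chunk (assumed parameter, not taken from the PR)
 *   is_numa      : stands in for ggml_is_numa()
 *   numa_rechunk : true  = condition as quoted in the Background
 *                  false = condition with the NUMA clause removed
 */
static chunk_plan plan_chunks(int64_t nr0, int64_t nr1, int nth,
                              int64_t chunk_size, bool is_numa,
                              bool numa_rechunk) {
    assert(nth > 0 && chunk_size > 0);

    chunk_plan p;
    /* Assumed initial grid: ceiling division of rows by chunk size. */
    p.nchunk0 = (nr0 + chunk_size - 1) / chunk_size;
    p.nchunk1 = (nr1 + chunk_size - 1) / chunk_size;

    /* Too few chunks (or, optionally, NUMA): fall back to one chunk per
     * thread, split along the larger dimension. */
    if (p.nchunk0 * p.nchunk1 < (int64_t) nth * 4 || (numa_rechunk && is_numa)) {
        p.nchunk0 = nr0 > nr1 ? nth : 1;
        p.nchunk1 = nr0 > nr1 ? 1 : nth;
    }
    return p;
}
```

With illustrative sizes `nr0 = 4096`, `nr1 = 512`, `nth = 192`, and `chunk_size = 16`, the quoted condition on a NUMA machine yields a 192 x 1 grid (exactly one chunk per thread), whereas the variant without the NUMA clause keeps the finer 256 x 32 grid.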

Threads	Runs	Baseline Build	NUMA Chunking Disabled	Improvement (%)
24	50	41.72	41.77	-0.12
48	50	22.03	22.38	-1.59
64	50	17.61	17.48	0.74
96	50	15.25	15.49	-1.57
129	50	20.38	16.34	19.82
160	50	26.61	20.08	24.5
170	50	38.47	17.97	53.29
172	50	64.91	18.79	71.05
192	50	44.46	22.39	49.64
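
The PR does not state the unit of the measurements or how the Improvement column is derived. The reported values appear consistent with a relative reduction against the baseline, which implies that lower measurements are better. A minimal sketch under that assumption (`improvement_pct` is a hypothetical helper, not part of ggml):

```c
#include <assert.h>

/*
 * Relative reduction of `candidate` versus `baseline`, in percent.
 * Positive when the candidate measurement is lower than the baseline,
 * negative when it is higher. Assumes lower is better.
 */
static double improvement_pct(double baseline, double candidate) {
    assert(baseline > 0.0);
    return (baseline - candidate) / baseline * 100.0;
}
```

For example, the 129-thread row gives `improvement_pct(20.38, 16.34)`, about 19.82, and the 172-thread row gives `improvement_pct(64.91, 18.79)`, about 71.05.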

Observations:

At lower thread counts (≤64), performance is largely unchanged, with minor fluctuations (within about ±2%).
Minor improvements start appearing around thread 64.
Significant, consistent performance improvements start at 129 threads and reach up to 71% (at 172 threads).

This PR proposes an improvement intended to spark discussion toward a more holistic, hardware-agnostic solution. It also highlights that the current algorithm may not suit all hardware architectures.

cc: @shivammonaka

ggerganov · 2025-11-01T08:23:08Z

> Minor improvements start appearing around thread 64.

I don't see why this change could lead to any difference in the range of [64, 128] threads. How do you explain this observation?

Disable NUMA-specific chunking for high-core-count HPC systems

21b530e

rageshh-fj requested review from ggerganov and slaren as code owners October 31, 2025 04:45

github-actions bot added the ggml label (changes relating to the ggml tensor library for machine learning) Oct 31, 2025
