Add bold warnings when using preferred many memory binding scheme on a kernel that does not support it #668

Closed
antoine-morvan opened this issue Jun 13, 2024 · 9 comments

Comments

@antoine-morvan

What version of hwloc are you using?

Several versions from 2.2.0 up to 2.10.0

Which operating system and hardware are you running on?

Several OSes (Ubuntu 22, RHEL 8) on several machines (Intel, AMD, AWS Graviton, Nvidia Grace, etc.)

Details of the problem

Hello,

hwloc uses the "preferred many" memory binding scheme by default when several NUMA nodes are specified, as stated in #236.

However, this memory binding scheme is only available from Linux kernel 5.15 onward, and many systems still run older kernels. Even RHEL 9 ships with Linux kernel 5.14.

I am a hwloc evangelist and have managed to convince many people to move to this tool. However, too many of them come back to me with a performance degradation when using hwloc-bind --membind numa:0-3 $EXE. Indeed, the kernel picks the first node and silently ignores the remaining ones. I also encountered the issue some time ago (#601).

I think hwloc could detect the situation where the user requests preferred many on a kernel that does not support it, and then print a bold warning recommending to upgrade the kernel or to use --strict instead, with a link to some documentation (this ticket for instance 😇 ), or anything else that helps the user.
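As an illustration of the detection idea, here is a minimal sketch of a kernel version check in C. The function names are hypothetical and are not part of hwloc. A version comparison is only a heuristic, since vendor kernels may backport features, so trying the policy and handling the failure is likely more reliable than comparing version numbers.

```c
#include <stdio.h>
#include <sys/utsname.h>

/* Hypothetical helper: extract "major.minor" from a kernel release string
 * such as "5.14.0-362.el9.x86_64". Returns 0 on success, -1 on failure. */
int parse_kernel_release(const char *release, int *major, int *minor)
{
    if (!release || sscanf(release, "%d.%d", major, minor) != 2)
        return -1;
    return 0;
}

/* The issue states that preferred many is available from Linux 5.15. */
int version_has_preferred_many(int major, int minor)
{
    return major > 5 || (major == 5 && minor >= 15);
}

/* Returns 1 if the running kernel reports 5.15 or newer, 0 if it reports
 * an older version, -1 if the release string cannot be read or parsed. */
int running_kernel_has_preferred_many(void)
{
    struct utsname u;
    int major, minor;

    if (uname(&u) != 0 ||
        parse_kernel_release(u.release, &major, &minor) != 0)
        return -1;
    return version_has_preferred_many(major, minor);
}
```

A caller could run running_kernel_has_preferred_many() before launching a job with --membind on several nodes, and fall back to --strict when it returns 0.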

Best.

@bgoglin
Contributor

bgoglin commented Jun 13, 2024

Hello. We already have a debug message:

hwloc_debug("MPOL_PREFERRED_MANY not supported, reverting to MPOL_PREFERRED (with a single node)\n");

I guess I can make it a normal warning at least in hwloc-bind. And make it easier to understand.

However, if they are using the C API, we have many users who don't want underlying libraries to print warnings/errors.

@antoine-morvan
Author

100% agree :)

@bgoglin
Contributor

bgoglin commented Jun 14, 2024

Quick question while working on the wording of the warning: if these people are coming back to you with performance degradation, does that mean they are filling multiple entire NUMA nodes with huge allocations? If so, PREFERRED_MANY fills all the given nodes before moving to anything else, while PREFERRED uses anything else as soon as the first node is full. Correct?

@bgoglin
Contributor

bgoglin commented Jun 14, 2024

Here's a wording proposal:

[hwloc/membind] MPOL_PREFERRED_MANY not supported by the kernel.
If *all* given nodes must be used, use strict binding or the interleave policy.
Otherwise the old MPOL_PREFERRED will only use the first given node.

This non-critical error message is not shown by default in the library, but lstopo and now hwloc-bind increase verbosity to show it by default.
The message is shown only once per process, and only when the preferred policy is requested with multiple nodes.
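The conditions described above (unsupported kernel, multiple nodes, errors enabled, once per process) can be sketched as follows. This is an illustrative sketch only: the function name, its parameters and the static flag are hypothetical and do not reflect hwloc's actual implementation. Only the message text is taken from the proposal.

```c
#include <stdio.h>

/* Hypothetical once-per-process flag; not hwloc's actual code. */
static int preferred_many_warned = 0;

/* Print the fallback message at most once per process.
 * kernel_supports_it: nonzero if MPOL_PREFERRED_MANY worked.
 * nr_nodes: number of NUMA nodes given with the preferred policy.
 * show_errors: nonzero if non-critical errors should be displayed
 *              (assumed to be enabled by tools such as hwloc-bind).
 * Returns 1 if the message was printed by this call, 0 otherwise. */
int warn_preferred_many_fallback(int kernel_supports_it,
                                 unsigned nr_nodes,
                                 int show_errors)
{
    if (kernel_supports_it || nr_nodes < 2 || !show_errors ||
        preferred_many_warned)
        return 0;

    preferred_many_warned = 1;
    fprintf(stderr,
            "[hwloc/membind] MPOL_PREFERRED_MANY not supported by the kernel.\n"
            "If *all* given nodes must be used, use strict binding or the interleave policy.\n"
            "Otherwise the old MPOL_PREFERRED will only use the first given node.\n");
    return 1;
}
```

In this sketch a call with show_errors set to 0 does not consume the once-per-process flag, so the message can still appear later if verbosity is raised.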

@antoine-morvan
Author

antoine-morvan commented Jun 14, 2024

The application is not filling the NUMA node. The performance degradation comes from redirecting all allocations, and hence all transfers, onto a single NUMA node, and therefore onto a single memory channel instead of several. This has 3 effects:

  1. it limits the available bandwidth to that of one channel rather than all of the requested ones (until numa:0 is filled, if it ever is, depending on memory usage),
  2. it increases latency for compute resources that are far from numa:0, and
  3. it increases contention on that single channel.

This can easily be observed with STREAM. I ran a 20 GB STREAM benchmark on this machine (only 1 of the 2 sockets is shown), which easily fits in any single NUMA node:

image

⚠️ This is on a pre-5.15 kernel.

## hwloc-bind numa:0-3 --membind numa:0-3 --strict
Copy:          237975.1     0.060665     0.060159     0.060756
Scale:         238078.0     0.060447     0.060133     0.060513
Add:           245212.9     0.087834     0.087575     0.087992
Triad:         244882.9     0.088164     0.087693     0.088377

## hwloc-bind numa:0-3 --membind numa:0 --strict
Copy:           58934.9     0.243093     0.242918     0.247378
Scale:          58951.4     0.243054     0.242850     0.247871
Add:            60726.9     0.353891     0.353624     0.359667
Triad:          60697.1     0.354062     0.353798     0.355221

## hwloc-bind numa:0-3 --membind numa:0-3
## Preferred many is not supported; this is equivalent to hwloc-bind numa:0-3 --membind numa:0 as 1-3 are silently ignored
Copy:           58937.5     0.243038     0.242907     0.244884
Scale:          58956.4     0.242922     0.242829     0.243485
Add:            60725.3     0.353926     0.353634     0.360854
Triad:          60690.7     0.354357     0.353835     0.361484

Binding to 4 NUMA nodes gives around 240 GB/s, whereas binding to NUMA node 0 only gives around 60 GB/s (a little less than 1/4, as expected given the contention and the increased distance to the cores near nodes 1-3). Since 20 GB fits in one NUMA node, requesting preferred many on this kernel gives the same result as --membind numa:0 --strict.

STREAM is a pathological case. Depending on the application (and how heavily it uses memory), the impact can be different.
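The "little less than 1/4" figure can be checked directly from the rates reported above. The short C sketch below divides the single-node rate by the 4-node strict rate for each STREAM kernel. The function names are chosen here for illustration, and the numbers are copied from the runs in this comment.

```c
#include <stdio.h>

/* Fraction of the 4-node bandwidth that remains when only one node is used. */
double bandwidth_fraction(double single_node_rate, double all_nodes_rate)
{
    return single_node_rate / all_nodes_rate;
}

/* Print the fraction for each STREAM kernel, using the rates measured
 * with "--membind numa:0-3 --strict" and "--membind numa:0 --strict". */
void print_stream_fractions(void)
{
    static const char *kernel[] = { "Copy", "Scale", "Add", "Triad" };
    static const double all4[] = { 237975.1, 238078.0, 245212.9, 244882.9 };
    static const double one[]  = { 58934.9, 58951.4, 60726.9, 60697.1 };

    for (int i = 0; i < 4; i++)
        printf("%-6s %.3f\n", kernel[i], bandwidth_fraction(one[i], all4[i]));
}
```

Each of the four fractions comes out just under 0.25, which matches the expectation of one node out of four plus a small penalty.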

@bgoglin
Contributor

bgoglin commented Jun 14, 2024

Hmmm, so preferred_many does some sort of interleaving? I thought it would fill the first node only, then the second one, then 3rd, etc.

@antoine-morvan
Author

antoine-morvan commented Jun 14, 2024

I am on a pre-5.15 kernel, so preferred many is silently ignored in the last example.

This is a typical example of why people are coming back to me with perf degradation, and where I want the warning message to pop up :)

bgoglin added a commit that referenced this issue Jun 14, 2024
…rted

Old kernels such as 5.14 in RHEL9 don't support MPOL_PREFERRED_MANY,
we fallback to MPOL_PREFERRED which uses only the first given node,
leading to less performance.
Change the warning into a non-critical error and clarify it.

Refs #668

Signed-off-by: Brice Goglin <Brice.Goglin@inria.fr>
bgoglin added a commit that referenced this issue Jun 14, 2024
So that the MPOL_PREFERRED warning (from commit
04e76a205d6df7ed44be0e5538f7e554c835b6f7) is shown

Refs #668

Signed-off-by: Brice Goglin <Brice.Goglin@inria.fr>
bgoglin added a commit that referenced this issue Jun 14, 2024
…rted

Old kernels such as 5.14 in RHEL9 don't support MPOL_PREFERRED_MANY,
we fallback to MPOL_PREFERRED which uses only the first given node,
leading to less performance.
Change the warning into a non-critical error and clarify it.

Refs #668

Signed-off-by: Brice Goglin <Brice.Goglin@inria.fr>
(cherry picked from commit 05a1435)
bgoglin added a commit that referenced this issue Jun 14, 2024
So that the MPOL_PREFERRED warning (from commit
04e76a205d6df7ed44be0e5538f7e554c835b6f7) is shown

Refs #668

Signed-off-by: Brice Goglin <Brice.Goglin@inria.fr>
(cherry picked from commit da53ad2)
@bgoglin
Contributor

bgoglin commented Jun 14, 2024

I just pushed the changes. They will be in 2.11 and maybe in a 2.10.1 (but not sure yet if I'll release that one). The plan is to release rc1 next week.
If you want to test it, there's a tarball at https://ci.inria.fr/hwloc/job/basic/job/v2.10/

@bgoglin bgoglin closed this as completed Jun 14, 2024
@bgoglin
Contributor

bgoglin commented Jun 17, 2024

I am posting 2.11rc1 right now with this fix.
