Cannot bind to several memories #601

Closed
antoine-morvan opened this issue Jun 29, 2023 · 6 comments

Comments

@antoine-morvan

What version of hwloc are you using?

hwloc & lstopo (version 2.9.2)

  • lstopo args:
    • LSTOPOARGS="--merge --no-legend --no-io --ignore pci --ignore net --of svg"
  • observe bindings
    • hwloc-bind $BINDING_ARGS lstopo-no-graphics $LSTOPOARGS --pid 0

Which operating system and hardware are you running on?

Linux 4.18.0-372.26.1.el8_6.x86_64 #1 SMP Sat Aug 27 02:44:20 EDT 2022 x86_64 x86_64 x86_64 GNU/Linux

Intel Xeon 8358 (Ice Lake). Topology:
[Image: icelake]

Details of the problem

I am trying to bind processes so that they use a dedicated set of memory banks (in this case, NUMA nodes 1 and 3).
My wish would be to use something like this:

hwloc-bind \
    --cpubind all \
    --membind numa:1 numa:3 \
    $CMD

# or
hwloc-bind \
    --cpubind all \
    --membind numa:odd \
    $CMD

However, with such commands only the first memory location is used (e.g., numa:1 only).
For instance:

hwloc-bind \
    --cpubind all \
    --membind numa:odd \
    lstopo-no-graphics $LSTOPOARGS --pid 0 > file.svg

Produces this output:

[Image: icelake_numa_odd]

Whereas I was expecting all the odd numa memories to be used.

Just to make sure this is not a system limitation, I ran the same lstopo with numactl binding:

# note: numactl physical and logical numbering is the same in this example
numactl --membind 1,3 \
    lstopo-no-graphics $LSTOPOARGS

which gives me:

[Image: icelake_numactl]

Here we can see that numactl was able to bind memory to both nodes, as I expected.

Did I miss some argument in the hwloc-bind call?

Best.

@bgoglin
Contributor

bgoglin commented Jun 29, 2023

Hello. Fasten your seat belt, this is a bit complicated.

There are two main ways to bind memory on Linux, MPOL_BIND and MPOL_PREFERRED (there's also INTERLEAVE but it doesn't matter here). numactl uses BIND by default (if the nodes you give are full, allocation fails). hwloc uses PREFERRED by default (if the nodes you give are full, allocation falls back to other nodes). You may pass the STRICT flag (or --strict on the command-line) to switch hwloc to BIND instead of PREFERRED.

Strictly speaking, the hwloc default isn't wrong: it allocates memory inside the mask you've given. The capacity is indeed more limited than expected, but there is a fallback to other nodes if that capacity is exceeded.

The reason PREFERRED shows a single node is that the old implementation in Linux basically ignores all nodes but the first one in the mask you give. There's a new implementation called MPOL_PREFERRED_MANY in kernel 5.15 which would likely fix your report, but I guess it's not available in your redhat kernel. If you try "numactl -p 1,3" instead of "numactl --membind 1,3", this tells numactl to use PREFERRED instead of BIND, and I guess it will fail because you're giving multiple nodes and the kernel doesn't support it.
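A rough way to tell whether a kernel is recent enough for MPOL_PREFERRED_MANY is to compare its release string against 5.15. The sketch below uses a hypothetical helper name, supports_preferred_many, which is not part of hwloc or numactl. It only looks at the upstream version number, so it cannot detect a feature that a distribution has backported to an older kernel.

```shell
#!/bin/sh
# Hypothetical helper (an assumption for illustration, not a hwloc/numactl tool):
# prints "yes" if the given kernel release is at least 5.15, "no" otherwise.
supports_preferred_many() {
    release="$1"
    major="${release%%.*}"      # text before the first dot
    rest="${release#*.}"        # text after the first dot
    minor="${rest%%[!0-9]*}"    # leading digits of the remainder
    if [ "$major" -gt 5 ] || { [ "$major" -eq 5 ] && [ "$minor" -ge 15 ]; }; then
        echo yes
    else
        echo no
    fi
}

# Check the running kernel.
supports_preferred_many "$(uname -r)"
```

With the release string from the report (4.18.0-372.26.1.el8_6.x86_64), this prints "no", which matches the guess above that the new policy is not available there.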

@antoine-morvan
Author

antoine-morvan commented Jun 30, 2023

Hello, thanks for the details.

The --strict flag indeed fixes the issue described in the first post.
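For readers landing here, the working invocation can be sketched as follows. The helper name strict_membind_cmd is hypothetical, and placing --strict before the binding options is an assumption, since the thread does not show the exact command that was run. The script only prints the command line rather than executing it.

```shell
#!/bin/sh
# Hypothetical helper: build a hwloc-bind command line that strictly binds
# memory to the given NUMA node indexes (e.g. "1 3" becomes "numa:1 numa:3").
strict_membind_cmd() {
    locations=""
    for node in "$@"; do
        locations="$locations numa:$node"
    done
    echo "hwloc-bind --strict --cpubind all --membind$locations --"
}

# Print the command that binds to nodes 1 and 3, then displays the binding.
echo "$(strict_membind_cmd 1 3) lstopo-no-graphics --pid 0"
```

Paste the printed line into a shell to run it on a machine where hwloc is installed.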

Those binding modes (bind, preferred, interleave) are detailed in some documentation I have read recently (https://www.intel.com/content/www/us/en/content-details/769060/intel-xeon-cpu-max-series-configuration-and-tuning-guide.html?DocID=769060 page 27, section 6.2.1).

This document mentions 4 modes for binding to memories using numactl:

  • numactl --membind (one or many)
  • numactl --preferred (one)
  • numactl --preferred-many (one or many)
  • numactl --interleave (one or many)
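The four modes listed above can be summarized as a small lookup from policy name to numactl option. This is only an illustrative sketch: numactl_cmd is a hypothetical helper, the policy names are my own shorthand, and the script prints the command line instead of running numactl.

```shell
#!/bin/sh
# Hypothetical helper: map a policy name to the numactl option listed above
# and print the resulting command prefix for a comma-separated node list.
numactl_cmd() {
    policy="$1"
    nodes="$2"
    case "$policy" in
        bind)           opt="--membind" ;;
        preferred)      opt="--preferred" ;;
        preferred-many) opt="--preferred-many" ;;
        interleave)     opt="--interleave" ;;
        *) echo "unknown policy: $policy" >&2; return 1 ;;
    esac
    echo "numactl $opt $nodes"
}

numactl_cmd bind 1,3            # prints: numactl --membind 1,3
numactl_cmd preferred-many 1,3  # prints: numactl --preferred-many 1,3
```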

I was expecting hwloc-bind to expose such control via --mempolicy. Why is this --strict flag needed instead of exposing --mempolicy=preferred?

@bgoglin
Contributor

bgoglin commented Jun 30, 2023

Because hwloc is not Linux-specific :/ We try to keep the API portable (and simple). Other operating systems expose different policies, and finding some sort of common denominator was very difficult.
That said, I could try to better document things and/or add a Linux-specific option such as hwloc-bind --linux-mempolicy=preferred/bind/interleave.

@antoine-morvan
Author

antoine-morvan commented Jun 30, 2023

I see, thanks again for your time. I definitely agree the doc would greatly benefit from such additions 👍

My last 2 cents: if you are going to expose a Linux-specific flag (e.g., --linux-mempolicy, which breaks the 'common denominator'), why not expose Linux-specific options in the existing flag (--mempolicy)? :)

@bgoglin
Contributor

bgoglin commented Jun 30, 2023

--mempolicy currently uses the hwloc terminology (bind/interleave/firsttouch/nexttouch). I'd need a way to understand whether people are asking for hwloc's "bind" policy or Linux's "bind" policy. It could be --mempolicy linux-bind or something like this.

By the way, if there are some places in the doc that you already found unclear, please let me know. Usually these kinds of clarifications go in the hwloc-bind manpage and in the introduction of the "Memory binding" section in hwloc.h. I may add something in the doxygen text too ("CPU and Memory Binding Overview").

bgoglin added a commit that referenced this issue Jul 3, 2023
Refs #601.

Signed-off-by: Brice Goglin <Brice.Goglin@inria.fr>
bgoglin added a commit that referenced this issue Jul 3, 2023
- subdivide in sections
- add an introduction
- talk about portability and policies
- more cross-references

Refs #601

Signed-off-by: Brice Goglin <Brice.Goglin@inria.fr>
bgoglin added a commit that referenced this issue Jul 3, 2023
Refs #601.

Signed-off-by: Brice Goglin <Brice.Goglin@inria.fr>
(cherry picked from commit 639d09e)
bgoglin added a commit that referenced this issue Jul 3, 2023
- subdivide in sections
- add an introduction
- talk about portability and policies
- more cross-references

Refs #601

Signed-off-by: Brice Goglin <Brice.Goglin@inria.fr>
(cherry picked from commit dfca160)
bgoglin added a commit that referenced this issue Jul 4, 2023
Refs #601.

Signed-off-by: Brice Goglin <Brice.Goglin@inria.fr>
(cherry picked from commit 639d09e)
bgoglin added a commit that referenced this issue Jul 4, 2023
- subdivide in sections
- add an introduction
- talk about portability and policies
- more cross-references

Refs #601

Signed-off-by: Brice Goglin <Brice.Goglin@inria.fr>
(cherry picked from commit dfca160)
@bgoglin
Contributor

bgoglin commented Jul 4, 2023

I pushed several updates to the doc in master, v2.x and v2.9. Hopefully that will help avoid the confusion between policies.
