Missing Numa Node in topology #455

Closed
fmdewal opened this issue Mar 2, 2021 · 1 comment

fmdewal commented Mar 2, 2021

What version of hwloc are you using?

$ hwloc-info --version
hwloc-info 1.11.8

  • Run which lstopo to ensure that you are running the desired hwloc installation
    $ which lstopo-no-graphics
    /usr/bin/lstopo-no-graphics

  • Run lstopo --version to find its hwloc version
    $ lstopo-no-graphics --version
    lstopo-no-graphics 1.11.8

    • If hwloc comes from a RPM package (RHEL, CentOS, Fedora, etc.), run rpm -qa '*hwloc*'
    • If hwloc comes from a DEB package (Debian, Ubuntu, Mint, etc.), run dpkg -l '*hwloc*'
    • If hwloc was built from a Git clone, report the Git commit hash from git show
  • If the hwloc error occurred inside a non-hwloc process (e.g., MPI, SLURM, etc.), report the version of that software. Also try tools like ldd on the program to find out which hwloc installation it is using.
    $ rpm -qa '*hwloc*'
    hwloc-libs-1.11.8-4.el7.i686
    hwloc-1.11.8-4.el7.x86_64
    hwloc-libs-1.11.8-4.el7.x86_64
    hwloc-devel-1.11.8-4.el7.i686
    hwloc-devel-1.11.8-4.el7.x86_64

Which operating system and hardware are you running on?

  • On Unix-like systems, run uname -a so that we know which operating system, distribution, and kernel version you are using.
    $ uname -a
    Linux xxx 3.10.0-1160.11.1.el7.x86_64 #1 SMP Fri Dec 18 16:34:56 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

  • Post the output of lstopo - if it works
    $ lstopo-no-graphics
    Machine (31GB)
      Package L#0 + L3 L#0 (30MB)
        L2 L#0 (256KB) + L1d L#0 (32KB) + L1i L#0 (32KB) + Core L#0 + PU L#0 (P#0)
        L2 L#1 (256KB) + L1d L#1 (32KB) + L1i L#1 (32KB) + Core L#1 + PU L#1 (P#1)
        L2 L#2 (256KB) + L1d L#2 (32KB) + L1i L#2 (32KB) + Core L#2 + PU L#2 (P#2)
        L2 L#3 (256KB) + L1d L#3 (32KB) + L1i L#3 (32KB) + Core L#3 + PU L#3 (P#3)
        L2 L#4 (256KB) + L1d L#4 (32KB) + L1i L#4 (32KB) + Core L#4 + PU L#4 (P#4)
        L2 L#5 (256KB) + L1d L#5 (32KB) + L1i L#5 (32KB) + Core L#5 + PU L#5 (P#5)
        L2 L#6 (256KB) + L1d L#6 (32KB) + L1i L#6 (32KB) + Core L#6 + PU L#6 (P#6)
        L2 L#7 (256KB) + L1d L#7 (32KB) + L1i L#7 (32KB) + Core L#7 + PU L#7 (P#7)
      HostBridge L#0
        PCI 8086:7010
          Block(Removable Media Device) L#0 "sr0"
        PCI 1013:00b8
          GPU L#1 "card0"
          GPU L#2 "controlD64"

Details of the problem

When using hwloc to discover the NUMA topology, we observe that the NUMA node is missing from the topology object once it has been loaded.
The following topology flags are set before the topology is loaded:
topology flags: 0x403d80

Additional information

  • dmesg reports information for a single NUMA node (Node 0):
    $ dmesg | grep -i numa
    [ 0.000000] NUMA: Node 0 [mem 0x00000000-0x0009ffff] + [mem 0x00100000-0x7fffffff] -> [mem 0x00000000-0x7fffffff]
    [ 0.000000] NUMA: Node 0 [mem 0x00000000-0x7fffffff] + [mem 0x80000000-0xffefffff] -> [mem 0x00000000-0xffefffff]
    [ 0.000000] NUMA: Node 0 [mem 0x00000000-0xffefffff] + [mem 0x100000000-0x81f7fffff] -> [mem 0x00000000-0x81f7fffff]

  • Even when displaying the whole system topology, no NUMA node is shown. We make the same observation when using hwloc as a C library.
    $ hwloc-ls --whole-system
    Machine (31GB)
      Package L#0 + L3 L#0 (30MB)
        L2 L#0 (256KB) + L1d L#0 (32KB) + L1i L#0 (32KB) + Core L#0 + PU L#0 (P#0)
        L2 L#1 (256KB) + L1d L#1 (32KB) + L1i L#1 (32KB) + Core L#1 + PU L#1 (P#1)
        L2 L#2 (256KB) + L1d L#2 (32KB) + L1i L#2 (32KB) + Core L#2 + PU L#2 (P#2)
        L2 L#3 (256KB) + L1d L#3 (32KB) + L1i L#3 (32KB) + Core L#3 + PU L#3 (P#3)
        L2 L#4 (256KB) + L1d L#4 (32KB) + L1i L#4 (32KB) + Core L#4 + PU L#4 (P#4)
        L2 L#5 (256KB) + L1d L#5 (32KB) + L1i L#5 (32KB) + Core L#5 + PU L#5 (P#5)
        L2 L#6 (256KB) + L1d L#6 (32KB) + L1i L#6 (32KB) + Core L#6 + PU L#6 (P#6)
        L2 L#7 (256KB) + L1d L#7 (32KB) + L1i L#7 (32KB) + Core L#7 + PU L#7 (P#7)
      HostBridge L#0
        PCI 8086:7010
          Block(Removable Media Device) L#0 "sr0"
        PCI 1013:00b8
          GPU L#1 "card0"
          GPU L#2 "controlD64"

Query

  • Is there a specific flag we can set to retrieve the NUMA node information?
  • If we create a logical/dummy NUMA node layer in our topology, is there a preferred way of doing so? More specifically, could such a logical layer be created between the Machine and Package layers?
bgoglin (Contributor) commented Mar 2, 2021

Just switch to hwloc 2. There's always a NUMA node even on non-NUMA machines like yours.

hwloc 1.11.8 is very old, and there won't be any new 1.11.x release anymore. And all Linux distributions will use hwloc 2.x in the next major release if not done already.

Note that hwloc 2 has a slightly different API, especially regarding NUMA. You may want to read https://www.open-mpi.org/projects/hwloc/doc/v2.4.1/a00365.php

If you need help, feel free to ask here and/or on the hwloc-users mailing list.

bgoglin closed this as completed Mar 2, 2021