Wrong binding to hwthreads #1247

@igor-ivanov

@rhc54, could you look at this?
I am observing this issue on at least 1.8.8 and trunk.

I see something strange when using --bind-to hwthread:
cpu_set 42,28,0,14 and cpu_set 14,15,0,1 produce exactly the same bindings.
Both report:
Cpus_allowed_list: 0
Cpus_allowed_list: 14
Cpus_allowed_list: 28
Cpus_allowed_list: 42
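To catch this kind of mismatch without reading through interleaved output, one option is to compare the sorted Cpus_allowed_list values against the sorted cpu_set. Below is a minimal bash sketch. It assumes each rank is bound to exactly one PU, so each line holds a single number rather than a range such as 0-3. The function name `check_binding` is hypothetical, and the sample input is copied from the outputs reported in this issue.

```bash
#!/usr/bin/env bash
# Hypothetical post-run check: does the union of the per-rank
# Cpus_allowed_list values equal the requested cpu set?
# ASSUMPTION: each rank is bound to a single PU, so every
# "Cpus_allowed_list:" line carries one number (no ranges like 0-3).
check_binding() {
    local requested=$1   # e.g. "14,15,0,1"
    local reported=$2    # captured "Cpus_allowed_list: N" lines
    local want got
    # Normalize both sides to a sorted, comma-separated list
    want=$(tr ',' '\n' <<< "$requested" | sort -n | paste -sd, -)
    got=$(awk '/Cpus_allowed_list/ {print $2}' <<< "$reported" | sort -n | paste -sd, -)
    if [ "$want" = "$got" ]; then
        echo "OK: bound to $got"
    else
        echo "MISMATCH: requested $want, got $got"
        return 1
    fi
}

# Identical output observed for both cpu sets in this report
observed='Cpus_allowed_list:      0
Cpus_allowed_list:      14
Cpus_allowed_list:      28
Cpus_allowed_list:      42'

check_binding "42,28,0,14" "$observed"
check_binding "14,15,0,1" "$observed" || echo "check failed for cpu_set 14,15,0,1"
```

With the output above, the first set passes and the second is reported as `MISMATCH: requested 0,1,14,15, got 0,14,28,42`.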

  • Expected (for cpu_set 14,15,0,1):
    Cpus_allowed_list: 14
    Cpus_allowed_list: 1
    Cpus_allowed_list: 15
    Cpus_allowed_list: 0
  • Existing configuration:
$numactl --hardware | grep cpus
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 28 29 30 31 32 33 34 35 36 37 38 39 40 41
node 1 cpus: 14 15 16 17 18 19 20 21 22 23 24 25 26 27 42 43 44 45 46 47 48 49 50 51 52 53 54 55

$hwloc-info
    Socket L#0 + L3 L#0 (35MB)
      L2 L#0 (256KB) + L1d L#0 (32KB) + Core L#0
        PU L#0 (P#0)
        PU L#1 (P#28)
      L2 L#1 (256KB) + L1d L#1 (32KB) + Core L#1
        PU L#2 (P#1)
        PU L#3 (P#29)
    Socket L#1 + L3 L#1 (35MB)
      L2 L#14 (256KB) + L1d L#14 (32KB) + Core L#14
        PU L#28 (P#14)
        PU L#29 (P#42)
      L2 L#15 (256KB) + L1d L#15 (32KB) + Core L#15
        PU L#30 (P#15)
        PU L#31 (P#43)
  • I checked two ways:
  1. Using MCA hwloc parameters:
$/usr/mpi/gcc/openmpi-1.8.8/bin/oshrun --report-bindings -n 4 --oversubscribe -mca hwloc_base_cpu_set 42,28,0,14 -mca hwloc_base_binding_policy hwthread -mca hwloc_base_use_hwthreads_as_cpus 1 bash -c "cat /proc/self/status |grep Cpus_allowed_list"
[clx-orion-001:13313] MCW rank 0 bound to socket 0[core 0[hwt 0]]: [B./../../../../../../../../../../../../..][../../../../../../../../../../../../../..]
[clx-orion-001:13313] MCW rank 1 bound to socket 1[core 14[hwt 0]]: [../../../../../../../../../../../../../..][B./../../../../../../../../../../../../..]
Cpus_allowed_list:      0
[clx-orion-001:13313] MCW rank 2 bound to socket 0[core 0[hwt 1]]: [.B/../../../../../../../../../../../../..][../../../../../../../../../../../../../..]
Cpus_allowed_list:      14
[clx-orion-001:13313] MCW rank 3 bound to socket 1[core 14[hwt 1]]: [../../../../../../../../../../../../../..][.B/../../../../../../../../../../../../..]
Cpus_allowed_list:      28
Cpus_allowed_list:      42

$/usr/mpi/gcc/openmpi-1.8.8/bin/oshrun --report-bindings -n 4 --oversubscribe -mca hwloc_base_cpu_set 14,15,0,1 -mca hwloc_base_binding_policy hwthread -mca hwloc_base_use_hwthreads_as_cpus 1 bash -c "cat /proc/self/status |grep Cpus_allowed_list"
[clx-orion-001:13346] MCW rank 0 bound to socket 0[core 0[hwt 0]]: [B./../../../../../../../../../../../../..][../../../../../../../../../../../../../..]
[clx-orion-001:13346] MCW rank 1 bound to socket 1[core 14[hwt 0]]: [../../../../../../../../../../../../../..][B./../../../../../../../../../../../../..]
Cpus_allowed_list:      0
[clx-orion-001:13346] MCW rank 2 bound to socket 0[core 0[hwt 1]]: [.B/../../../../../../../../../../../../..][../../../../../../../../../../../../../..]
Cpus_allowed_list:      14
[clx-orion-001:13346] MCW rank 3 bound to socket 1[core 14[hwt 1]]: [../../../../../../../../../../../../../..][.B/../../../../../../../../../../../../..]
Cpus_allowed_list:      28
Cpus_allowed_list:      42
  2. Using command-line options:
$/usr/mpi/gcc/openmpi-1.8.8/bin/oshrun --report-bindings -n 4 --oversubscribe --cpu-set 14,42,0,28 --bind-to hwthread bash -c "cat /proc/self/status |grep Cpus_allowed_list"
[clx-orion-001:39050] MCW rank 0 bound to socket 0[core 0[hwt 0]]: [B./../../../../../../../../../../../../..][../../../../../../../../../../../../../..]
[clx-orion-001:39050] MCW rank 1 bound to socket 1[core 14[hwt 0]]: [../../../../../../../../../../../../../..][B./../../../../../../../../../../../../..]
Cpus_allowed_list:      0
[clx-orion-001:39050] MCW rank 2 bound to socket 0[core 0[hwt 1]]: [.B/../../../../../../../../../../../../..][../../../../../../../../../../../../../..]
Cpus_allowed_list:      14
[clx-orion-001:39050] MCW rank 3 bound to socket 1[core 14[hwt 1]]: [../../../../../../../../../../../../../..][.B/../../../../../../../../../../../../..]
Cpus_allowed_list:      28
Cpus_allowed_list:      42

$/usr/mpi/gcc/openmpi-1.8.8/bin/oshrun --report-bindings -n 4 --oversubscribe --cpu-set 14,15,0,1 --bind-to hwthread bash -c "cat /proc/self/status |grep Cpus_allowed_list"
[clx-orion-001:13495] MCW rank 0 bound to socket 0[core 0[hwt 0]]: [B./../../../../../../../../../../../../..][../../../../../../../../../../../../../..]
[clx-orion-001:13495] MCW rank 1 bound to socket 1[core 14[hwt 0]]: [../../../../../../../../../../../../../..][B./../../../../../../../../../../../../..]
Cpus_allowed_list:      0
[clx-orion-001:13495] MCW rank 2 bound to socket 0[core 0[hwt 1]]: [.B/../../../../../../../../../../../../..][../../../../../../../../../../../../../..]
Cpus_allowed_list:      14
[clx-orion-001:13495] MCW rank 3 bound to socket 1[core 14[hwt 1]]: [../../../../../../../../../../../../../..][.B/../../../../../../../../../../../../..]
Cpus_allowed_list:      28
Cpus_allowed_list:      42
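Reading the numactl and hwloc output together, each core on this machine carries two hardware threads whose OS (physical, P#) indices differ by 28 (P#0/P#28, P#1/P#29, P#14/P#42, P#15/P#43). Below is a minimal bash sketch that decodes a physical PU index into socket, core and hwthread. It assumes the 2-socket, 14-cores-per-socket, 2-threads-per-core pattern holds for the cores elided from the hwloc listing. The helper name `pu_location` is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: decode an OS (physical, P#) PU index into socket,
# core and hardware thread. ASSUMES the layout reported on this machine:
# 2 sockets x 14 cores x 2 hwthreads, where P#n and P#(n+28) are the two
# hardware threads of the same core.
CORES_PER_SOCKET=14
SIBLING_STRIDE=28   # offset between the two hwthreads of one core

pu_location() {
    local p=$1
    local hwt=$(( p / SIBLING_STRIDE ))          # 0 for P#0-27, 1 for P#28-55
    local core=$(( p % SIBLING_STRIDE ))         # global core index, as in --report-bindings
    local socket=$(( core / CORES_PER_SOCKET ))  # cores 0-13 on socket 0, 14-27 on socket 1
    echo "P#$p -> socket $socket, core $core, hwt $hwt"
}

# The two cpu sets from this report
for p in 42 28 0 14; do pu_location "$p"; done
echo
for p in 14 15 0 1; do pu_location "$p"; done
```

Under that assumption, 42,28,0,14 covers both hardware threads of cores 0 and 14, while 14,15,0,1 covers only hwt 0 of cores 0, 1, 14 and 15. The two runs would therefore be expected to report different bindings.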
