Closed
Description
@rhc54 could you take a look at this?
I am observing this issue on at least 1.8.8 and trunk.
I see something strange when using --bind-to hwthread:
the cpu sets 42,28,0,14 and 14,15,0,1 produce the same bindings.
Both report:
Cpus_allowed_list: 0
Cpus_allowed_list: 14
Cpus_allowed_list: 28
Cpus_allowed_list: 42
- Expected (for cpu set 14,15,0,1):
Cpus_allowed_list: 14
Cpus_allowed_list: 1
Cpus_allowed_list: 15
Cpus_allowed_list: 0
- Existing configuration:
$numactl --hardware | grep cpus
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 28 29 30 31 32 33 34 35 36 37 38 39 40 41
node 1 cpus: 14 15 16 17 18 19 20 21 22 23 24 25 26 27 42 43 44 45 46 47 48 49 50 51 52 53 54 55
$hwloc-info
Socket L#0 + L3 L#0 (35MB)
  L2 L#0 (256KB) + L1d L#0 (32KB) + Core L#0
    PU L#0 (P#0)
    PU L#1 (P#28)
  L2 L#1 (256KB) + L1d L#1 (32KB) + Core L#1
    PU L#2 (P#1)
    PU L#3 (P#29)
Socket L#1 + L3 L#1 (35MB)
  L2 L#14 (256KB) + L1d L#14 (32KB) + Core L#14
    PU L#28 (P#14)
    PU L#29 (P#42)
  L2 L#15 (256KB) + L1d L#15 (32KB) + Core L#15
    PU L#30 (P#15)
    PU L#31 (P#43)
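To make the physical numbering in the listing above easier to follow, here is a minimal Python sketch. It hard-codes only the PUs shown in the hwloc output; `PU_TABLE` and `describe` are made-up names for illustration and are not part of hwloc or Open MPI. It also assumes the cpu set is meant as physical (P#) indexes, which is how the expectation in this report reads.

```python
# Physical PU index (P#) -> (socket, core L#, hardware thread within the core),
# copied by hand from the hwloc listing above. Only the PUs shown there are
# included; the names here are illustrative, not an hwloc or Open MPI API.
PU_TABLE = {
    0:  (0, 0, 0),
    28: (0, 0, 1),
    1:  (0, 1, 0),
    29: (0, 1, 1),
    14: (1, 14, 0),
    42: (1, 14, 1),
    15: (1, 15, 0),
    43: (1, 15, 1),
}

def describe(cpu_set):
    """Return 'socket S[core C[hwt H]]' strings for physical PU indexes."""
    out = []
    for p in cpu_set:
        socket, core, hwt = PU_TABLE[p]
        out.append(f"socket {socket}[core {core}[hwt {hwt}]]")
    return out

# Assumption: both cpu sets are interpreted as physical (P#) indexes.
for cpus in ([42, 28, 0, 14], [14, 15, 0, 1]):
    print(cpus, "->", describe(cpus))
```

Read as physical indexes, 42,28,0,14 names both hardware threads of cores 0 and 14, while 14,15,0,1 names the first hardware thread of cores 14, 15, 0 and 1, so the two sets would not be expected to yield identical bindings.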
- I checked two ways:
- using -mca hwloc variables:
$/usr/mpi/gcc/openmpi-1.8.8/bin/oshrun --report-bindings -n 4 --oversubscribe -mca hwloc_base_cpu_set 42,28,0,14 -mca hwloc_base_binding_policy hwthread -mca hwloc_base_use_hwthreads_as_cpus 1 bash -c "cat /proc/self/status |grep Cpus_allowed_list"
[clx-orion-001:13313] MCW rank 0 bound to socket 0[core 0[hwt 0]]: [B./../../../../../../../../../../../../..][../../../../../../../../../../../../../..]
[clx-orion-001:13313] MCW rank 1 bound to socket 1[core 14[hwt 0]]: [../../../../../../../../../../../../../..][B./../../../../../../../../../../../../..]
Cpus_allowed_list: 0
[clx-orion-001:13313] MCW rank 2 bound to socket 0[core 0[hwt 1]]: [.B/../../../../../../../../../../../../..][../../../../../../../../../../../../../..]
Cpus_allowed_list: 14
[clx-orion-001:13313] MCW rank 3 bound to socket 1[core 14[hwt 1]]: [../../../../../../../../../../../../../..][.B/../../../../../../../../../../../../..]
Cpus_allowed_list: 28
Cpus_allowed_list: 42
$/usr/mpi/gcc/openmpi-1.8.8/bin/oshrun --report-bindings -n 4 --oversubscribe -mca hwloc_base_cpu_set 14,15,0,1 -mca hwloc_base_binding_policy hwthread -mca hwloc_base_use_hwthreads_as_cpus 1 bash -c "cat /proc/self/status |grep Cpus_allowed_list"
[clx-orion-001:13346] MCW rank 0 bound to socket 0[core 0[hwt 0]]: [B./../../../../../../../../../../../../..][../../../../../../../../../../../../../..]
[clx-orion-001:13346] MCW rank 1 bound to socket 1[core 14[hwt 0]]: [../../../../../../../../../../../../../..][B./../../../../../../../../../../../../..]
Cpus_allowed_list: 0
[clx-orion-001:13346] MCW rank 2 bound to socket 0[core 0[hwt 1]]: [.B/../../../../../../../../../../../../..][../../../../../../../../../../../../../..]
Cpus_allowed_list: 14
[clx-orion-001:13346] MCW rank 3 bound to socket 1[core 14[hwt 1]]: [../../../../../../../../../../../../../..][.B/../../../../../../../../../../../../..]
Cpus_allowed_list: 28
Cpus_allowed_list: 42
- using options:
$/usr/mpi/gcc/openmpi-1.8.8/bin/oshrun --report-bindings -n 4 --oversubscribe --cpu-set 14,42,0,28 --bind-to hwthread bash -c "cat /proc/self/status |grep Cpus_allowed_list"
[clx-orion-001:39050] MCW rank 0 bound to socket 0[core 0[hwt 0]]: [B./../../../../../../../../../../../../..][../../../../../../../../../../../../../..]
[clx-orion-001:39050] MCW rank 1 bound to socket 1[core 14[hwt 0]]: [../../../../../../../../../../../../../..][B./../../../../../../../../../../../../..]
Cpus_allowed_list: 0
[clx-orion-001:39050] MCW rank 2 bound to socket 0[core 0[hwt 1]]: [.B/../../../../../../../../../../../../..][../../../../../../../../../../../../../..]
Cpus_allowed_list: 14
[clx-orion-001:39050] MCW rank 3 bound to socket 1[core 14[hwt 1]]: [../../../../../../../../../../../../../..][.B/../../../../../../../../../../../../..]
Cpus_allowed_list: 28
Cpus_allowed_list: 42
$/usr/mpi/gcc/openmpi-1.8.8/bin/oshrun --report-bindings -n 4 --oversubscribe --cpu-set 14,15,0,1 --bind-to hwthread bash -c "cat /proc/self/status |grep Cpus_allowed_list"
[clx-orion-001:13495] MCW rank 0 bound to socket 0[core 0[hwt 0]]: [B./../../../../../../../../../../../../..][../../../../../../../../../../../../../..]
[clx-orion-001:13495] MCW rank 1 bound to socket 1[core 14[hwt 0]]: [../../../../../../../../../../../../../..][B./../../../../../../../../../../../../..]
Cpus_allowed_list: 0
[clx-orion-001:13495] MCW rank 2 bound to socket 0[core 0[hwt 1]]: [.B/../../../../../../../../../../../../..][../../../../../../../../../../../../../..]
Cpus_allowed_list: 14
[clx-orion-001:13495] MCW rank 3 bound to socket 1[core 14[hwt 1]]: [../../../../../../../../../../../../../..][.B/../../../../../../../../../../../../..]
Cpus_allowed_list: 28
Cpus_allowed_list: 42
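Comparing runs like the ones above is easier when each process prints its allowed CPUs in a normalized form. A minimal sketch, assuming Linux and the usual `Cpus_allowed_list` format of comma-separated indexes and ranges; `parse_cpu_list` and `allowed_cpus` are hypothetical helper names, not part of Open MPI.

```python
import os

def parse_cpu_list(text):
    """Expand a kernel CPU list such as '0-3,28' into a sorted list of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)

def allowed_cpus(status_text):
    """Extract Cpus_allowed_list from the contents of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("Cpus_allowed_list:"):
            return parse_cpu_list(line.split(":", 1)[1])
    return []

# Only attempt the read where /proc is available (Linux).
if os.path.exists("/proc/self/status"):
    with open("/proc/self/status") as f:
        print("Cpus_allowed_list:", allowed_cpus(f.read()))
```

Saved under a name such as `check_binding.py` (a made-up filename), it could stand in for the `bash -c "cat /proc/self/status |grep Cpus_allowed_list"` part of the commands above.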