hwloc 1.11.7 issue on win2012 after enabling processor group #260

Closed
chenlcl opened this issue Aug 24, 2017 · 18 comments

@chenlcl

chenlcl commented Aug 24, 2017

Here are the details of this issue.

Test environment

Testing machine - Windows 2012 Standard version
hwloc - 1.11.7 win64

Steps

  1. Before modifying the processor group settings, here is the hwloc-ls output

C:\hwloc-win64-build-1.11.7\bin>hwloc-ls
Machine (19GB total)
NUMANode L#0 (P#0 9848MB) + Package L#0 + L3 L#0 (12MB)
L2 L#0 (256KB) + L1d L#0 (32KB) + L1i L#0 (32KB) + Core L#0 + PU L#0 (P#0)
L2 L#1 (256KB) + L1d L#1 (32KB) + L1i L#1 (32KB) + Core L#1 + PU L#1 (P#1)
L2 L#2 (256KB) + L1d L#2 (32KB) + L1i L#2 (32KB) + Core L#2 + PU L#2 (P#2)
L2 L#3 (256KB) + L1d L#3 (32KB) + L1i L#3 (32KB) + Core L#3 + PU L#3 (P#3)
L2 L#4 (256KB) + L1d L#4 (32KB) + L1i L#4 (32KB) + Core L#4 + PU L#4 (P#4)
L2 L#5 (256KB) + L1d L#5 (32KB) + L1i L#5 (32KB) + Core L#5 + PU L#5 (P#5)
NUMANode L#1 (P#1 9861MB) + Package L#1 + L3 L#1 (12MB)
L2 L#6 (256KB) + L1d L#6 (32KB) + L1i L#6 (32KB) + Core L#6 + PU L#6 (P#6)
L2 L#7 (256KB) + L1d L#7 (32KB) + L1i L#7 (32KB) + Core L#7 + PU L#7 (P#7)
L2 L#8 (256KB) + L1d L#8 (32KB) + L1i L#8 (32KB) + Core L#8 + PU L#8 (P#8)
L2 L#9 (256KB) + L1d L#9 (32KB) + L1i L#9 (32KB) + Core L#9 + PU L#9 (P#9)
L2 L#10 (256KB) + L1d L#10 (32KB) + L1i L#10 (32KB) + Core L#10 + PU L#10 (P#10)
L2 L#11 (256KB) + L1d L#11 (32KB) + L1i L#11 (32KB) + Core L#11 + PU L#11 (P#11)

  2. Modify the processor group settings using the following commands

bcdedit.exe /set groupsize 4
bcdedit.exe /set groupaware on

  3. Reboot and check the processor group setting

C:\hwloc-win64-build-1.11.7\bin>coreinfo -g

Coreinfo v3.31 - Dump information on system CPU and memory topology
Copyright (C) 2008-2014 Mark Russinovich
Sysinternals - www.sysinternals.com

Logical Processor to Group Map:
Group 0:




Group 1:



Group 2:



  4. Running hwloc-ls and hwloc-info gives the following error.

C:\hwloc-win64-build-1.11.7\bin>hwloc-ls


  • hwloc 1.11.7 has encountered what looks like an error from the operating system.
  • Package (cpuset 0x00000001,,0x00000008) intersects with L3 (cpuset 0x0000000f) without inclusion!
  • Error occurred in topology.c line 1082
  • The following FAQ entry in the hwloc documentation may help:
  • What should I do when hwloc reports "operating system" warnings?
  • Otherwise please report this error message to the hwloc user's mailing list,
  • along with any relevant topology information from your platform.

Machine (19GB total)
NUMANode L#0 (P#0 9730MB) + L3 L#0 (12MB)
Package L#0
L2 L#0 (256KB) + L1d L#0 (32KB) + L1i L#0 (32KB) + Core L#0 + PU L#0 (P#0)
L2 L#1 (256KB) + L1d L#1 (32KB) + L1i L#1 (32KB) + Core L#1 + PU L#1 (P#1)
Package L#1 + L2 L#2 (256KB) + L1d L#2 (32KB) + L1i L#2 (32KB) + Core L#2 + PU L#2 (P#2)
L2 L#3 (256KB) + L1d L#3 (32KB) + L1i L#3 (32KB) + Core L#3 + PU L#3 (P#3)
Group0 L#0
NUMANode L#1 (P#2) + L3 L#1 (12MB)
L2 L#4 (256KB) + L1d L#4 (32KB) + L1i L#4 (32KB) + Core L#4 + PU L#4 (P#64)
Package L#2 + L2 L#5 (256KB) + L1d L#5 (32KB) + L1i L#5 (32KB) + Core L#5 + PU L#5 (P#65)
NUMANode L#2 (P#3) + L3 L#2 (12MB)
L2 L#6 (256KB) + L1d L#6 (32KB) + L1i L#6 (32KB) + Core L#6 + PU L#6 (P#66)
Package L#3 + L2 L#7 (256KB) + L1d L#7 (32KB) + L1i L#7 (32KB) + Core L#7 + PU L#7 (P#67)
NUMANode L#3 (P#1 10175MB) + L3 L#3 (12MB)
Package L#4
L2 L#8 (256KB) + L1d L#8 (32KB) + L1i L#8 (32KB) + Core L#8 + PU L#8 (P#128)
L2 L#9 (256KB) + L1d L#9 (32KB) + L1i L#9 (32KB) + Core L#9 + PU L#9 (P#129)
Package L#5 + L2 L#10 (256KB) + L1d L#10 (32KB) + L1i L#10 (32KB) + Core L#10 + PU L#10 (P#130)
L2 L#11 (256KB) + L1d L#11 (32KB) + L1i L#11 (32KB) + Core L#11 + PU L#11 (P#131)

C:\hwloc-win64-build-1.11.7\bin>hwloc-info


  • hwloc 1.11.7 has encountered what looks like an error from the operating system.
  • Package (cpuset 0x00000001,,0x00000008) intersects with L3 (cpuset 0x0000000f) without inclusion!
  • Error occurred in topology.c line 1082
  • The following FAQ entry in the hwloc documentation may help:
  • What should I do when hwloc reports "operating system" warnings?
  • Otherwise please report this error message to the hwloc user's mailing list,
  • along with any relevant topology information from your platform.

depth 0: 1 Machine (type #1)
depth 1: 1 Group0 (type #7)
depth 2: 4 NUMANode (type #2)
depth 3: 4 L3Cache (type #4)
depth 4: 6 Package (type #3)
depth 5: 12 L2Cache (type #4)
depth 6: 12 L1dCache (type #4)
depth 7: 12 L1iCache (type #4)
depth 8: 12 Core (type #5)
depth 9: 12 PU (type #6)

  5. hwloc-calc also has an issue, and the CPU mask looks incorrect.

C:\hwloc-win64-build-1.11.7\bin>hwloc-calc.exe node:1.core:0


  • hwloc 1.11.7 has encountered what looks like an error from the operating system.
  • Package (cpuset 0x00000001,,0x00000008) intersects with L3 (cpuset 0x0000000f) without inclusion!
  • Error occurred in topology.c line 1082
  • The following FAQ entry in the hwloc documentation may help:
  • What should I do when hwloc reports "operating system" warnings?
  • Otherwise please report this error message to the hwloc user's mailing list,
  • along with any relevant topology information from your platform.

0x00000001,,0x0

C:\hwloc-win64-build-1.11.7\bin>hwloc-calc.exe node:1.core:1


  • hwloc 1.11.7 has encountered what looks like an error from the operating system.
  • Package (cpuset 0x00000001,,0x00000008) intersects with L3 (cpuset 0x0000000f) without inclusion!
  • Error occurred in topology.c line 1082
  • The following FAQ entry in the hwloc documentation may help:
  • What should I do when hwloc reports "operating system" warnings?
  • Otherwise please report this error message to the hwloc user's mailing list,
  • along with any relevant topology information from your platform.

0x00000002,,0x0

@bgoglin
Contributor

bgoglin commented Aug 24, 2017

Hello

It looks like your coreinfo output didn't get uploaded correctly, so I have to guess. Dividing 6-core processors into groups of 4 might be a bad idea if Windows puts cores of different processors in the same group. That might be what happens here. Try with 3 instead of 4 to see what happens first.

If Windows does strange things like the above, we may have to just ignore processor groups in such cases. Not sure how to detect this...

@chenlcl
Author

chenlcl commented Aug 24, 2017

Hello,

Sorry, here is the coreinfo -g output. I have tested several group sizes: 2, 3, 4 and 6. With group size 2 or 4, hwloc has the issue above. With group size 3 or 6, the setting has no effect on the Windows system after a reboot, and I cannot find any error in the event log. Our customer also tested this in their environment with processor group size 8 on Windows 10, and it has the same issue. It seems processor groups are used to support more than 64 processors on the Windows platform, and I think that is important for MPI environments, although I agree that there is no need to use processor groups on a host with fewer than 64 processors.

C:\hwloc-win64-build-1.11.7\bin>coreinfo -g

Coreinfo v3.31 - Dump information on system CPU and memory topology
Copyright (C) 2008-2014 Mark Russinovich
Sysinternals - www.sysinternals.com

Logical Processor to Group Map:
Group 0:
****
----
----
Group 1:
----
****
----
Group 2:
----
----
****

@bgoglin
Contributor

bgoglin commented Aug 24, 2017

To be clear, processor groups are supported by hwloc already. Microsoft verified on a 160-core machine. If I remember correctly, we had 2 groups with 60 PUs each (3 sockets with 10 HT cores each), and a third group with 40 PUs (last 2 sockets).
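The 60 + 60 + 40 split described above can be reproduced with a small sketch. It assumes that whole sockets are packed greedily into groups of at most 64 PUs, and that the machine had 8 sockets of 10 hyperthreaded cores (3 + 3 + 2 sockets, 160 PUs). This is only an illustration of the arithmetic, not Windows' actual grouping algorithm, and `pack_sockets_into_groups` is a hypothetical helper name.

```python
def pack_sockets_into_groups(socket_sizes, max_group_size=64):
    """Greedily pack whole sockets into groups of at most max_group_size PUs.

    Illustrative assumption only: the real Windows policy is not documented
    in this thread.
    """
    groups = []
    current = 0
    for size in socket_sizes:
        if size > max_group_size:
            raise ValueError("a socket larger than a group cannot be kept whole")
        # Start a new group when the next socket would overflow this one.
        if current and current + size > max_group_size:
            groups.append(current)
            current = 0
        current += size
    if current:
        groups.append(current)
    return groups


# 8 sockets x 10 cores x 2 hyperthreads = 160 PUs
sockets = [10 * 2] * 8
print(pack_sockets_into_groups(sockets))  # [60, 60, 40]
```

No group boundary falls inside a socket here, which is the property the bcdedit-created groups in this issue lack.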

Although I use bcdedit to test our processor group support on my small windows machine, it should be used carefully because windows creates virtual groups that cross existing resource boundaries (group 1 is across 2 packages in your examples). Not sure what happens with value 2, that one should work....

Anyway, have you seen an issue on a real machine with processor groups that were not created by bcdedit?

@chenlcl
Author

chenlcl commented Aug 24, 2017

Thanks for your information.

I think my answer is no. We don't have such an environment. This issue was found by a customer, and his environment was configured with bcdedit too. As I understand it, we always need to use bcdedit to set the group size, even on a host with more than 64 processors, unless we want to use the default group size.

@bgoglin
Contributor

bgoglin commented Aug 25, 2017

OK. If that customer used bcdedit on a large machine where processor groups are required anyway, that becomes interesting. Can you get information about what groupsize was used on what kind of machine (processor model + how many of them)?
Also, "coreinfo -cgns" might give more information about how cores were distributed among groups.

@stangraves

This is the best update we have from the customer that describes their initial environment that shows the problem.

If there is specific information that you need, please post that here, and I will ask for that information.

================
Additional comments
We have been passed a beta version of Platform MPI with a 64bit build
of mpid.exe.

We have a Windows (10) workstation configured with 2 Processor Groups.
Each of the groups contains 8 logical processing units, created via
the Windows command:

bcdedit.exe /set groupsize 8

They can be displayed using the coreinfo command:

Logical Processor to Group Map:
Group 0:



Group 1:


All cores on the first socket are in group-0 and all cores on the
second socket are in group-1.

The application is then launched:

"C:\Program Files (x86)\IBM\Platform-MPI\bin\mpirun.exe" -prot -affopt=vv,coreindex -aff=automatic:bandwidth -np 16 S:\a.exe

which results in



  • hwloc 1.11.1 has encountered what looks like an error from the
    operating system.
  • L3 (cpuset 0x000000ff) intersects with Package (cpuset 0x0000000f,,
    0x0000000f) without inclusion!
  • Error occurred in topology.c line 981
  • The following FAQ entry in the hwloc documentation may help:
  • What should I do when hwloc reports "operating system" warnings?
  • Otherwise please report this error message to the hwloc user's
    mailing list,
  • along with any relevant topology information from your platform.


Host 0 -- ip 192.168.119.125 -- ranks 0 - 15

host | 0
======|======
0 : SHM

Prot - All Intra-node communication is: SHM

Host 0 -- ip 192.168.119.125 -- [0 1 2 3 4 5 6 7][8 9 10 11 12 13 14 15]

  • R0: [0 0 0 0 0 0 0 0][0 0 0 0 0 0 0 0] : 0x0
  • R1: [0 0 0 0 0 0 0 0][0 0 0 0 0 0 0 0] : 0x0
  • R2: [0 0 0 0 0 0 0 0][0 0 0 0 0 0 0 0] : 0x0
  • R3: [0 0 0 0 0 0 0 0][0 0 0 0 0 0 0 0] : 0x0
  • R4: [0 0 0 0 0 0 0 0][0 0 0 0 0 0 0 0] : 0x0
  • R5: [0 0 0 0 0 0 0 0][0 0 0 0 0 0 0 0] : 0x0
  • R6: [0 0 0 0 0 0 0 0][0 0 0 0 0 0 0 0] : 0x0
  • R7: [0 0 0 0 0 0 0 0][0 0 0 0 0 0 0 0] : 0x0
  • R8: [0 0 0 0 0 0 0 0][0 0 0 0 0 0 0 0] : 0x0
  • R9: [0 0 0 0 0 0 0 0][0 0 0 0 0 0 0 0] : 0x0
  • R10: [0 0 0 0 0 0 0 0][0 0 0 0 0 0 0 0] : 0x0
  • R11: [0 0 0 0 0 0 0 0][0 0 0 0 0 0 0 0] : 0x0
  • R12: [0 0 0 0 0 0 0 0][0 0 0 0 0 0 0 0] : 0x0
  • R13: [0 0 0 0 0 0 0 0][0 0 0 0 0 0 0 0] : 0x0
  • R14: [0 0 0 0 0 0 0 0][0 0 0 0 0 0 0 0] : 0x0
  • R15: [0 0 0 0 0 0 0 0][0 0 0 0 0 0 0 0] : 0x0

It seems like:

  • The topology has been correctly identified
  • hwloc generates an error
  • none of the processes are bound to any of the LPUs.

@bgoglin
Contributor

bgoglin commented Aug 29, 2017

"Package (cpuset 0x0000000f,,0x0000000f)" seems to say that this socket was split across two processor groups. "L3 (cpuset 0x000000ff)" says that this L3 is inside a single processor group. They should have the same cpuset. To be sure, it would be good to set HWLOC_COMPONENTS=-x86 in the environment, in case our x86-backend doesn't like these windows processor groups.
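The reading of those two cpusets can be checked with a short sketch. It assumes the string format seen in this thread: comma-separated 32-bit hexadecimal words, most significant first, where an empty field is a zero word. It also assumes that hwloc numbers Windows PUs as 64 * group + index, which is inferred from the P#64 and P#128 entries in the lstopo output above. `parse_hwloc_cpuset` and `groups_of` are hypothetical helper names, and infinite bitmaps (`0xf...f`) are not handled.

```python
def parse_hwloc_cpuset(text):
    """Return the set of PU indexes in a string such as '0x0000000f,,0x0000000f'."""
    bits = set()
    # The rightmost word holds bits 0-31, the next one bits 32-63, and so on.
    for position, word in enumerate(reversed(text.split(","))):
        value = int(word, 16) if word else 0  # an empty field is a zero word
        for bit in range(32):
            if (value >> bit) & 1:
                bits.add(position * 32 + bit)
    return bits


def groups_of(pus, group_width=64):
    """Processor groups touched by a set of PUs (assumes 64 indexes per group)."""
    return sorted({pu // group_width for pu in pus})


package = parse_hwloc_cpuset("0x0000000f,,0x0000000f")
l3 = parse_hwloc_cpuset("0x000000ff")

print(sorted(package))                     # [0, 1, 2, 3, 64, 65, 66, 67]
print(groups_of(package), groups_of(l3))   # [0, 1] [0]

# "Intersects without inclusion": they overlap, yet neither contains the other.
overlap = package & l3
print(bool(overlap) and not (package <= l3 or l3 <= package))  # True
```

Under these assumptions the Package covers four PUs in group 0 and four in group 1, while the L3 covers eight PUs in group 0 only, which is the inconsistency the warning reports.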

As said above, "coreinfo -cgns" might help. Put the output between triple-backquotes so that Github doesn't break the formatting. Also please post the output of lstopo with HWLOC_COMPONENTS=-x86 set.

Regarding process binding, I need to know which hwloc API was used for binding. And the output of "hwloc-info --support" would help too. Basically, Windows supports thread binding, but process binding is messy.

You'll need hwloc 1.11.3 for the "hwloc-info --support", but you should upgrade to 1.11.7 anyway, 1.11.1 is very old.

@chenlcl
Author

chenlcl commented Aug 30, 2017

Hi Brice,

Currently we only have a two-socket system with 6 cores per socket for testing. I can reproduce the customer's issue only when I configure the processor group size to 4. The result is the same with hwloc 1.11.1 and 1.11.7. Following your suggestion, I ran the following tests with hwloc 1.11.7. Here are the results.

Before configuring processor groups, the coreinfo output is:

C:>coreinfo -cgns

Coreinfo v3.31 - Dump information on system CPU and memory topology
Copyright (C) 2008-2014 Mark Russinovich
Sysinternals - www.sysinternals.com

Logical to Physical Processor Map:
*----------- Physical Processor 0
-*---------- Physical Processor 1
--*--------- Physical Processor 2
---*-------- Physical Processor 3
----*------- Physical Processor 4
-----*------ Physical Processor 5
------*----- Physical Processor 6
-------*---- Physical Processor 7
--------*--- Physical Processor 8
---------*-- Physical Processor 9
----------*- Physical Processor 10
-----------* Physical Processor 11

Logical Processor to Socket Map:
******------ Socket 0
------****** Socket 1

Logical Processor to NUMA Node Map:
******------ NUMA Node 0
------****** NUMA Node 1

Logical Processor to Group Map:
************ Group 0

After configuring the processor group size to 4, here is the hwloc-info --support output (with the environment variable HWLOC_COMPONENTS=-x86 set).

C:\hwloc-win64-build-1.11.7\bin>hwloc-info --support
****************************************************************************
* hwloc 1.11.7 has encountered what looks like an error from the operating system.
*
* Package (cpuset 0x00000001,,0x00000008) intersects with L3 (cpuset 0x0000000f) without inclusion!
* Error occurred in topology.c line 1082
*
* The following FAQ entry in the hwloc documentation may help:
* What should I do when hwloc reports "operating system" warnings?
* Otherwise please report this error message to the hwloc user's mailing list,
* along with any relevant topology information from your platform.
****************************************************************************
discovery:pu = 0
cpubind:set_thisproc_cpubind = 0
cpubind:get_thisproc_cpubind = 0
cpubind:set_proc_cpubind = 0
cpubind:get_proc_cpubind = 0
cpubind:set_thisthread_cpubind = 1
cpubind:get_thisthread_cpubind = 1
cpubind:set_thread_cpubind = 1
cpubind:get_thread_cpubind = 1
cpubind:get_thisproc_last_cpu_location = 0
cpubind:get_proc_last_cpu_location = 0
cpubind:get_thisthread_cpubind = 1
membind:set_thisproc_membind = 0
membind:get_thisproc_membind = 0
membind:set_proc_membind = 0
membind:get_proc_membind = 0
membind:set_thisthread_membind = 1
membind:get_thisthread_membind = 1
membind:set_area_membind = 0
membind:get_area_membind = 0
membind:alloc_membind = 1
membind:firsttouch_membind = 0
membind:bind_membind = 1
membind:interleave_membind = 0
membind:nexttouch_membind = 0
membind:migrate_membind = 0
membind:get_area_memlocation = 0

Here is the lstopo output and a screenshot:

C:\hwloc-win64-build-1.11.7\bin>lstopo
****************************************************************************
* hwloc 1.11.7 has encountered what looks like an error from the operating system.
*
* Package (cpuset 0x00000001,,0x00000008) intersects with L3 (cpuset 0x0000000f) without inclusion!
* Error occurred in topology.c line 1082
*
* The following FAQ entry in the hwloc documentation may help:
* What should I do when hwloc reports "operating system" warnings?
* Otherwise please report this error message to the hwloc user's mailing list,
* along with any relevant topology information from your platform.
****************************************************************************

Keyboard shortcuts:
Zoom-in or out .................... + -
Try to fit scale to window ........ f F
Reset scale to default ............ 1
Scroll vertically ................. Up Down PageUp PageDown
Scroll horizontally ............... Left Right Ctrl+PageUp/Down
Scroll to the top-left corner ..... Home
Scroll to the bottom-right corner . End
Exit .............................. q Q Esc
[Screenshot: lstopo]

The coreinfo output after setting the processor group size to 4:

C:\hwloc-win64-build-1.11.7\bin>coreinfo -cgns

Coreinfo v3.31 - Dump information on system CPU and memory topology
Copyright (C) 2008-2014 Mark Russinovich
Sysinternals - www.sysinternals.com

Logical to Physical Processor Map:
Physical Processor 0:
*---
----
----
Physical Processor 1:
-*--
----
----
Physical Processor 2:
--*-
----
----
Physical Processor 3:
---*
----
----
Physical Processor 4:
----
*---
----
Physical Processor 5:
----
-*--
----
Physical Processor 6:
----
----
*---
Physical Processor 7:
----
----
-*--
Physical Processor 8:
----
----
--*-
Physical Processor 9:
----
----
---*
Physical Processor 10:
----
--*-
----
Physical Processor 11:
----
---*
----

Logical Processor to Socket Map:
Socket 0:
**--
----
----
Socket 1:
--*-
----
----
Socket 2:
---*
*---
----
Socket 3:
----
-*--
----
Socket 4:
----
----
**--
Socket 5:
----
----
--*-
Socket 6:
----
--*-
---*
Socket 7:
----
---*
----

Logical Processor to NUMA Node Map:
NUMA Node 0:
****
----
----
NUMA Node 1:
----
----
****
NUMA Node 2:
----
**--
----
NUMA Node 3:
----
--**
----

Logical Processor to Group Map:
Group 0:
****
----
----
Group 1:
----
****
----
Group 2:
----
----
****

@bgoglin
Contributor

bgoglin commented Aug 30, 2017

As already explained, groupsize 4 on a machine with 6-core sockets doesn't make sense because 6 isn't divisible by 4. Add to that the fact that Windows creates groups in a totally crazy way. See the "Logical Processor to Socket Map" section of your coreinfo output: it now shows 8 sockets, with sockets 2 and 6 spanning 2 different groups.
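The spanning can be checked mechanically from the coreinfo maps. The sketch below assumes that each socket's three rows correspond to groups 0, 1 and 2 in order, as the "Logical Processor to Group Map" section suggests. `groups_per_socket` is a hypothetical helper, and the rows are copied by hand from the output posted above.

```python
def groups_per_socket(socket_map):
    """Map each socket to the group indexes in which it owns at least one PU."""
    return {
        socket: [group for group, row in enumerate(rows) if "*" in row]
        for socket, rows in socket_map.items()
    }


# Rows copied from "Logical Processor to Socket Map" (group size 4),
# one row per processor group, group 0 first.
socket_map = {
    0: ["**--", "----", "----"],
    1: ["--*-", "----", "----"],
    2: ["---*", "*---", "----"],
    3: ["----", "-*--", "----"],
    4: ["----", "----", "**--"],
    5: ["----", "----", "--*-"],
    6: ["----", "--*-", "---*"],
    7: ["----", "---*", "----"],
}

spanning = {s: g for s, g in groups_per_socket(socket_map).items() if len(g) > 1}
print(spanning)  # {2: [0, 1], 6: [1, 2]}
```

With the rows as posted, the sockets reported as crossing a group boundary are 2 and 6.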

Things shouldn't be that crazy on your customer's machine, since it divides 8-core processors into groups of 8. coreinfo -cgns would confirm that.

By the way, the customer should also explain why he wants to create such useless groups :/

@sthibaul
Contributor

More precisely, what does not make sense in using groupsize 4 is that it ends up creating a group which spans over the two sockets without including them all. This is a very odd thing to do, since it artificially brings together 4 cores which are not actually related since they are in different sockets.

@stangraves

I have asked for the "coreinfo -cgns" output, and will post that when I receive the information.

The specific group sizes we are testing with are deliberately artificial to take advantage of available machines with small(er) core counts -- but still testing multiple processor group configurations. If that is not generally supported, that is OK -- but it would be nice to know, so that we can write an appropriate restriction for our product.

Is testing with arbitrary processor group assignments a likely cause of the issues we are seeing?
Is there some primer on how processor groups should be created and managed that we can use as a reference?

@bgoglin
Contributor

bgoglin commented Aug 30, 2017

Arbitrary processor group assignments are the cause of some issues above. But I can't say for sure nothing else is broken.

Let's take an example to explain a good assignment. Say you have a machine with 4 sockets, 6 cores each, and 2 hyperthreads per core. Obviously good group sizes are 1 (1 group per HT), 2 (1 group per core), 12 (1 group per socket), and 48 (1 group for the entire machine).

Then you have intermediate sizes:
The number of cores per socket is 6, which is divisible by 2 and 3: you can use groups of 2 or 3 cores (that's a group size of 4 or 6, since there are 2 HT per core).
The number of sockets is 4, which is divisible by 2: you can use groups of 2 sockets, that's a group size of 24 (2 sockets * 6 cores * 2 HT).

All this assumes that groups contain consecutive hyperthreads and cores. As Samuel said, you don't want to "create a group which spans over the two sockets without including them all".

@sthibaul
Contributor

sthibaul commented Aug 30, 2017

To put it mathematically: you can choose a group size which, for every structural element of the architecture (NUMA node, socket, cache, core, etc.), either divides or is a multiple of the number of logical processors in that element.

So the simple case is to just take the number of logical processors of a given element (e.g. a socket). The less simple case is to take a group size which is a divisor of the size of a given element (e.g. a socket), and a multiple of the size of the element just below (e.g. L3 cache).
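The rule can be written as a small check. The sketch below is a minimal illustration with hypothetical function names. It takes the number of logical processors of each structural element and lists the group sizes that divide or are a multiple of every one of them. It only tests the arithmetic condition: it assumes groups hold consecutive, aligned logical processors, and it says nothing about which values Windows actually accepts (the thread reports that 3 and 6 had no effect on the test machine).

```python
def is_consistent_group_size(group_size, element_sizes):
    """True if group_size divides, or is a multiple of, every element size."""
    return all(
        group_size % size == 0 or size % group_size == 0
        for size in element_sizes
    )


def consistent_group_sizes(element_sizes):
    """All consistent group sizes up to the size of the largest element."""
    machine = max(element_sizes)
    return [
        g for g in range(1, machine + 1)
        if is_consistent_group_size(g, element_sizes)
    ]


# 4 sockets x 6 cores x 2 HT: PU = 1, core = 2, socket = 12, machine = 48
print(consistent_group_sizes([1, 2, 12, 48]))  # [1, 2, 4, 6, 12, 24, 48]

# Test machine in this issue, 2 sockets x 6 cores, no HT:
# core = 1, socket = 6, machine = 12
print(consistent_group_sizes([1, 6, 12]))      # [1, 2, 3, 6, 12]
```

The first list matches the sizes enumerated for the 4-socket example, and the second shows why a group size of 4 cannot line up with 6-core sockets.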

@stangraves

Can we get the customer's "coreinfo -cgns" output for the group size 8 environment?

The coreinfo output is attached in the coreinfo.txt file. I have also added the output from hwloc-ls.exe, which I hope provides all the information necessary to determine why Platform MPI cannot bind the ranks when we have multiple processor groups. The system in question has 80 processing units (2 processor groups, 40 processing units per group).

Or why do they want to configure such small processor groups in a Windows HPC environment?

Please note we do not want to use small, non-standard processor group sizes. I only used the bcdedit method of changing the processor group size on my workstation because I thought it might make it easier to reproduce the underlying problem of Platform MPI not being able to bind the ranks when we have more than one processor group.

What are the processor group sizes that are intended to be used with the application?

We intend to use the default size for a processor group, namely 64. The attached data was gathered on a test system (80 processing units) where we can reproduce the failure of Platform MPI to bind the ranks.

coreinfo.txt
hwloc-1.11.7-win64-ls.txt

@bgoglin
Contributor

bgoglin commented Aug 31, 2017

Everything looks good in coreinfo and hwloc-ls.

You forgot "hwloc-info --support" for debugging the binding issue.

But in the end, we'll need to know what Platform MPI uses for binding. Does it use hwloc_set_cpubind()? With which flags? Given that process binding is hard on Windows, they may have to fall back to thread binding if they want a good compromise. Thread binding may be enough if done early (before any other thread is started). Also, they'll have to check whether they bind before starting the application process (which requires binding to be inherited) or during MPI_Init in the application.

Also, maybe check binding with another tool. Maybe platform doesn't report binding correctly :)

Have you seen platform perform and report binding correctly on windows in the past?

@chenlcl
Author

chenlcl commented Sep 1, 2017

Platform MPI can report binding on Windows correctly. We have asked for the PMPI binding output from the customer's product environment. So far they have only given us the output from the group size 8 environment. We have also asked them to try hwloc-bind to find out whether it is a PMPI output issue.

Before we get feedback, I want to add more information. In our current environment, I tried the group size 2 case (Brice thinks this setting should work). I know that an arbitrary group size is not recommended. I did this only to check whether hwloc can bind correctly when Windows accepts the setting and hwloc detects the topology without error.

I ran the following command, and it seems lstopo is bound to a random group. Checking with the Windows Task Manager gives the same result.
hwloc-bind node:0.pu:0 -- lstopo --pid 0

[Screenshot: lstopo_binding]

hwloc_info_coreinfo.txt

@bgoglin
Contributor

bgoglin commented Sep 1, 2017

"hwloc-info --support" says that process binding isn't supported here. Also lstopo should show a single PU in green in node:0 according to your hwloc-bind command-line. Instead, it shows 2 PUs in another group because the process wasn't bound (the default windows behavior is to assign a process to a random group and bind it to all cores of that group).

Otherwise the lstopo output looks good.

hwloc cannot bind entire processes when there are multiple groups (issue #78). So missing process binding support is expected here. Also we have issue #151 about hwloc-bind not working well on Windows because it's not clear whether process and/or thread binding is inherited during execvp().
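A caller could read the same `hwloc-info --support` lines to decide between process and thread binding. The sketch below is only an illustration: `parse_support` and `binding_strategy` are hypothetical helpers, the parser assumes the `name = 0/1` line format shown in the output posted earlier in this thread, and the sample is an excerpt of that output.

```python
def parse_support(output):
    """Parse 'category:feature = 0/1' lines into a dict of booleans."""
    support = {}
    for line in output.splitlines():
        key, sep, value = line.partition("=")
        if not sep:
            continue  # banner lines and blank lines carry no '='
        key, value = key.strip(), value.strip()
        if ":" in key and value in ("0", "1"):
            support[key] = value == "1"
    return support


def binding_strategy(support):
    """Prefer process binding, fall back to thread binding, else none."""
    if support.get("cpubind:set_thisproc_cpubind"):
        return "process"
    if support.get("cpubind:set_thisthread_cpubind"):
        return "thread"
    return "none"


# Excerpt of the output posted above (processor group size 4).
sample = """\
* hwloc 1.11.7 has encountered what looks like an error from the operating system.
cpubind:set_thisproc_cpubind = 0
cpubind:get_thisproc_cpubind = 0
cpubind:set_proc_cpubind = 0
cpubind:set_thisthread_cpubind = 1
cpubind:set_thread_cpubind = 1
"""

print(binding_strategy(parse_support(sample)))  # thread
```

For this machine the result is thread binding only, consistent with process binding being unsupported when there are multiple groups.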

@bgoglin bgoglin added the bug label Feb 21, 2018
@bgoglin
Contributor

bgoglin commented Jun 3, 2021

I am closing this old issue because processor group support has improved significantly in recent years, and although we have ways to test it here, we haven't seen any issue recently. If the bug still occurs, please open a new issue.

@bgoglin bgoglin closed this as completed Jun 3, 2021