FAQ on using single socket #7311
Here's an example of a parameterized invocation line
which, of course, doesn't work correctly:
In this case it doesn't seem to be binding processes to cores; it looks like it's assigning 4 cores per proc and then oversubscribing 4 procs to each core.
But the above answer is the only one I'd received.
Assuming there are more than 2 MPI tasks and you have enough nodes, what about
I can't tell whether that should solve the problem or not; Open MPI 4.0.2 fails out with the error below.
Can you try
In this case here with one node
it's still splitting across sockets. Here's the mapping for rank 3, for example:

[dgx2-02:202025] MCW rank 3 bound to socket 1[core 20[hwt 0]], socket 1[core 21[hwt 0]], socket 1[core 22[hwt 0]], socket 1[core 23[hwt 0]], socket 1[core 24[hwt 0]], socket 1[core 25[hwt 0]], socket 1[core 26[hwt 0]], socket 1[core 27[hwt 0]], socket 1[core 28[hwt 0]], socket 1[core 29[hwt 0]], socket 1[core 30[hwt 0]], socket 1[core 31[hwt 0]], socket 1[core 32[hwt 0]], socket 1[core 33[hwt 0]], socket 1[core 34[hwt 0]], socket 1[core 35[hwt 0]], socket 1[core 36[hwt 0]], socket 1[core 37[hwt 0]], socket 1[core 38[hwt 0]], socket 1[core 39[hwt 0]]: [./././././././././././././././././././.][B/B/B/B/B/B/B/B/B/B/B/B/B/B/B/B/B/B/B/B]
If it were up to me, I ought to be able to specify --npersocket 4, and then the first 4 procs would go to socket 0 and then spill over to socket 1 and so on. So for multi-node I would use
and it would run 4 procs on socket 0 of each node.
I'm not entirely sure I understand what you are trying to do, but if you want, say, four procs on each node, all bound to the first socket on that node, then I would use:

$ mpirun --map-by core --hostfile foo ...

where the hostfile indicates there are 4 slots on each node. Note that this would bind each proc to only one core, so only the first four cores would be used. If you wanted the procs to simply be bound to the socket, then add

If you want each of the procs to use 2 cores on the socket, then you would say:

$ mpirun --map-by core:pe=2 --hostfile foo ...

Note this will automatically cause each proc to be bound to two cores, not the entire socket, but each proc will have its own two cores.
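For illustration, a minimal hostfile along the lines described above (the node names are hypothetical placeholders, not from this thread):

# foo -- hostfile giving 4 slots on each node (hypothetical hostnames)
node01 slots=4
node02 slots=4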
Another option for you: if you want to have the job mapped such that all procs land on the first socket of every node until those sockets are completely filled, and then start filling the second socket on every node, then the easiest method is just:

$ mpirun --map-by socket:span ...

Note that the procs will be evenly spread across the nodes - it won't fill the first socket of the first node and then move to the first socket of the second node. You can control how the ranks are assigned to the procs via the
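To make the shape of that invocation concrete, a hedged sketch (the process count and program name are assumed; the comments simply restate the behavior described above):

$ mpirun --map-by socket:span -n 8 ./app
# per the description above: the first socket of every node fills before the
# second sockets are used, and procs spread evenly across the nodes rather
# than node 1 filling completely before node 2 gets any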
I'm still seeing the problem; here's 2 procs on 1 node:
So again it looks like one proc is going onto each socket.
I believe that with MVAPICH2 either of these forms would solve the problem
The second form uses a fixed number of procs per socket, but that would be OK; it's the ability to vary the number of nodes that matters to me.
@ggouaillardet @rhc54 I probably didn't have the same intention as @cponder, but I stumbled across what seems to be the same issue:

This may help - notice how some ranks are "INVALID" when I pass the flag, and the printf() I added in dstore_base.c (inside PMIx) tells me that the missing information on rank #2 is responsible for the crash:
This has been merged into v4.0.x and will be released with v4.0.4.
What are you closing here? There are 2 issues now: (1) my wanting a way to run one-socket-per-node without using a hostfile, and (2) the error message that Alex is reporting.
Using an existing hostfile means that I can't vary the number of nodes, right?
Okay, sorry, it was unclear to what extent the PR addressed the issues.
We solved the problem from @alex--m. I confess I'm still having trouble really understanding the other problem here. IIUC, what @cponder wants is to have 4 procs running on socket 0 of each node. Yet I am not gathering why the following reportedly doesn't work:

$ mpirun -n $((NODES*4)) -bind-to socket --cpu-list 0,6,12,18 --report-bindings

Yes, it will output that the procs "are not bound", but that is because the procs are being confined to the specific cores listed here, and those cores are all on the same socket. Hence, the procs are "bound to all available processors", which is what the full message says. Are you sure you aren't getting what you want? Have you printed out the actual bitmask to see where the procs are? In reality, once you specified the cpu-list (and those cpus cover the first socket on each node), you don't gain anything by the

I'm working to improve the binding message to make it clearer what has happened - perhaps that is the only true issue here.
Can you give me a more specific command I can try running?
Errr...well, why don't you use the above command and run a program like this:

#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
int main(int argc, char **argv)
{
    cpu_set_t mask;
    long nproc, i;

    /* Query the affinity (CPU binding) mask of the calling process. */
    if (sched_getaffinity(0, sizeof(cpu_set_t), &mask) == -1) {
        perror("sched_getaffinity");
        assert(false);
    }

    /* Print one 0/1 flag per online processor: 1 means this process
       is allowed to run on that CPU. */
    nproc = sysconf(_SC_NPROCESSORS_ONLN);
    printf("sched_getaffinity = ");
    for (i = 0; i < nproc; i++) {
        printf("%d ", CPU_ISSET(i, &mask));
    }
    printf("\n");
    return 0;
}
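For reference, a compile-and-run sketch for this test program, reusing the command from the earlier comment (the source and binary names here are placeholders):

$ mpicc affinity.c -o affinity
$ mpirun -n $((NODES*4)) -bind-to socket --cpu-list 0,6,12,18 --report-bindings ./affinity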
I get these messages
and 16 copies of this line
It looks to me like it is showing cores 0,6,12,18 on both sockets. |
As I understand this, @cponder is looking for a way of fully exploiting a half-populated node (i.e. a single socket out of 2 sockets) without using a
But this does not work as I was expecting
I have a similar problem - I'm trying to bind all processes to socket #1, but I can't work out how.
Am I understanding correctly that

If so, then printing what processors are available may be useful, because a message like "not bound (or bound to all available processors)" can be a little bit misleading. WDYT?
I doubt that anything will be done for prior releases, but I'll take a crack in OMPI v5 at providing (a) a clearer statement as to "not bound" vs "bound to all available", and (b) a "show-cpus" option that will tell you what cpus are available by socket.
I added this display output for you (see openpmix/prrte#1634). It doesn't output much just yet, but I may have someone willing to extend/beautify the output in the near future:

$ prterun --prtemca hwloc_use_topo_file /Users/rhc/pmix/topologies/summit.h17n08.lstopo-2.2.0.xml --prtemca ras_simulator_num_nodes 3 --map-by package:pe=5:corecpu -n 2 --display cpus=nodeA1,map hostname
====================== AVAILABLE PROCESSORS [node: nodeA1] ======================
PKG[0]: 0-20
PKG[1]: 21-41
======================================================================
======================== JOB MAP ========================
Data for JOB prterun-Ralphs-iMac-2-61474@1 offset 0 Total slots allocated 126
Mapping policy: BYPACKAGE:NOOVERSUBSCRIBE Ranking policy: FILL Binding policy: CORE:IF-SUPPORTED
Cpu set: N/A PPR: N/A Cpus-per-rank: 5 Cpu Type: CORE
Data for node: nodeA0 Num slots: 42 Max slots: 42 Num procs: 2
Process jobid: prterun-Ralphs-iMac-2-61474@1 App: 0 Process rank: 0 Bound: package[0][core:0-4]
Process jobid: prterun-Ralphs-iMac-2-61474@1 App: 0 Process rank: 1 Bound: package[1][core:21-25]
$
Over and over again, I run into this issue when doing performance testing.
I would appreciate it if you would add the answer to your FAQ:
I don't want to use a HOSTFILE; I want to do it from the command line so I don't have to hard-code the process topology. I want to use parameterized arguments so the "mpirun" line will scale.
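As an illustration of the "parameterized" requirement - the process count is computed from the node count at run time instead of hard-coding hosts in a file (NODES is a hypothetical shell variable, and the flags are the ones debated in this thread, not a confirmed fix):

$ NODES=8   # assumed to be supplied by the batch system, not hard-coded
$ mpirun -n $((NODES*4)) --npersocket 4 --report-bindings ./my_app
# per the discussion above, this still does not confine the 4 procs per node to socket 0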