Fix batch SGE example. #836

Open
wants to merge 1 commit into
base: development

adamjhn
Collaborator

@adamjhn adamjhn commented Sep 25, 2024

Remove the invalid -hosts argument from the mpiexec call. I'm not sure whether $NSLOTS or the -host argument is needed, but this worked for me on the Downstate HPC.
Fixes #835.

@jchen6727
Collaborator

Hello,

We originally added these extra arguments to the mpi command on the HPC admin's recommendation, to address core oversubscription and some multithreading issues. Could you run the scheduled command through time to confirm that the command in your PR executes as expected?

Thanks,

James

@adamjhn
Collaborator Author

adamjhn commented Oct 2, 2024

It runs as expected on the Downstate HPC. I'm using OpenMPI 5.0.3, so maybe your setup is different?
If I use host instead of hosts, i.e. mpiexec -host $(hostname) -n $NSLOTS ..., I get a different error:

------------------------------------------------------------------------
There are not enough slots available in the system to satisfy the 3
slots that were requested by the application:

  nrniv

Either request fewer procs for your application, or make more slots
available for use.

A "slot" is the PRRTE term for an allocatable unit where we can
launch a process.  The number of slots available are defined by the
environment in which PRRTE processes are run:

  1. Hostfile, via "slots=N" clauses (N defaults to number of
     processor cores if not provided)
  2. The --host command line parameter, via a ":N" suffix on the
     hostname (N defaults to 1 if not provided)
  3. Resource manager (e.g., SLURM, PBS/Torque, LSF, etc.)
  4. If none of a hostfile, the --host command line parameter, or an
     RM is present, PRRTE defaults to the number of processor cores

In all the above cases, if you want PRRTE to default to the number
of hardware threads instead of the number of processor cores, use the
--use-hwthread-cpus option.

Alternatively, you can use the --map-by :OVERSUBSCRIBE option to ignore the
number of available slots when deciding the number of processes to
launch.
--------------------------------------------------------------------------

Also, NSLOTS is one less than the number of cores requested.
It runs if I add the :N suffix to the host, i.e. mpiexec -host $(hostname):$NSLOTS -n $NSLOTS ....
It's odd that there would be a problem with oversubscription, as mpiexec defaults to --nooversubscribe.
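
For illustration, here is a minimal sketch of how the working invocation could be assembled in a batch script. The helper name build_mpiexec_cmd and the program name ./my_program are hypothetical placeholders, not part of the PR, and the function only prints the command line rather than launching anything. It assumes the scheduler exports NSLOTS, as described above.

```shell
#!/bin/sh
# Hypothetical helper (not from the PR): print an mpiexec command line that
# declares the slot count on the host via the ":N" suffix, so that the number
# of available slots matches the number of requested processes.
build_mpiexec_cmd() {
    host="$1"    # host name, e.g. the output of $(hostname)
    slots="$2"   # slot count, e.g. $NSLOTS as set by the scheduler
    shift 2
    printf 'mpiexec -host %s:%s -n %s' "$host" "$slots" "$slots"
    # Append the program and its arguments, if any were given.
    if [ "$#" -gt 0 ]; then
        printf ' %s' "$@"
    fi
    printf '\n'
}

# Dry run: show the command that would be executed. The fallbacks
# (localhost, 1) are assumptions for running outside a scheduled job.
build_mpiexec_cmd "${HOSTNAME:-localhost}" "${NSLOTS:-1}" ./my_program
```

With fixed arguments, `build_mpiexec_cmd node01 3 ./my_program` prints `mpiexec -host node01:3 -n 3 ./my_program`. Printing the command first makes it easy to compare what different clusters would actually run before submitting the job.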
