Defect: cafrun broken uses wrong mpiexec executable - path not updated upon compilation #785

Open
DetlevCM opened this issue Sep 5, 2024 · 2 comments

Comments

@DetlevCM

DetlevCM commented Sep 5, 2024

The title of the issue should start with Defect: followed by a
succinct title.

Please make sure to put any logs, terminal output, or code in
fenced code blocks. Please also read the contributing guidelines
before submitting a new issue.

Please note we will close your issue without comment if you delete, do not read or do not fill out the issue checklist below and provide ALL the requested information.

  • [x] I am reporting a bug others will be able to reproduce and not asking a question or requesting a new feature.

System information including:

  • OpenCoarrays Version: 2.10.2
  • Fortran Compiler: gcc version 7.5.0
  • C compiler used for building lib: gcc version 7.5.0
  • Installation method: ./install.sh
  • All flags & options passed to the installer none
  • Output of uname -a: 5.14.21-150500.55.52-default #1 SMP PREEMPT_DYNAMIC Tue Mar 5 16:53:41 UTC 2024 (a62851f) x86_64 x86_64 x86_64 GNU/Linux
  • MPI library being used: it is supposed to be whatever OpenCoarrays builds itself
  • Machine architecture and number of physical cores: x86_64
  • Version of CMake: 3.20.4 (system version)

To help us debug your issue please explain:

What you were trying to do (and why)

Standard installation of OpenCoarrays, compiling all dependencies.

What happened (include command output, screenshots, logs, etc.)

The installer reports that everything compiled successfully.
However, "cafrun" uses the wrong mpiexec or does not find it.

What you expected to happen

The code to be written in a competent way, so that it finds the executable that was compiled as part of the build process when running cafrun.

Step-by-step reproduction instructions to reproduce the error/bug

Have a computer that has mpich/openmpi installed, but not in the path (e.g. due to mpi-selector).
Build OpenCoarrays including all dependencies; things seem to work.
Load the environment and find that cafrun is broken because it searches for an executable in a directory it has no business accessing...

2024-09-05 08:32:17 UTC [     info]  
#                                                                                
# Execute this script via the following command:                                 
# source /lustre/cms/username/tmp/OpenCoarrays-2.10.2/prerequisites/installations//opencoarrays/2.10.2/setup.sh                                              
                                                                                
# Prepend the CMake path to the PATH environment variable:
# Prepend the compiler path to the PATH environment variable:
# Prepend the MPI path to the PATH environment variable:
# Prepend the OpenCoarrays path to the PATH environment variable:
*** To set up your environment for using caf and cafrun, please ***
*** source the installed setup.sh file in bash or Z shell or    ***
*** source setup.csh in a C-shell or add one of the following   ***
*** statements to your login file:                              ***

source /lustre/cms/username/tmp/OpenCoarrays-2.10.2/prerequisites/installations//opencoarrays/2.10.2/setup.sh
source /lustre/cms/username/tmp/OpenCoarrays-2.10.2/prerequisites/installations//opencoarrays/2.10.2/setup.csh

*** Installation complete.                                        ***
2024-09-05 08:32:19 UTC [     info] Cleaning up. Done
username@slurm-login:/lustre/cms/username/tmp/OpenCoarrays-2.10.2> which mpiexec
which: no mpiexec in (/home/cms/username/bin:/usr/local/bin:/usr/bin:/bin)
username@slurm-login:/lustre/cms/username/tmp/OpenCoarrays-2.10.2> source /lustre/cms/username/tmp/OpenCoarrays-2.10.2/prerequisites/installations//opencoarrays/2.10.2/setup.sh
username@slurm-login:/lustre/cms/username/tmp/OpenCoarrays-2.10.2> which mpiexec
/lustre/cms/username/tmp/OpenCoarrays-2.10.2/prerequisites/installations/mpich/3.2/bin/mpiexec 

username@slurm-login:/lustre/cms/username/tmp/OpenCoarrays-2.10.2> cafrun --show
/usr/lib64/mpi/gcc/mpich/bin/mpiexec -n <number_of_images> /path/to/coarray_Fortran_program [arg4 [arg5 [...]]]
username@slurm-login:/lustre/cms/username/tmp/OpenCoarrays-2.10.2> which mpiexec
/lustre/cms/username/tmp/OpenCoarrays-2.10.2/prerequisites/installations/mpich/3.2/bin/mpiexec

Inspection of the cafrun script shows that the path to mpiexec is hardcoded, for whatever reason. Presumably cmake went and found it somewhere on the OS (possibly by searching through subdirectories) rather than relying on the output of, say, "which mpiexec".
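
The mismatch between the two paths can be checked mechanically. The following is a minimal sketch, not part of OpenCoarrays: the helper names `launcher_from_show` and `check_launcher` are hypothetical, and it assumes that the first whitespace-separated field of the `cafrun --show` output is the launcher path, as in the output above.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not shipped with OpenCoarrays).

# Print the launcher path from one line of `cafrun --show` output.
# Assumption: the launcher is the first whitespace-separated field.
launcher_from_show() {
  printf '%s\n' "$1" | awk '{print $1}'
}

# Compare the launcher baked into cafrun with the one resolved via PATH.
# $1: output of `cafrun --show`, $2: path reported by `command -v mpiexec`
check_launcher() {
  baked=$(launcher_from_show "$1")
  if [ "$baked" = "$2" ]; then
    echo "match: $baked"
  else
    echo "mismatch: cafrun uses $baked, PATH resolves to $2"
    return 1
  fi
}

# Typical use after sourcing setup.sh (requires cafrun and mpiexec on PATH):
# check_launcher "$(cafrun --show)" "$(command -v mpiexec)"
```

With the values shown in this report, the check would print a mismatch between /usr/lib64/mpi/gcc/mpich/bin/mpiexec and the mpich/3.2 build under prerequisites/installations.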

@nakib

nakib commented Oct 25, 2024

One workaround could be to comment out find_package( MPI ) in CMakeLists.txt. But I am not sure if this is a generally applicable solution.

@vehre
Collaborator

vehre commented Oct 25, 2024

The paths in bin/caf and bin/cafrun are set at configuration time, i.e. when cmake is run to set up the project. Changing the MPI implementation requires deleting the build directory and reconfiguring so that those scripts are regenerated. Therefore this is not exactly an issue in the build process of OpenCoarrays, but one of configuration time. When using a module or mpi-selector mechanism, that mechanism needs to make sure that only the correct MPI executables are in the path.
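
As a rough sketch of such a reconfiguration for a direct CMake build (not the ./install.sh route): the helper `configure_cmd` below is hypothetical and only prints a configure command. `MPI_C_COMPILER`, `MPI_Fortran_COMPILER` and `MPIEXEC_EXECUTABLE` are variables of CMake's FindMPI module; whether OpenCoarrays' CMakeLists.txt honours all of them when generating bin/cafrun is an assumption, as are the wrapper names mpicc and mpifort under the given prefix.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a cmake configure command that pins MPI to one prefix.
# $1: MPI installation prefix, $2: source directory, $3: fresh build directory
# Assumption: the wrappers are named mpicc, mpifort and mpiexec under $1/bin.
configure_cmd() {
  printf 'cmake -S %s -B %s -DMPI_C_COMPILER=%s/bin/mpicc -DMPI_Fortran_COMPILER=%s/bin/mpifort -DMPIEXEC_EXECUTABLE=%s/bin/mpiexec\n' \
    "$2" "$3" "$1" "$1" "$1"
}

# Typical use: remove the stale build directory first, then inspect the
# printed command before running it by hand.
# rm -rf build
# configure_cmd /path/to/mpi/prefix . build
```

Printing the command instead of executing it keeps the sketch side-effect free; after reconfiguring, `cafrun --show` should report the launcher that was passed in.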