docs: fix outdated descriptions of -output-filename
#11032
base: main
Can one of the admins verify this patch?
The behaviour of Open MPI 4.1.4 is the same as reported in #7095.
$ mpirun --version
mpirun (Open MPI) 4.1.4
Report bugs to http://www.open-mpi.org/community/help/
$ mpiexec -n 1 --output-filename out.txt echo "Hi"
Hi
$ find out.txt
out.txt
out.txt/1
out.txt/1/rank.0
out.txt/1/rank.0/stdout
out.txt/1/rank.0/stderr
I haven’t tried the latest Open MPI; feel free to close the PR if this is already fixed (in any case, the man page is wrong in 4.1.4).
ok to test
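For readers following along, the layout observed above for 4.1.4 can be sketched as a small path builder. This is only an illustration of the `find` output in this comment; the function name `v4_output_files` is hypothetical, and any zero-padding of the rank is not modelled because the transcript only shows `rank.0`.

```python
from pathlib import PurePosixPath


def v4_output_files(filename, job, rank):
    """Hypothetical helper mirroring the tree shown by `find out.txt` above.

    Layout assumed from the transcript: {filename}/{job}/rank.{rank}/std{out,err}.
    Rank padding is deliberately not modelled.
    """
    rank_dir = PurePosixPath(filename) / str(job) / f"rank.{rank}"
    return [str(rank_dir / stream) for stream in ("stdout", "stderr")]


if __name__ == "__main__":
    for path in v4_output_files("out.txt", 1, 0):
        print(path)
```

Called with the values from the transcript (`"out.txt"`, job `1`, rank `0`), it yields the two file paths listed by `find`.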
docs/man-openmpi/man1/mpirun.1.rst
Outdated
into `{filename}/{job}/rank.{rank}/std[out,err,diag]`, where `{rank}` is the
processes' rank in MPI_COMM_WORLD, left-filled with zero's for correct
ordering in listings. Any directories in the filename will automatically be
created. A relative path value will be converted to an absolute path based on
the cwd where mpirun is executed. Note that this will not work on
environments where the file system on compute nodes differs from that where
:ref:`mpirun(1) <man1-mpirun>` is executed.
into ``{filename}/{job}/rank.{rank}/std[out,err,diag]``, where ``{rank}`` is the
processes' rank in ``MPI_COMM_WORLD``, left-filled with zeros for correct
ordering in file listings. Any intermediate directories in the resulting output files will automatically be
created. If ``filename`` is a relative path, it will be converted to an absolute path based on
the directory where :ref:`mpirun(1) <man1-mpirun>` is executed. Note that this will not work in
environments where the file system on compute nodes differs from that where
:ref:`mpirun(1) <man1-mpirun>` is executed.
Is the comment "will not work" accurate? In an environment where <mpirun cwd> does not exist on the remote nodes, doesn't the filename/directory hierarchy get created relative to $HOME on the remote nodes?
> In an environment where <mpirun cwd> does not exist on the remote nodes, doesn't the filename/directory hierarchy get created relative to $HOME on the remote nodes?
I tried on a cluster, and the hierarchy is created relative to the current working directory:
$ ompi_info --version
Open MPI v4.1.1
http://www.open-mpi.org/community/help/
$ mpirun -map-by ppr:1:node rm -rf /tmp/foo # make sure the directory does not exist
$ mkdir /tmp/foo # /tmp is not shared among the nodes
$ cd /tmp/foo
$ mpirun -map-by ppr:1:node hostname
node1
node2
$ mpirun -map-by ppr:1:node pwd
/tmp/foo
/home/me
At this point, the directory is not created; it is created when mpirun with -output-filename is finished
$ mpirun -map-by ppr:1:node -output-filename bar pwd
/tmp/foo
/home/me
$ mpirun -map-by ppr:1:node pwd
/tmp/foo
/tmp/foo
$ mpirun -map-by ppr:1:node find $PWD
/tmp/foo
/tmp/foo/bar
/tmp/foo/bar/1
/tmp/foo/bar/1/rank.0
/tmp/foo/bar/1/rank.0/stdout
/tmp/foo/bar/1/rank.0/stderr
/tmp/foo
/tmp/foo/bar
/tmp/foo/bar/1
/tmp/foo/bar/1/rank.1
/tmp/foo/bar/1/rank.1/stdout
/tmp/foo/bar/1/rank.1/stderr
$ mpirun -map-by ppr:1:node find $PWD -type f -name stdout -exec cat {} +
/tmp/foo
/home/me
I’m not sure if mpirun always behaves like this, though.
> it is created when mpirun with -output-filename is finished

Wrong: the directory is created when mpirun is invoked.
@rhc54 Gotcha.
@e-kwsm Keep in mind that the docs on main are effectively the docs for v5.0 -- these are not the docs for v4.1.x (the ReadTheDocs / Sphinx docs are new for main / v5.0.x and were not back-ported to v4.1.x or earlier). Hence, for main:docs/, we want to document what is happening in the main / v5.0.x mpirun.
My question about the <mpirun cwd> comment was specifically asking about the case where the CWD of mpirun does not exist on a node. In that case, I have a dim recollection that the output tree for that node will be created in $HOME (since the CWD of mpirun does not exist on that node). Is that no longer the case?
That remains the case. Assuming you don't give us an absolute path (which you can do - the directory then must exist everywhere), then the path is relative to the local PRRTE daemon. mpirun will use its CWD, and the default CWD of a remote daemon (if the CWD of mpirun doesn't exist there) will be $HOME of the user.
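The resolution rule described in this reply can be sketched roughly as follows. This is not PRRTE code: `resolve_output_base` and its parameters are hypothetical names, the existence check is injected so the example runs without touching a real file system, and the sketch follows the description given here for main rather than the 4.1.1 transcript earlier in the thread.

```python
import os
from pathlib import PurePosixPath


def resolve_output_base(filename, mpirun_cwd, home, dir_exists=os.path.isdir):
    """Hypothetical model of where one daemon would root the output tree.

    Assumptions taken from the comment above:
      * an absolute filename is used as given (it must exist on every node);
      * otherwise the path is relative to the daemon's working directory;
      * a daemon uses mpirun's CWD if it exists locally, else the user's $HOME.
    """
    path = PurePosixPath(filename)
    if path.is_absolute():
        return path
    daemon_cwd = mpirun_cwd if dir_exists(mpirun_cwd) else home
    return PurePosixPath(daemon_cwd) / path


if __name__ == "__main__":
    # Node where mpirun's CWD exists, then a node where it does not.
    print(resolve_output_base("bar", "/tmp/foo", "/home/me", dir_exists=lambda d: True))
    print(resolve_output_base("bar", "/tmp/foo", "/home/me", dir_exists=lambda d: False))
```

Under these assumptions the same relative `filename` can land in different directories on different nodes, which is the point raised in the review question.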
I built 8113e7c myself and found that the pattern has changed:
$ ompi_info --version
Open MPI v5.1.0a1
https://www.open-mpi.org/community/help/
$ mpirun --output-filename foo -n 2 sh -c 'echo $$' : -n 2 sh -c 'hostname >&2'
334108
334109
localhost
localhost
$ ls foo.*
foo.prterun-localhost-334102@1.0.out
foo.prterun-localhost-334102@1.1.out
foo.prterun-localhost-334102@1.2.err
foo.prterun-localhost-334102@1.3.err
A filename seems to be {arg}.prterun-{hostname}-{PID of mpirun}@1.{rank}.(out|err).
Yes, that is correct - we tag it to avoid collisions with other mpirun instances that were given the same output-filename option and for shared filesystems. Just one clarification for cases where the application calls MPI_Comm_spawn - the "@1" represents the local jobid of the application. So if the app called spawn, there would be another set of files with an "@2" for the new spawned job. Continued for every spawn.
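As an illustration of the naming scheme confirmed above, a script that post-processes these files might split a name into its parts as below. The regular expression is inferred only from the four filenames in this thread and the explanation of "@1"; `parse_output_name` and the field names are hypothetical, and other launchers or versions may format the tag differently.

```python
import re

# Pattern inferred from names such as "foo.prterun-localhost-334102@1.0.out".
# Field names are illustrative, not taken from Open MPI or PRRTE.
_NAME_RE = re.compile(
    r"^(?P<prefix>.+)\.prterun-(?P<host>.+)-(?P<pid>\d+)"
    r"@(?P<jobid>\d+)\.(?P<rank>\d+)\.(?P<stream>out|err)$"
)


def parse_output_name(name):
    """Return the components of an output filename, or None if it does not match."""
    match = _NAME_RE.match(name)
    if match is None:
        return None
    fields = match.groupdict()
    for key in ("pid", "jobid", "rank"):
        fields[key] = int(fields[key])
    return fields


if __name__ == "__main__":
    print(parse_output_name("foo.prterun-localhost-334102@1.2.err"))
```

Grouping the parsed results by `jobid` would separate the original job from any jobs started later through MPI_Comm_spawn, as described in the comment.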
Hello! The Git Commit Checker CI bot found a few problems with this PR: acf0ab4: Update docs/man-openmpi/man1/mpirun.1.rst
Please fix these problems and, if necessary, force-push new commits back up to the PR branch. Thanks!
acf0ab4 to efd5fba
see open-mpi#7095 Signed-off-by: Eisuke Kawashima <e-kwsm@users.noreply.github.com>
efd5fba to 262830c
<man1-mpirun>` is executed. Note that this will not work in environments
where the file system on compute nodes differs from that where
:ref:`mpirun(1) <man1-mpirun>` is executed.
I thought, per the discussion on the PR, that the resulting files will be created, but they may end up elsewhere...?
see #7095
Signed-off-by: Eisuke Kawashima e-kwsm@users.noreply.github.com