docs: fix outdated descriptions of -output-filename #11032

Open
e-kwsm wants to merge 1 commit into main

Conversation

e-kwsm
Contributor

@e-kwsm e-kwsm commented Nov 4, 2022

see #7095

Signed-off-by: Eisuke Kawashima <e-kwsm@users.noreply.github.com>

@ompiteam-bot

Can one of the admins verify this patch?

@e-kwsm
Contributor Author

e-kwsm commented Nov 4, 2022

The behaviour of Open MPI 4.1.4 is the same as reported in #7095.

$ mpirun --version
mpirun (Open MPI) 4.1.4

Report bugs to http://www.open-mpi.org/community/help/
$ mpiexec -n 1 --output-filename out.txt echo "Hi"
Hi
$ find out.txt
out.txt
out.txt/1
out.txt/1/rank.0
out.txt/1/rank.0/stdout
out.txt/1/rank.0/stderr

I haven’t tried the latest Open MPI; feel free to close the PR if this is already fixed (in any case, the man page is wrong for 4.1.4).
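As an illustration of the `{filename}/{job}/rank.{rank}/stdout` layout shown in the listing above, here is a minimal Python sketch that gathers each rank's captured stdout. The function name `collect_stdout` is hypothetical, and the sketch assumes only the directory structure visible in the 4.1.4 transcript; it is not part of Open MPI.

```python
import os
import re

# Matches the per-rank directory name seen in the listing above, e.g. "rank.0".
RANK_DIR = re.compile(r"^rank\.(\d+)$")


def collect_stdout(root):
    """Return {(job, rank): text} for every {root}/{job}/rank.{rank}/stdout.

    Hypothetical helper: it assumes the tree layout observed with
    Open MPI 4.1.4 and silently skips anything that does not match it.
    """
    results = {}
    for job in sorted(os.listdir(root)):
        job_dir = os.path.join(root, job)
        if not os.path.isdir(job_dir):
            continue
        for entry in sorted(os.listdir(job_dir)):
            match = RANK_DIR.match(entry)
            if match is None:
                continue
            stdout_path = os.path.join(job_dir, entry, "stdout")
            if os.path.isfile(stdout_path):
                with open(stdout_path) as handle:
                    results[(job, int(match.group(1)))] = handle.read()
    return results
```

Calling `collect_stdout("out.txt")` on the tree above would yield one entry keyed by job `"1"` and rank `0`. The rank is converted to an integer so that any zero padding in the directory name does not affect the key.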

@awlauria awlauria requested a review from jsquyres November 4, 2022 13:10
@awlauria
Contributor

awlauria commented Nov 4, 2022

ok to test

@awlauria awlauria added the NEWS label Nov 4, 2022
Comment on lines 365 to 377
into `{filename}/{job}/rank.{rank}/std[out,err,diag]`, where `{rank}` is the
processes' rank in MPI_COMM_WORLD, left-filled with zero's for correct
ordering in listings. Any directories in the filename will automatically be
created. A relative path value will be converted to an absolute path based on
the cwd where mpirun is executed. Note that this will not work on
environments where the file system on compute nodes differs from that where
:ref:`mpirun(1) <man1-mpirun>` is executed.
Member

Suggested change
into `{filename}/{job}/rank.{rank}/std[out,err,diag]`, where `{rank}` is the
processes' rank in MPI_COMM_WORLD, left-filled with zero's for correct
ordering in listings. Any directories in the filename will automatically be
created. A relative path value will be converted to an absolute path based on
the cwd where mpirun is executed. Note that this will not work on
environments where the file system on compute nodes differs from that where
:ref:`mpirun(1) <man1-mpirun>` is executed.
into ``{filename}/{job}/rank.{rank}/std[out,err,diag]``, where ``{rank}`` is the
processes' rank in ``MPI_COMM_WORLD``, left-filled with zero's for correct
ordering in file listings. Any intermediate directories in the resulting output files will automatically be
created. If ``filename`` is a relative path, it will be converted to an absolute path based on
the directory where :ref:`mpirun(1) <man1-mpirun>` is executed. Note that this will not work in
environments where the file system on compute nodes differs from that where
:ref:`mpirun(1) <man1-mpirun>` is executed.

Member

Is the comment "will not work" accurate? In an environment where <mpirun cwd> does not exist on the remote nodes, doesn't the filename/directory hierarchy get created relative to $HOME on the remote nodes?

Contributor Author

@e-kwsm e-kwsm Nov 6, 2022

> In an environment where <mpirun cwd> does not exist on the remote nodes, doesn't the filename/directory hierarchy get created relative to $HOME on the remote nodes?

I tried on a cluster, and the hierarchy is created relative to the current working directory:

$ ompi_info --version
Open MPI v4.1.1

http://www.open-mpi.org/community/help/
$ mpirun -map-by ppr:1:node rm -rf /tmp/foo  # make sure the directory does not exist
$ mkdir /tmp/foo  # /tmp is not shared among the nodes
$ cd /tmp/foo
$ mpirun -map-by ppr:1:node hostname
node1
node2
$ mpirun -map-by ppr:1:node pwd
/tmp/foo
/home/me

At this point, the directory is not created; it is created when mpirun with -output-filename is finished:

$ mpirun -map-by ppr:1:node -output-filename bar pwd
/tmp/foo
/home/me
$ mpirun -map-by ppr:1:node pwd
/tmp/foo
/tmp/foo
$ mpirun -map-by ppr:1:node find $PWD
/tmp/foo
/tmp/foo/bar
/tmp/foo/bar/1
/tmp/foo/bar/1/rank.0
/tmp/foo/bar/1/rank.0/stdout
/tmp/foo/bar/1/rank.0/stderr
/tmp/foo
/tmp/foo/bar
/tmp/foo/bar/1
/tmp/foo/bar/1/rank.1
/tmp/foo/bar/1/rank.1/stdout
/tmp/foo/bar/1/rank.1/stderr
$ mpirun -map-by ppr:1:node find $PWD -type f -name stdout -exec cat {} +
/tmp/foo
/home/me

Contributor Author

I’m not sure if mpirun always behaves like this, though.

Contributor Author

> it is created when mpirun with -output-filename is finished

Wrong: the directory is created when mpirun is invoked.

Member

@rhc54 Gotcha.

@e-kwsm Keep in mind that the docs on main are effectively the docs for v5.0 -- these are not the docs for v4.1.x (the ReadTheDocs / Sphinx docs are new for main / v5.0.x and were not back-ported to v4.1.x or earlier). Hence, for main:docs/, we want to document what is happening in the main / v5.0.x mpirun.

My question about the <mpirun cwd> comment was specifically asking about the case where the CWD of mpirun does not exist on a node. In that case, I have a dim recollection that the output tree for that node will be created in $HOME (since the CWD of mpirun does not exist on that node). Is that no longer the case?

Contributor

That remains the case. Assuming you don't give us an absolute path (which you can do - the directory then must exist everywhere), then the path is relative to the local PRRTE daemon. mpirun will use its CWD, and the default CWD of a remote daemon (if the CWD of mpirun doesn't exist there) will be $HOME of the user.
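The rule described in the comment above can be modelled in a few lines of Python. This is only a sketch of the stated behaviour, not PRRTE's implementation: the function name `resolve_output_base` and its parameters are hypothetical, and whether the CWD of mpirun exists on a given node is passed in as a plain boolean rather than probed.

```python
import posixpath


def resolve_output_base(filename, mpirun_cwd, home, cwd_exists_on_node):
    """Model where the output path lands on one node (hypothetical helper).

    - An absolute path is used as given (it must then exist on every node).
    - A relative path is taken relative to the daemon's working directory:
      the CWD of mpirun if it exists on that node, otherwise the user's $HOME.
    """
    if posixpath.isabs(filename):
        return posixpath.normpath(filename)
    daemon_cwd = mpirun_cwd if cwd_exists_on_node else home
    return posixpath.normpath(posixpath.join(daemon_cwd, filename))
```

For example, with `filename="bar"`, `mpirun_cwd="/tmp/foo"` and `home="/home/me"`, the model gives `/tmp/foo/bar` on a node where `/tmp/foo` exists and `/home/me/bar` on a node where it does not.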

Member

@rhc54 Thanks!

@e-kwsm Can you update the docs to reflect what Ralph just stated, above?

Contributor Author

I built 8113e7c myself and found that the pattern has changed:

$ ompi_info --version
Open MPI v5.1.0a1

https://www.open-mpi.org/community/help/
$ mpirun --output-filename foo -n 2 sh -c 'echo $$' : -n 2 sh -c 'hostname >&2'
334108
334109
localhost
localhost
$ ls foo.*
foo.prterun-localhost-334102@1.0.out
foo.prterun-localhost-334102@1.1.out
foo.prterun-localhost-334102@1.2.err
foo.prterun-localhost-334102@1.3.err

The filename pattern seems to be {arg}.prterun-{hostname}-{PID of mpirun}@1.{rank}.(out|err).

Contributor

Yes, that is correct - we tag it to avoid collisions with other mpirun instances that were given the same output-filename option and for shared filesystems. Just one clarification for cases where the application calls MPI_Comm_spawn - the "@1" represents the local jobid of the application. So if the app called spawn, there would be another set of files with an "@2" for the new spawned job. Continued for every spawn.
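To make the naming scheme discussed above concrete, here is a minimal Python sketch that splits such a filename into its parts. The regular expression is inferred only from the four filenames in the `ls foo.*` output and the "@jobid" clarification; the helper name `parse_output_name` is hypothetical, and the literal `prterun` token is assumed to be fixed.

```python
import re

# Pattern inferred from names such as "foo.prterun-localhost-334102@1.2.err".
# The hostname group is greedy, so hostnames containing hyphens still parse:
# the PID is taken to be the last run of digits before the "@".
OUTPUT_NAME = re.compile(
    r"^(?P<prefix>.+)\.prterun-(?P<host>.+)-(?P<pid>\d+)"
    r"@(?P<jobid>\d+)\.(?P<rank>\d+)\.(?P<stream>out|err)$"
)


def parse_output_name(name):
    """Return the filename's fields as a dict, or None if it does not match."""
    match = OUTPUT_NAME.match(name)
    if match is None:
        return None
    fields = match.groupdict()
    for key in ("pid", "jobid", "rank"):
        fields[key] = int(fields[key])
    return fields
```

Grouping the parsed results by `jobid` would separate the files of the original job (`@1`) from those of any jobs started later through `MPI_Comm_spawn` (`@2` and up).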

@jsquyres jsquyres requested a review from jjhursey November 4, 2022 16:06
@github-actions

github-actions bot commented Nov 6, 2022

Hello! The Git Commit Checker CI bot found a few problems with this PR:

acf0ab4: Update docs/man-openmpi/man1/mpirun.1.rst

  • check_signed_off: does not contain a valid Signed-off-by line

Please fix these problems and, if necessary, force-push new commits back up to the PR branch. Thanks!

see open-mpi#7095

Signed-off-by: Eisuke Kawashima <e-kwsm@users.noreply.github.com>
Comment on lines +375 to +377
<man1-mpirun>` is executed. Note that this will not work in environments
where the file system on compute nodes differs from that where
:ref:`mpirun(1) <man1-mpirun>` is executed.
Member

I thought that, per the discussion on the PR, the resulting files will be created, but they may end up elsewhere...?
