A system call failed during shared memory initialization ... #7393
@devreal @artpol84 (for visibility). I tried the test and I'm able to reproduce with osc rdma, but osc ucx works fine.
I am observing a similar failure. I am using OpenMPI 4.0.5 on macOS Catalina (version 10.15.7 (19H2)); OpenMPI was installed through Homebrew. My machine has 8 real cores, and 32 GB of RAM; the program never uses more than about 6 GB. The output I get is attached below. The first line of the output is the correct output printed at the end of the program.
This seems to be two different unlink problems.
Others have seen the 2nd one (in vader) since v4.0.2, but it's been very difficult to reproduce and isolate.
Any progress on this? It happens for me very repeatably, so if there is any guidance on what to instrument to shed some light on this, I'll happily do the work. I see the problem on both macOS Catalina and Big Sur, with a MacPorts installation, across multiple versions since 4.0.1, compiled with multiple versions of both GCC and Clang.
Try setting `TMPDIR` to something shorter (e.g., `/tmp`). Truncation can occur on OSX with the default `TMPDIR`, which is a long per-user path under `/var/folders`.
Do you think it's worth exploring the cause of this failure and a more permanent solution? I guess it's probably acceptable to tell users to set their TMPDIR, but it might be good to know why.
We do have https://www.open-mpi.org/faq/?category=osx#startup-errors-with-open-mpi-2.0.x, but it looks like the verbiage on it is a bit out of date. The underlying macOS issue is the same, however.
More specifically, it looks like we had a specific check for this back in the 2.0.x/2.1.x timeframe (i.e., we emitted a very specific error message that helped users work around the issue). But apparently that very specific message has either gotten lost or isn't functioning properly in Open MPI v4.0.x/v4.1.x.

A little history here: Open MPI's underlying run-time system has slowly been evolving into its own project. For example, the PMIx project evolved directly from a good chunk of what used to be part of Open MPI itself (i.e., Open MPI's run-time system). Ever since PMIx split off into its own project, Open MPI has distributed an embedded copy of the PMIx source code; this way, 99% of Open MPI users aren't even aware of the code split. In the upcoming Open MPI v5.0.x, basically the rest of Open MPI's run-time system is splitting off into a project called PRTE. As such, Open MPI v5.0.x will carry embedded copies of both PMIx and PRTE.

All this is to say that the error (i.e., either the lack of or the malfunctioning of the specific macOS TMPDIR error message) is almost certainly in PMIx: https://github.com/openpmix/openpmix. The error should be fixed over there and then back-ported to the embedded copies in Open MPI v4.0.x, v4.1.x, and the upcoming v5.0.x.
PMIx used to do this check because we were using Unix domain sockets back in those days:

```c
// If the above set temporary directory name plus the pmix-PID string
// plus the '/' separator are too long, just fail, so the caller
// may provide the user with a proper help... *Cough*, *Cough* OSX...
if ((strlen(tdir) + strlen(pmix_pid) + 1) > sizeof(myaddress.sun_path)-1) {
    free(pmix_pid);
    /* we don't have show-help in this version, so pretty-print something
     * the hard way */
    fprintf(stderr, "PMIx has detected a temporary directory name that results\n");
    fprintf(stderr, "in a path that is too long for the Unix domain socket:\n\n");
    fprintf(stderr, "    Temp dir: %s\n\n", tdir);
    fprintf(stderr, "Try setting your TMPDIR environmental variable to point to\n");
    fprintf(stderr, "something shorter in length\n");
    return PMIX_ERR_SILENT; // return a silent error so our host knows we printed a message
}
```

The check was removed once we went away from that method (switching back to TCP). We should probably discuss where a more permanent location should be - it's a shared memory problem (which is in OMPI, not PMIx), so I'm leery of putting something in PMIx that assumes how long a shmem backing filename might be.
Ah, ok, that sounds totally reasonable (that the check should be in Open MPI, not PMIx). That makes things simpler, too. @marcpaterno @aivazis Is the case where the problem occurs the same / similar to the originally-cited problem on this issue?
My case pops up when multiple MPI jobs are launched at the same time by the same user: I have a test suite exercising my Python bindings that runs in parallel. What I think is happening is that the Open MPI cleanup code tries to remove the temporary files it created as children of a temporary directory. It seems that the name of this directory is seeded with my uid instead of the process id, and the first job that terminates destroys the temporary directory the other instances rely on. Fixing my problem could be as simple as tweaking the algorithm that names the temporary directory. I'll try to verify this and post a screenshot from a session.
Took another look. I was partly wrong: the temporary path contains both my uid and the pid of the running process. The error I get mentions the directory that couldn't be unlinked. And partly correct: the cleanup code appears to remove a directory a few levels up. As I watch the filesystem, the top-level session directory disappears, and the other MPI instances start crashing.
So there are two very different problems being discussed here - which is fine, just wanted to be clear.

The first problem has to do with the length of the `TMPDIR` path on macOS, which can exceed what fits in the socket/backing-file name.

The second problem is the one mentioned by @aivazis. This is caused by a race condition - we use only the uid in the top-level directory name so that sys admins have an easier time cleaning up should someone have a bunch of unclean terminations: they just look for the directory with that uid in it and remove it.

There are two solutions to the problem. First, we could provide an option telling mpirun to add the pid to the top-level directory name. This would allow those with the use-case described by @aivazis to avoid the problem while preserving the sys admins' request for simplicity. The other solution is to use PRRTE, which would also allow your test suite to complete faster (far less time starting/stopping each job). What you would do is have your test setup start the persistent PRRTE DVM once and then run each test job against it. Frankly, this is one of the primary use-cases for PRRTE. You can learn more about it here and find the code here.
@rhc54 This issue is opened against Open MPI v4.0.x. Does PRTE work with the Open MPI v4.0.x and v4.1.x series? |
Sure - so long as PRRTE is built against PMIx v4.1 or above, it will support any OMPI version starting with the 2.x series.
@aivazis I have created this option - it will become available in OMPI v5. You just need to add the new option to your mpirun command line.
Any fixes for this problem? I've been seeing these error messages on my Mac for a long time:
You mean other than the one already outlined above - i.e., setting `TMPDIR` to a shorter path?
If the code goes like this, the problem appears; but it does not if I substitute an alternative call for the original.
I am completely new to Open MPI; I encountered this problem on my M1 Mac with the newest Monterey.
If I do that, it may screw up other programs. Is there any possible fix from the Open MPI side?
In one case, |
open-mpi/ompi#7393 Thanks-to: Jake Tronge <jtronge3@gmail.com>
This fixed it for me (Mac with M1 chip)
Background information
What version of Open MPI are you using? (e.g., v3.0.5, v4.0.2, git branch name and hash, etc.)
v4.0.1 and v4.0.2
Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)
Installed from the source tarball (both with Intel Parallel Studio 2020.0.088 as well as with GCC-7.4.0).
Please describe the system on which you are running
Details of the problem
When I split the comm_world communicator into two groups (comm_shmem) and try to allocate shmem segments on the latter by means of MPI_Win_allocate, I get the following error message:
I used the following program:
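The original program did not survive the scrape; a minimal program along these lines (an illustrative sketch, not the reporter's actual code) exercises the same path - splitting COMM_WORLD by shared-memory locality and allocating a window on the resulting communicator. Compile with mpicc and launch under mpirun.

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    /* Split COMM_WORLD into per-node shared-memory communicators. */
    MPI_Comm comm_shmem;
    MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0,
                        MPI_INFO_NULL, &comm_shmem);

    /* Allocate a shared window on the sub-communicator; this is the
     * call that triggers the shmem initialization failure. */
    int *base;
    MPI_Win win;
    MPI_Win_allocate(1024 * sizeof(int), sizeof(int), MPI_INFO_NULL,
                     comm_shmem, &base, &win);

    MPI_Win_free(&win);
    MPI_Comm_free(&comm_shmem);
    MPI_Finalize();
    return 0;
}
```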
and ran it with 8 ranks:
Switching to "posix" (`mpirun -mca shmem posix ...`) gets rid of this error but has problems of its own, for which I'll submit a separate issue.