
Open MPI error: A system-required executable either could not be found or was not executable by this user in file ess_singleton_module.c at line 458 #573

Closed
andersahp opened this issue Sep 7, 2018 · 5 comments


@andersahp

andersahp commented Sep 7, 2018

I just installed gym and baselines, and when testing the baselines install by running the DQN/Pong example, I get an Open MPI error.

I am pretty much clueless as to what I can try, so any suggestions are appreciated. I have tried force-reinstalling mpi4py (version 3.0.0), but no luck.

I run Python 3.6.5 in a virtualenv on Linux Mint 19 (Cinnamon 3.8.8).

(env36) anders@anders-ThinkPad-Edge-E530:~/Desktop/env36/bin$ python -m baselines.run -- alg=deepq --env=PongNoFrameskip-v4 --num_timesteps=1e6
[anders-ThinkPad-Edge-E530:03743] [[INVALID],INVALID] ORTE_ERROR_LOG: A system-required executable either could not be found or was not executable by this user in file ess_singleton_module.c at line 458
[anders-ThinkPad-Edge-E530:03743] [[INVALID],INVALID] ORTE_ERROR_LOG: A system-required executable either could not be found or was not executable by this user in file ess_singleton_module.c at line 166
--------------------------------------------------------------------------
It looks like orte_init failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems.  This failure appears to be an internal failure; 
here's some additional information (which may only be relevant to an
Open MPI developer):

orte_ess_init failed
--> Returned value A system-required executable either could not be found or was not executable by this user (-126) instead of ORTE_SUCCESS
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment 
problems.  This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):

ompi_mpi_init: ompi_rte_init failed
--> Returned "A system-required executable either could not be found or was not executable by this user" (-126) instead of "Success" (0)
--------------------------------------------------------------------------
*** An error occurred in MPI_Init_thread
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
***    and potentially your MPI job)
[anders-ThinkPad-Edge-E530:3743] Local abort before MPI_INIT completed completed successfully, but am not able to aggregate error messages, and not able to guarantee that all other processes were killed!
@pzhokhov
Collaborator

Hi @andersahp! Frankly, this does not look like an error I have seen before, and it seems to be related to MPI rather than to Python/baselines.
First things first: do you need MPI at all? If not, you can uninstall mpi4py (pip uninstall mpi4py), and all MPI-related code will be skipped; this will likely bypass your problem.
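Skipping MPI when mpi4py is absent usually comes down to an optional import. The sketch below is a minimal illustration of that pattern, assuming behaviour similar to what baselines does; the helper names load_mpi and rank_and_size are hypothetical and not part of the baselines API. Note that importing mpi4py.MPI initializes MPI (the log above fails inside MPI_Init_thread), which is why uninstalling mpi4py sidesteps the error entirely.

```python
from typing import Any, Optional, Tuple


def load_mpi() -> Optional[Any]:
    """Return the mpi4py MPI module, or None when mpi4py is not installed.

    Hypothetical helper: illustrates the optional-import pattern only.
    """
    try:
        # Importing mpi4py.MPI calls MPI_Init_thread as a side effect, so a
        # broken MPI installation fails right here.
        from mpi4py import MPI
    except ImportError:
        return None
    return MPI


def rank_and_size(mpi: Optional[Any]) -> Tuple[int, int]:
    """Rank of this process and total process count; (0, 1) without MPI."""
    if mpi is None:
        # No MPI available: behave like a single-process MPI job.
        return 0, 1
    comm = mpi.COMM_WORLD
    return comm.Get_rank(), comm.Get_size()
```

Typical use is `rank, size = rank_and_size(load_mpi())`, after which code such as logging or checkpointing can be restricted to `rank == 0` regardless of whether MPI is present.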
If you do want to run baselines algorithms with MPI,
could you please try a sanity check for me and run

mpirun -np 2 echo "foo"

If that succeeds (prints "foo" twice), the error is somewhere in mpi4py; otherwise, it is in your particular installation of MPI. In that case I'd recommend reinstalling Open MPI and, if that does not help, trying MPICH.
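The decision rule of this sanity check can be scripted. This is a hedged sketch: interpret_sanity_check and run_sanity_check are hypothetical helper names, and the mapping simply encodes the reasoning above (two "foo" lines means MPI launches fine, so suspect mpi4py; anything else means suspect the MPI installation). It uses only `subprocess.run` arguments available in Python 3.6, matching the reporter's environment.

```python
import shutil
import subprocess


def interpret_sanity_check(returncode: int, stdout: str, nprocs: int = 2) -> str:
    """Map the result of `mpirun -np N echo foo` to a likely culprit."""
    foo_lines = [line for line in stdout.splitlines() if line.strip() == "foo"]
    if returncode == 0 and len(foo_lines) == nprocs:
        # MPI itself launches processes fine: the problem is likely in mpi4py.
        return "suspect-mpi4py"
    # mpirun failed or printed the wrong number of lines.
    return "suspect-mpi-install"


def run_sanity_check(nprocs: int = 2, timeout: float = 60.0) -> str:
    """Run the check; raises subprocess.TimeoutExpired if mpirun hangs."""
    mpirun = shutil.which("mpirun")
    if mpirun is None:
        # No launcher on PATH at all: the MPI installation is incomplete.
        return "mpirun-not-found"
    result = subprocess.run(
        [mpirun, "-np", str(nprocs), "echo", "foo"],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,  # decode output to str (Python 3.6 compatible)
        timeout=timeout,
    )
    return interpret_sanity_check(result.returncode, result.stdout, nprocs)
```

Keeping the interpretation separate from the subprocess call makes the decision rule easy to check without MPI installed.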
Also, I noticed there is a space between -- and alg=deepq in your command line (it should be --alg=deepq). I doubt that's the problem, but still.

@andersahp
Author

Thanks, pzhokhov. I actually thought MPI was used for parallel computing. But your question made me Google again, and I see that it's not. In what situation would I need MPI when using baselines?

@pzhokhov
Collaborator

MPI is used for parallel computing (it stands for Message Passing Interface, a standard in which parallel processes explicitly send each other messages containing data), but that's not the only way to do parallel computing. When all processes run on the same machine, they can also communicate via shared memory or pipes (pipes can actually work over a network too); when running on multiple machines, other communication interfaces can be used.

Now, to your question of when you would need MPI with baselines: some of the baselines algorithms are compatible with MPI (at the moment, ppo2 and trpo_mpi). These algorithms run copies of the neural network in several processes and share parameters using MPI. But MPI does not have to be used: if you don't need to run them on a cluster, running without MPI behaves exactly like running MPI with a single process.
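To make the "no MPI is the same as one MPI process" point concrete, here is a minimal sketch of averaging a flat parameter vector across processes. It is not taken from the baselines source; average_params is a hypothetical helper, and comm is assumed to be an mpi4py communicator such as MPI.COMM_WORLD. It uses mpi4py's pickle-based `allgather` only to stay dependency-free.

```python
from typing import Any, List, Optional, Sequence


def average_params(local_params: Sequence[float], comm: Optional[Any] = None) -> List[float]:
    """Element-wise mean of a flat parameter vector over all processes.

    With comm=None (no MPI) the result is just the local vector, which is
    exactly what a one-process MPI job would compute.
    """
    local = [float(x) for x in local_params]
    if comm is None:
        return local
    # allgather returns one list per process, in rank order.
    gathered = comm.allgather(local)
    n = len(gathered)
    # zip(*gathered) groups the i-th parameter from every process.
    return [sum(values) / n for values in zip(*gathered)]
```

Under MPI this would be called as `average_params(params, MPI.COMM_WORLD)` in a script launched with something like `mpirun -np 4 python train.py`. Production code would more likely use `comm.Allreduce` on NumPy buffers to avoid pickling, but the single-process fallback works the same way.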

@andersahp
Author

Thanks a lot, pzhokhov, for the clarification. :)

@dp90

dp90 commented Feb 11, 2020

Had the same error. Solved it with:
sudo apt-get install openmpi-bin
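A quick way to check whether you are in the situation this fix addresses is to look for the Open MPI executables on PATH. This is a hedged sketch: it assumes the "system-required executable" in the error is the Open MPI daemon orted or the mpirun launcher, which on Debian/Ubuntu-based systems such as Linux Mint are shipped in the openmpi-bin package. The helper name missing_executables and the executable list are assumptions for illustration.

```python
import shutil
from typing import Callable, Iterable, List, Optional

# Assumption: executables Open MPI needs to start a job; adjust for your MPI.
OPEN_MPI_EXECUTABLES = ("mpirun", "orted")


def missing_executables(
    names: Iterable[str] = OPEN_MPI_EXECUTABLES,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return the names that cannot be found as executables on PATH."""
    # shutil.which returns None when no executable with that name is on PATH.
    return [name for name in names if which(name) is None]
```

If `missing_executables()` returns a non-empty list, installing openmpi-bin as above is a reasonable first thing to try.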
