Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[feature request] Import MPI only when needed #430

Closed
AdamGleave opened this issue Aug 1, 2019 · 6 comments · Fixed by #433
Closed

[feature request] Import MPI only when needed #430

AdamGleave opened this issue Aug 1, 2019 · 6 comments · Fixed by #433
Labels
enhancement New feature or request

Comments

@AdamGleave
Copy link
Collaborator

AdamGleave commented Aug 1, 2019

Describe the bug
Only four (DDPG, GAIL, TRPO and PPO1) of the twelve algorithms implemented in Stable Baselines use MPI. However, importing stable_baselines -- even if you do not use any of the algorithms -- indirectly executes from mpi4py import MPI.

This has a number of issues:

  • mpi4py has a binary dependency on OpenMPI, which often needs to be installed especially.
  • In some installations of OpenMPI, any code that uses MPI must be executed under mpirun, even if you only use a single process.
  • OpenMPI is unreliable, and interacts particularly poorly with multi-threading (e.g. TensorFlow) and multi-processing (e.g. SubprocVecEnv).

Code example
I've run into many issues with OpenMPI over time. Never tracked down the root cause satisfactorily. As a recent example, https://github.com/HumanCompatibleAI/imitation/blob/master/src/imitation/scripts/train.py deadlocked inside OpenMPI on Ubuntu 18.04 with OpenMPI 2. It worked fine on a recent Mac OS X install. Previously I've also had issues with OpenMPI 4, and found OpenMPI 3 seems to be the most reliable version.

@shwang @qxcv may have more details on this failure mode.

Suggested Resolution
The two main options I see are:

  • Change stable_baselines/__init__.py to not automatically import all the algorithms. This seems the cleanest, but would introduce a breaking API change: users would need to do e.g. from stable_baselines.ppo1 import PPO1 rather than from stable_baselines import PPO1. It would speed up importing stable_baselines though, which right now takes a while.
  • Change all files that use MPI to import it lazily. This should be fairly easily: most the time it's only used in one function in each file. It would make things a bit more fragile: if mpi4py wasn't installed, you would only get an error when you actually run the algorithm.

I'm happy to open a PR on this if there's agreement on if/how to resolve this.

@AdamGleave AdamGleave added the enhancement New feature or request label Aug 1, 2019
@araffin
Copy link
Collaborator

araffin commented Aug 1, 2019

Hello,
I also encountered issues in the past with mpi.

In fact, they changed that also in the original baselines: openai#689

Change stable_baselines/init.py to not automatically import all the algorithms.

I would go for something intermediary: try to import mpi in the __init__.py and don't import PPO1, TRPO and DDPG if it is not present. This would prevent breaking the API while allowing not to rely on MPI.
However, I'm afraid we still need to import mpi4py lazily anyway...

EDIT: one important thing would be to update the documentation to explain the change/how to have access to all algorithms

@AdamGleave
Copy link
Collaborator Author

I would go for something intermediary: try to import mpi in the __init__.py and don't import PPO1, TRPO and DDPG if it is not present. This would prevent breaking the API while allowing not to rely on MPI.
However, I'm afraid we still need to import mpi4py lazily anyway...

I like this suggestion! I think the tricky parts might be some of the code in common/ that depends on MPI. But I think most of this can be solved by making sure they're only imported by MPI-dependent algorithms. There'll be a couple of places, like stable_baselines/logger.py (uses it to get the rank) where there'll need to be a separate check for if MPI is installed.

@shwang
Copy link

shwang commented Aug 2, 2019

Not a ton more relevant details except that this is our hilarious workaround for the time being.

image

Looking into this issue right now...

edit: Ok, I'll go ahead and try araffin's suggestion:

try to import mpi in the init.py and don't import PPO1, TRPO and DDPG if it is not present. This would prevent breaking the API while allowing not to rely on MPI.

@megsano
Copy link

megsano commented Jun 28, 2020

Hello! I'm encountering a similar issue. I am using PPO1 and GAIL, so I need mpi.
I installed OpenMPI and ran pip install stable-baselines[mpi]. However running import stable_baselines in my Python interpreter leads to hanging. When I run import stable_baselines after uninstalling mpi4py, I am able to import successfully.

I am using TF 1.15 and Debian stretch. Any ideas for how to fix this would be appreciated. Thanks!

@AdamGleave
Copy link
Collaborator Author

Hello! I'm encountering a similar issue. I am using PPO1 and GAIL, so I need mpi.
I installed OpenMPI and ran pip install stable-baselines[mpi]. However running import stable_baselines in my Python interpreter leads to hanging. When I run import stable_baselines after uninstalling mpi4py, I am able to import successfully.

I am using TF 1.15 and Debian stretch. Any ideas for how to fix this would be appreciated. Thanks!

Does from mpi4py import MPI on its own work?

If that also causes an interpreter hang then the problem is unrelated to Stable Baselines, and you'd be better off asking on mpi4py or OpenMPI forums.

@megsano
Copy link

megsano commented Jul 2, 2020

Does from mpi4py import MPI on its own work?

This hangs as well. I'll ask on their forums, thanks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants