[Bug Report] Getting "Environment [some ID] doesn't exist" when using custom async vector env. #222

sven1977 · 2022-12-19T16:52:09Z

Describe the bug

When running the script below (a custom gymnasium.Env registered with an ID, then async-vectorized), I get the error gymnasium.error.NameNotFound: Environment my_env doesn't exist.

The full stacktrace is:

Process Worker<AsyncVectorEnv>-0:
Traceback (most recent call last):
  File "/Users/sven/opt/anaconda3/envs/ray/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/Users/sven/opt/anaconda3/envs/ray/lib/python3.8/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/Users/sven/opt/anaconda3/envs/ray/lib/python3.8/site-packages/gymnasium/vector/async_vector_env.py", line 618, in _worker_shared_memory
    env = env_fn()
  File "/Users/sven/opt/anaconda3/envs/ray/lib/python3.8/site-packages/gymnasium/vector/utils/misc.py", line 29, in __call__
    return self.fn()
  File "/Users/sven/opt/anaconda3/envs/ray/lib/python3.8/site-packages/gymnasium/vector/__init__.py", line 51, in _make_env
    env = gym.envs.registration.make(
  File "/Users/sven/opt/anaconda3/envs/ray/lib/python3.8/site-packages/gymnasium/envs/registration.py", line 569, in make
    _check_version_exists(ns, name, version)
  File "/Users/sven/opt/anaconda3/envs/ray/lib/python3.8/site-packages/gymnasium/envs/registration.py", line 219, in _check_version_exists
    _check_name_exists(ns, name)
  File "/Users/sven/opt/anaconda3/envs/ray/lib/python3.8/site-packages/gymnasium/envs/registration.py", line 197, in _check_name_exists
    raise error.NameNotFound(
gymnasium.error.NameNotFound: Environment my_env doesn't exist. 
Process Worker<AsyncVectorEnv>-1:
Traceback (most recent call last):
  File "/Users/sven/opt/anaconda3/envs/ray/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/Users/sven/opt/anaconda3/envs/ray/lib/python3.8/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/Users/sven/opt/anaconda3/envs/ray/lib/python3.8/site-packages/gymnasium/vector/async_vector_env.py", line 618, in _worker_shared_memory
    env = env_fn()
  File "/Users/sven/opt/anaconda3/envs/ray/lib/python3.8/site-packages/gymnasium/vector/utils/misc.py", line 29, in __call__
    return self.fn()
  File "/Users/sven/opt/anaconda3/envs/ray/lib/python3.8/site-packages/gymnasium/vector/__init__.py", line 51, in _make_env
    env = gym.envs.registration.make(
  File "/Users/sven/opt/anaconda3/envs/ray/lib/python3.8/site-packages/gymnasium/envs/registration.py", line 569, in make
    _check_version_exists(ns, name, version)
  File "/Users/sven/opt/anaconda3/envs/ray/lib/python3.8/site-packages/gymnasium/envs/registration.py", line 219, in _check_version_exists
    _check_name_exists(ns, name)
  File "/Users/sven/opt/anaconda3/envs/ray/lib/python3.8/site-packages/gymnasium/envs/registration.py", line 197, in _check_name_exists
    raise error.NameNotFound(
gymnasium.error.NameNotFound: Environment my_env doesn't exist. 
Traceback (most recent call last):
  File "/Applications/PyCharm CE.app/Contents/plugins/python-ce/helpers/pydev/pydevd.py", line 1477, in _exec
    pydev_imports.execfile(file, globals, locals)  # execute the script
  File "/Applications/PyCharm CE.app/Contents/plugins/python-ce/helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "/Users/sven/Library/Application Support/JetBrains/PyCharmCE2020.3/scratches/scratch_215.py", line 25, in <module>
    env = gym.vector.make("my_env", num_envs=2, asynchronous=True)
  File "/Users/sven/opt/anaconda3/envs/ray/lib/python3.8/site-packages/gymnasium/vector/__init__.py", line 73, in make
    return AsyncVectorEnv(env_fns) if asynchronous else SyncVectorEnv(env_fns)
  File "/Users/sven/opt/anaconda3/envs/ray/lib/python3.8/site-packages/gymnasium/vector/async_vector_env.py", line 168, in __init__
    self._check_spaces()
  File "/Users/sven/opt/anaconda3/envs/ray/lib/python3.8/site-packages/gymnasium/vector/async_vector_env.py", line 502, in _check_spaces
    results, successes = zip(*[pipe.recv() for pipe in self.parent_pipes])
  File "/Users/sven/opt/anaconda3/envs/ray/lib/python3.8/site-packages/gymnasium/vector/async_vector_env.py", line 502, in <listcomp>
    results, successes = zip(*[pipe.recv() for pipe in self.parent_pipes])
  File "/Users/sven/opt/anaconda3/envs/ray/lib/python3.8/multiprocessing/connection.py", line 250, in recv
    buf = self._recv_bytes()
  File "/Users/sven/opt/anaconda3/envs/ray/lib/python3.8/multiprocessing/connection.py", line 414, in _recv_bytes
    buf = self._recv(4)
  File "/Users/sven/opt/anaconda3/envs/ray/lib/python3.8/multiprocessing/connection.py", line 383, in _recv
    raise EOFError
EOFError

Code example

import gymnasium as gym
import numpy as np


class MyEnv(gym.Env):
    def __init__(self):
        self.action_space = gym.spaces.Discrete(2)
        self.observation_space = gym.spaces.Box(0, 100, (1,), dtype=np.float32)
        self.i = 0

    def reset(self, *, seed=None, options=None):
        self.i = 0
        return self._get_obs(), {}

    def step(self, action):
        self.i += 1
        return self._get_obs(), 1.0, False, self.i >= 5, {}

    def _get_obs(self):
        return np.array([self.i], dtype=np.float32)


if __name__ == "__main__":
    gym.register("my_env", MyEnv)
    env = gym.vector.make("my_env", num_envs=2, asynchronous=True)

System info

Mac OS (laptop)
python 3.8.13
gymnasium 0.26.3
gym 0.26.2 (not needed, but installed for Atari)

Additional context

No response

Checklist

I have checked that there is no similar issue in the repo

pseudo-rnd-thoughts · 2022-12-20T19:17:38Z

I copied and pasted your code and don't get the error.
Could your setup be getting confused between gym and gymnasium?

RedTachyon · 2022-12-21T00:43:55Z

Ooh, this one is spicy, I can actually reproduce it locally, and I realized that I lowkey had the same issue some months back, but didn't think about its wider implications.

Note: I'm not an expert on python multiprocessing, so details might be off, but I'm pretty sure this is the general idea of what's happening.
You register your environment in the main Python process, and the gym.registry instance gets updated. When you create the async vector env, the worker processes get spawned/forked (more on this later), which essentially creates a new interpreter and reruns some of the code. It seems that this doesn't include the update to the registry, so in the child process, the new environment doesn't get registered. So even though the main process sees everything, each child process only sees the built-in envs, so they crash.

To potentially make things even spicier (and why @pseudo-rnd-thoughts couldn't replicate it): this might depend on the operating system. I'm also getting the error on macOS, but a quick test on Colab seems to pass without a problem. This might be related to the start methods in multiprocessing. It seems that macOS and Windows use spawn by default, while Linux uses fork. I don't know what happens with forkserver. It's too late for me to dig into it right now, but the start method can be switched via the context argument of AsyncVectorEnv (sadly unavailable through the gym.vector.make API), so we can use this to check what works on different systems.
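To see which start method a given machine actually uses, something like the following can be run on each platform. This is only a sketch using the standard library; describe_start_methods is a made-up helper name, not part of Gymnasium.

```python
import multiprocessing as mp


def describe_start_methods():
    """Return (default, available) multiprocessing start methods."""
    # Per the multiprocessing docs, the first entry is the platform default.
    available = mp.get_all_start_methods()
    return available[0], available


if __name__ == "__main__":
    default, available = describe_start_methods()
    print(f"default start method: {default}")
    print(f"available start methods: {available}")
```

Comparing the printed default between a machine that shows the error and one that doesn't should confirm or rule out the start method as the cause.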

As for the solution, the vector API is undergoing a complete rewrite at the moment, so we'll definitely have to think about what to do. When I came across this issue in the past, I used a super ugly workaround of doing the imports/registration inside the subprocesses. Maybe it would be viable to restrict async envs to a specific start method that behaves well? We'll have to think about it. At the very least, the new API should allow directly choosing the start method.

The temporary workaround could be monkey-patching the gym.vector.make function to manually select the right start method, or accessing AsyncVectorEnv directly.

tl;dr I blame the GIL (quite possibly incorrectly, but shh)

RedTachyon · 2023-01-12T13:41:53Z

I can't work on the implementation right now, but I wanted to write down my thoughts on how this can be solved.

After some reading, it turns out that global variables are properly inherited with fork, but not with spawn (see this random article: https://superfastpython.com/multiprocessing-inherit-global-variables-in-python/)
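The difference can be checked with a toy registry instead of gym.registry. This is a minimal sketch that assumes nothing about Gymnasium internals: REGISTRY, child, DEMO_SCRIPT and run_demo are hypothetical names. The script is written to a temporary file and run in a fresh interpreter so that spawn can re-import it as an ordinary main module.

```python
import json
import os
import subprocess
import sys
import tempfile
import textwrap

# Toy stand-in for gym.registry; every name in this script is hypothetical.
DEMO_SCRIPT = textwrap.dedent('''
    import json
    import multiprocessing as mp

    REGISTRY = {"builtin_env"}  # filled at import time, like the built-in envs


    def child(conn):
        # Report whether this process can see the late registration.
        conn.send("my_env" in REGISTRY)
        conn.close()


    if __name__ == "__main__":
        REGISTRY.add("my_env")  # mirrors gym.register(...) under the main guard
        seen = {}
        for method in ("fork", "spawn"):
            if method not in mp.get_all_start_methods():
                continue  # e.g. fork is unavailable on Windows
            ctx = mp.get_context(method)
            parent_conn, child_conn = ctx.Pipe()
            proc = ctx.Process(target=child, args=(child_conn,))
            proc.start()
            child_conn.close()  # parent keeps only its own end of the pipe
            seen[method] = parent_conn.recv()
            proc.join()
        print(json.dumps(seen))
''')


def run_demo():
    """Run DEMO_SCRIPT in a fresh interpreter and return {start_method: bool}."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "registry_demo.py")
        with open(path, "w") as f:
            f.write(DEMO_SCRIPT)
        result = subprocess.run(
            [sys.executable, path],
            capture_output=True, text=True, check=True, timeout=120,
        )
    # The script prints a single JSON object as its last line of output.
    return json.loads(result.stdout.strip().splitlines()[-1])


if __name__ == "__main__":
    print(run_demo())
```

Where both start methods exist, the fork child should report that it sees my_env and the spawn child should not, because spawn re-imports the script without running the code under the main guard.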

In principle we could restrict it to using fork (and maybe forkserver?), but that feels like a bit of a cop-out - and I know that at least in some cases, it does actually make a difference which one you choose (not just for performance, but also whether your code will even run properly).

The best "robust" option is probably to pass the entire env registry as an argument to the async environment worker, and then use those specifications instead of directly using gym.make(env_id). The question here would be the performance impact which I cannot estimate right now, but e.g. Atari likes to register like a thousand different envs, and all of that needs to be sent between the processes. Fortunately, each env spec should be relatively light-weight, and it's a one-time cost. Then again, it's a bunch of extra memory usage for each process, so we need to profile it.
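As a rough illustration of that idea (not the actual Gymnasium implementation), the worker can receive a pickled spec and build from it instead of looking up an ID in its own registry. The spec layout, build_from_spec and worker are assumptions made up for this sketch, and collections.Counter merely stands in for an env class. The last lines give a crude feel for the serialized size of a registry with a thousand entries.

```python
import importlib
import pickle


def build_from_spec(spec):
    """Instantiate the object described by a toy spec dict."""
    module_name, attr_name = spec["entry_point"].split(":")
    factory = getattr(importlib.import_module(module_name), attr_name)
    return factory(**spec["kwargs"])


def worker(payload):
    """What a worker would do: unpickle the spec it was sent, then build from it."""
    spec = pickle.loads(payload)
    return build_from_spec(spec)


# Toy spec; a real one would point at an env class instead of Counter.
spec = {"id": "my_env", "entry_point": "collections:Counter", "kwargs": {"a": 2}}
payload = pickle.dumps(spec)  # the bytes that would travel to the worker
obj = worker(payload)

# Crude size estimate for a registry holding 1000 similar specs.
registry = {f"env_{i}": dict(spec, id=f"env_{i}") for i in range(1000)}
registry_bytes = len(pickle.dumps(registry))

if __name__ == "__main__":
    print(type(obj).__name__, dict(obj))
    print(f"pickled toy registry: {registry_bytes} bytes")
```

A string entry point only works if the worker can import the module it names, so an env class defined only in the main script would still need to be pickled by value or imported in the worker some other way.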

If it does turn out to be a problem, we can also consider using shared memory to distribute a single copy of the registry between different workers. This should hopefully be straightforward, as long as multiprocessing doesn't do anything weird.

pseudo-rnd-thoughts · 2023-01-12T14:42:11Z

Could we look to include this in the v0.28 experimental vector implementation?

I think the idea of a shared memory object would be the best way of doing this.

pseudo-rnd-thoughts · 2023-08-11T23:18:25Z

@sven1977 or @RedTachyon I'm looking to include a fix for this in an upcoming release; however, I'm still unable to replicate the issue on my MacBook.
I expected the following code to raise the error, but it doesn't:

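import gymnasium as gym
from gymnasium.envs.classic_control import CartPoleEnv


def test_async_with_dynamically_registered_env():
    # Register an env at runtime, after gymnasium has been imported.
    gym.register("TestEnv-v0", CartPoleEnv)

    gym.make_vec("TestEnv-v0", vectorization_mode="async")

    # Clean up so the registration does not leak into other tests.
    del gym.registry["TestEnv-v0"]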

gonultasbu · 2023-11-05T02:04:00Z

I can replicate the issue on Windows 11 with the following error.

gymnasium.error.NameNotFound: Environment `my_env` doesn't exist

EDIT: Passing asynchronous=False avoids the issue, as implied by @RedTachyon.

pseudo-rnd-thoughts · 2023-11-06T15:23:43Z

Hi, we haven't been able to replicate the issue in our CI, which we need in order to fix it.
Are you able to produce a small script that replicates the issue?

gonultasbu · 2023-11-06T15:28:36Z

The code example provided in the first post of the issue does replicate the issue in my case.

pseudo-rnd-thoughts · 2023-11-06T16:07:04Z

Strangely, my laptop now raises the error using the original code (which it did not when I first commented).
I have made a PR that currently just adds a test to see whether the CI raises the error as expected, so we can then experiment with testing.

RedTachyon · 2023-12-03T20:45:19Z

Closing in favor of PR #810

sven1977 added the bug (Something isn't working) label Dec 19, 2022

pseudo-rnd-thoughts added a commit to pseudo-rnd-thoughts/Gymnasium that referenced this issue Nov 6, 2023

Initial commit that adds a test for issue Farama-Foundation#222

ba732ad

RedTachyon mentioned this issue Dec 3, 2023

Fix async vectorization for custom registered envs #810

Merged

4 tasks

RedTachyon closed this as completed Dec 3, 2023

maxhuettenrauch mentioned this issue Mar 13, 2024

Allow explicit setting of multiprocessing context for SubprocEnvWorker thu-ml/tianshou#1072

Merged

8 tasks

Dottellini mentioned this issue Jan 17, 2025

gymnasium.error.NameNotFound: Environment BreakoutNoFrameskip doesn't exist. vwxyzjn/cleanrl#478

Open
