
Update dependency gymnasium to v1 #14

Open · wants to merge 1 commit into base: main

Conversation

renovate[bot]
Contributor

@renovate renovate bot commented Oct 8, 2024

This PR contains the following updates:

Package   | Change
gymnasium | ==0.29.1 -> ==1.0.0

Release Notes

Farama-Foundation/Gymnasium (gymnasium)

v1.0.0

Compare Source

v1.0.0 release notes

Over the last few years, the volunteer team behind Gym and Gymnasium has worked to fix bugs, improve the documentation, add new features, and change the API where the benefits outweigh the costs. This is the complete v1.0.0 release, which concludes that effort to change the project's central API (Env, Space, VectorEnv). The release also includes over 200 PRs since 0.29.1, with many bug fixes, new features, and documentation improvements. Thank you to all the volunteers whose hard work made this possible. The rest of these release notes cover the core API changes, ending with the additional new features, bug fixes, deprecations, and documentation changes.

Finally, we have published a paper on Gymnasium, discussing its overall design decisions and more at https://arxiv.org/abs/2407.17032, which can be cited using the following:

@misc{towers2024gymnasium,
      title={Gymnasium: A Standard Interface for Reinforcement Learning Environments},
      author={Mark Towers and Ariel Kwiatkowski and Jordan Terry and John U. Balis and Gianluca De Cola and Tristan Deleu and Manuel Goulão and Andreas Kallinteris and Markus Krimmel and Arjun KG and Rodrigo Perez-Vicente and Andrea Pierré and Sander Schulhoff and Jun Jet Tai and Hannah Tan and Omar G. Younis},
      year={2024},
      eprint={2407.17032},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2407.17032},
}

Removing The Plugin System

An undocumented feature for registering external environments behind the scenes, present in Gym v0.23+ and Gymnasium v0.26 to v0.29, has been removed. Users of Atari (ALE), Minigrid or HighwayEnv could previously use the following code:

import gymnasium as gym

env = gym.make("ALE/Pong-v5")

Despite ale_py never being imported (i.e., no import ale_py), users could still create an Atari environment. This feature has been removed in v1.0.0, so users now need to update to

import gymnasium as gym
import ale_py

gym.register_envs(ale_py)  # optional, helpful for IDEs or pre-commit

env = gym.make("ALE/Pong-v5")

Alternatively, users can use the structure module_name:env_id so that the module is imported before the environment is created, e.g., ale_py:ALE/Pong-v5.

import gymnasium as gym

env = gym.make("ale_py:ALE/Pong-v5")

Importing modules solely to register environments (e.g., import ale_py) can cause IDEs (e.g., VSCode, PyCharm) and pre-commit tools (isort / black / flake8) to flag the import as unused and remove it. Therefore, we have introduced gymnasium.register_envs as a no-op function (the function literally does nothing) so that the IDE sees the import being used and keeps the statement.

Vector Environments

To increase an environment's sample speed, vectorizing is one of the easiest ways to sample multiple instances of the same environment simultaneously. Gym and Gymnasium provide VectorEnv as a base class for this, but one of its issues has been that it inherited from Env. This caused particular issues with type checking (the return type of step is different for Env and VectorEnv), testing the environment type (isinstance(env, Env) can be true for vector environments despite the two acting differently), and wrappers (some Gym and Gymnasium wrappers supported vector environments, but there was no clear or consistent API for determining which did or didn't). Therefore, we have separated Env and VectorEnv so that they no longer inherit from each other.

In implementing the new separate VectorEnv class, we have tried to minimize the difference between code using Env and VectorEnv, along with making it more generic in places. The class contains the same attributes and methods as Env, in addition to the attributes num_envs: int, single_action_space: gymnasium.Space and single_observation_space: gymnasium.Space. Further, we have removed several functions from VectorEnv that are not needed for all vector implementations: step_async, step_wait, reset_async, reset_wait, call_async and call_wait. This change allows users to write their own custom vector environments; v1.0.0 includes an example vector CartPole environment, written solely with NumPy, that runs thousands of times faster than Gymnasium's Sync vector environment.
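
As a minimal sketch of what subclassing the new VectorEnv base class can look like, here is a toy NumPy-only vector environment; the ToyVectorEnv name, the 1D random-walk dynamics, and the next-step autoreset bookkeeping are illustrative assumptions, not code from Gymnasium.

import numpy as np
import gymnasium as gym
from gymnasium.vector import VectorEnv
from gymnasium.vector.utils import batch_space

class ToyVectorEnv(VectorEnv):
    """Hypothetical NumPy-only vector environment: a 1D random-walk toy problem."""

    def __init__(self, num_envs: int = 8):
        self.num_envs = num_envs
        self.single_observation_space = gym.spaces.Box(-1.0, 1.0, shape=(1,), dtype=np.float32)
        self.single_action_space = gym.spaces.Discrete(2)
        self.observation_space = batch_space(self.single_observation_space, num_envs)
        self.action_space = batch_space(self.single_action_space, num_envs)

    def reset(self, *, seed=None, options=None):
        # Seeding is omitted for brevity; all sub-environments start at the origin.
        self.positions = np.zeros((self.num_envs, 1), dtype=np.float32)
        self.autoreset = np.zeros(self.num_envs, dtype=bool)
        return self.positions.copy(), {}

    def step(self, actions):
        # Sub-environments flagged on the previous step reset now (next-step autoreset).
        self.positions[self.autoreset] = 0.0
        active = ~self.autoreset
        # All active sub-environments are advanced with a single vectorized NumPy update.
        self.positions[active, 0] += np.where(actions[active] == 1, 0.1, -0.1)
        terminations = np.abs(self.positions[:, 0]) >= 1.0
        truncations = np.zeros(self.num_envs, dtype=bool)
        rewards = active.astype(np.float32)
        self.autoreset = terminations | truncations
        return self.positions.copy(), rewards, terminations, truncations, {}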

To allow users to create vectorized environments easily, we provide gymnasium.make_vec as a vectorized equivalent of gymnasium.make. As there are multiple different vectorization options ("sync", "async", and a custom class referred to as "vector_entry_point"), the argument vectorization_mode selects how the environment is vectorized. This defaults to None, so that if the environment has a vector entry point for a custom vector environment implementation, it is used first (currently, CartPole is the only environment with a vector entry point built into Gymnasium); otherwise, the synchronous vectorizer is used (previously, Gym and Gymnasium's vector.make used the asynchronous vectorizer by default). For more information, see the function docstring. We are excited to see other projects utilize this option to make creating their environments easier.

env = gym.make("CartPole-v1")
env = gym.wrappers.ClipReward(env, min_reward=-1, max_reward=3)

envs = gym.make_vec("CartPole-v1", num_envs=3)
envs = gym.wrappers.vector.ClipReward(envs, min_reward=-1, max_reward=3)
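
The vectorization mode can also be selected explicitly. A brief illustration (reusing the CartPole-v1 and num_envs=3 values from the example above):

envs = gym.make_vec("CartPole-v1", num_envs=3, vectorization_mode="sync")   # uses SyncVectorEnv
envs = gym.make_vec("CartPole-v1", num_envs=3, vectorization_mode="async")  # uses AsyncVectorEnv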

Due to this split of Env and VectorEnv, there are now Env-only wrappers and VectorEnv-only wrappers in gymnasium.wrappers and gymnasium.wrappers.vector respectively. Furthermore, we updated the names of the base vector wrappers from VectorEnvWrapper to VectorWrapper and added VectorObservationWrapper, VectorRewardWrapper and VectorActionWrapper classes. See the vector wrapper page for more information.

To increase the efficiency of vector environments, autoreset is a common feature that allows sub-environments to reset without waiting for all sub-environments to finish an episode. Previously in Gym and Gymnasium, auto-resetting was done on the same step as the episode ends, such that the final observation and info were stored in the step's info, i.e., infos["final_observation"] and infos["final_info"], while the standard obs and info contained the sub-environment's reset observation and info. Thus, accurately sampling observations from a vector environment required the following code (note the need to extract infos["final_observation"][j] if the sub-environment was terminated or truncated). Additionally, on-policy algorithms that use rollouts would require an additional forward pass to compute the correct next observation (this is often skipped as an optimization, assuming that environments only terminate, not truncate).

replay_buffer = []
obs, _ = envs.reset()
for _ in range(total_timesteps):
    next_obs, rewards, terminations, truncations, infos = envs.step(envs.action_space.sample())

    for j in range(envs.num_envs):
        if not (terminations[j] or truncations[j]):
            replay_buffer.append((
                obs[j], rewards[j], terminations[j], truncations[j], next_obs[j]
            ))
        else:
            replay_buffer.append((
                obs[j], rewards[j], terminations[j], truncations[j], infos["final_observation"][j]
            ))

    obs = next_obs

However, over time, the development team has recognized the inefficiency of this approach (primarily due to the extensive use of a Python dictionary) and the annoyance of having to extract the final observation to train agents correctly. Therefore, in v1.0.0, we are modifying autoreset to align with specialized vector-only projects like EnvPool and SampleFactory, where the sub-environments don't reset until the next step. As a result, the following changes are required when sampling:

import numpy as np

replay_buffer = []
obs, _ = envs.reset()
autoreset = np.zeros(envs.num_envs)
for _ in range(total_timesteps):
    next_obs, rewards, terminations, truncations, _ = envs.step(envs.action_space.sample())

    for j in range(envs.num_envs):
        if not autoreset[j]:
            replay_buffer.append((
                obs[j], rewards[j], terminations[j], truncations[j], next_obs[j]
            ))

    obs = next_obs
    autoreset = np.logical_or(terminations, truncations)

For on-policy rollouts, accounting for autoreset requires masking the error for the first observation of a new episode (done[t+1]) to prevent computing the error between the last observation of one episode and the first observation of the next.
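
As a rough sketch of what this masking can look like in practice (the function name, argument layout, and the simple one-step TD error are illustrative assumptions, not code from the release):

import numpy as np

def masked_td_errors(rewards, values, next_values, terminations, autoreset, gamma=0.99):
    # Bootstrap from the next value only when the episode did not terminate at this step.
    targets = rewards + gamma * next_values * (1.0 - terminations.astype(np.float32))
    td_errors = targets - values
    # Zero out steps whose observation is the first of a new episode, so the error is
    # never computed between the last observation of one episode and the first of the next.
    return np.where(autoreset, 0.0, td_errors)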

Finally, we have improved the AsyncVectorEnv.set_attr and SyncVectorEnv.set_attr functions to use Wrapper.set_wrapper_attr, allowing users to set variables anywhere in the environment stack if they already exist. Previously, this was not possible: users could only modify the variable in the "top" wrapper on the environment stack and, importantly, not in the actual environment itself.

Wrappers

Previously, some wrappers could support both standard and vector environments; however, this was not standardized, and it was unclear which wrappers did and didn't support vector environments. For v1.0.0, with Env and VectorEnv separated so they no longer inherit from each other (read more in the vector section), the wrappers in gymnasium.wrappers only support standard environments, while gymnasium.wrappers.vector contains the specialized vector wrappers (most but not all wrappers are supported; please raise a feature request if you require one).

In v0.29, we deprecated the Wrapper.__getattr__ function, to be replaced by Wrapper.get_wrapper_attr, which provides access to variables anywhere in the environment stack. In v1.0.0, we have added Wrapper.set_wrapper_attr as an equivalent function for setting a variable anywhere in the environment stack if it already exists; otherwise, the variable is assigned to the top wrapper.
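
For example, a small illustrative snippet (the length attribute is CartPole's pole half-length, and TimeAwareObservation simply stands in for any wrapper stack):

import gymnasium as gym

env = gym.make("CartPole-v1")
env = gym.wrappers.TimeAwareObservation(env)

# Read and write an attribute that lives further down the environment stack.
pole_half_length = env.get_wrapper_attr("length")  # defined on the base CartPole environment
env.set_wrapper_attr("length", 0.6)                # updated in place wherever it already exists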

Most significantly, we have removed, renamed, and added several wrappers listed below.

  • Removed wrappers
    • monitoring.VideoRecorder - The replacement wrapper is RecordVideo
    • StepAPICompatibility - We expect all Gymnasium environments to use the terminated / truncated step API, therefore, users shouldn't need the StepAPICompatibility wrapper. Shimmy includes a compatibility environment to convert gym-api environments for gymnasium.
  • Renamed wrappers (We wished to make wrappers consistent in naming. Therefore, we have removed "Wrapper" from all wrappers and included "Observation", "Action" and "Reward" within wrapper names where appropriate)
    • AutoResetWrapper -> Autoreset
    • FrameStack -> FrameStackObservation
    • PixelObservationWrapper -> AddRenderObservation
  • Moved wrappers (All vector wrappers are in gymnasium.wrappers.vector)
    • VectorListInfo -> vector.DictInfoToList
  • Added wrappers
    • DelayObservation - Adds a delay to the next observation and reward
    • DtypeObservation - Modifies the dtype of an environment's observation space
    • MaxAndSkipObservation - Skips n observations and maxes over the last 2 observations, bringing the Atari environment heuristic to other environments
    • StickyAction - Randomly repeats actions with a probability for a step, returning the final observation and sum of rewards over steps. Inspired by Atari environment heuristics
    • JaxToNumpy - Converts a Jax-based environment to use Numpy-based input and output data for reset, step, etc
    • JaxToTorch - Converts a Jax-based environment to use PyTorch-based input and output data for reset, step, etc
    • NumpyToTorch - Converts a Numpy-based environment to use PyTorch-based input and output data for reset, step, etc

For all wrappers, we have added example code documentation and a changelog to help future researchers understand any changes made. See the following page for an example.
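
As one illustrative example of the renamed wrappers, FrameStackObservation (formerly FrameStack) can be used as follows; the stack_size value of 4 and the choice of CartPole-v1 are just example assumptions:

import gymnasium as gym
from gymnasium.wrappers import FrameStackObservation

env = gym.make("CartPole-v1")
env = FrameStackObservation(env, stack_size=4)

obs, info = env.reset()
print(obs.shape)  # (4, 4): the last 4 CartPole observations stacked together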

Functional Environments

One of the substantial advantages of Gymnasium's Env is that it generally requires minimal information about the underlying environment specification; however, this can make applying such environments to planning, search algorithms, and theoretical investigations more difficult. We are proposing FuncEnv as an alternative to Env that is closer to a Markov decision process definition, exposing more functions to the user, including the observation, reward, and termination functions, along with the environment's raw state as a single object.

from typing import Any
import gymnasium as gym
from gymnasium.functional import StateType, ObsType, ActType, RewardType, TerminalType, Params

class ExampleFuncEnv(gym.functional.FuncEnv):
  def initial(self, rng: Any, params: Params | None = None) -> StateType:
    ...
  def transition(self, state: StateType, action: ActType, rng: Any, params: Params | None = None) -> StateType:
    ...
  def observation(self, state: StateType, rng: Any, params: Params | None = None) -> ObsType:
    ...
  def reward(
      self, state: StateType, action: ActType, next_state: StateType, rng: Any, params: Params | None = None
  ) -> RewardType:
    ...
  def terminal(self, state: StateType, rng: Any, params: Params | None = None) -> TerminalType:
    ...

FuncEnv requires that the initial and transition functions return a new state given their inputs, as a partial implementation of Env.reset and Env.step. As a result, users can sample (and save) the next state for a range of inputs to use with planning, searching, etc. Given a state, the observation, reward, and terminal functions provide users explicit definitions of how each affects the environment's output.
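
A hedged sketch of how these functions might be driven directly, assuming a concrete subclass that fills in the skeleton above; the fixed action and the rollout length are arbitrary choices for illustration:

import numpy as np

func_env = ExampleFuncEnv()  # assumes the skeleton above has been implemented
rng = np.random.default_rng(0)

state = func_env.initial(rng)
for _ in range(10):
    action = 0  # e.g., chosen externally by a planner or search procedure
    next_state = func_env.transition(state, action, rng)
    obs = func_env.observation(next_state, rng)
    reward = func_env.reward(state, action, next_state, rng)
    terminated = func_env.terminal(next_state, rng)
    state = func_env.initial(rng) if terminated else next_state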

Collecting Seeding Values

It was possible to seed both environments and spaces with None to use a random initial seed value; however, it wasn't possible to know what these initial seed values were. We have addressed this for Space.seed and reset's seed parameter in https://github.com/Farama-Foundation/Gymnasium/pull/1033 and https://github.com/Farama-Foundation/Gymnasium/pull/889. Additionally, for Space.seed, we have changed the return type to be specialized for each space, such that the following code will work for all spaces.

seeded_values = space.seed(None)
initial_samples = [space.sample() for _ in range(10)]

reseed_values = space.seed(seeded_values)
reseed_samples = [space.sample() for _ in range(10)]

assert seeded_values == reseed_values
assert initial_samples == reseed_samples

Additionally, for environments, we have added a new np_random_seed attribute that will store the most recent np_random seed value from reset(seed=seed).
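
A small illustrative example of the new attribute (accessed through unwrapped here to reach the base environment; the CartPole-v1 environment and seed value are arbitrary):

import gymnasium as gym

env = gym.make("CartPole-v1")
env.reset(seed=42)
print(env.unwrapped.np_random_seed)  # 42, the most recent seed passed to reset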

Environment Version Changes
  • It was discovered recently that the MuJoCo-based Pusher was not compatible with mujoco>=3, as the density of the block that the agent has to push was lighter than air in the model. This began to cause issues for users with mujoco>=3 and Pusher. Therefore, we have disabled the v4 environment with mujoco>=3 and updated the model in MuJoCo v5 so that it produces behavior consistent with v4 and mujoco<3 (https://github.com/Farama-Foundation/Gymnasium/pull/1019).

  • New v5 MuJoCo environments have been added as a follow-up to the v4 environments added two years ago, fixing inconsistencies, adding new features and updating the documentation (https://github.com/Farama-Foundation/Gymnasium/pull/572). Additionally, we have decided to mark the mujoco-py based (v2 and v3) environments as deprecated and plan to remove them from Gymnasium in the future (https://github.com/Farama-Foundation/Gymnasium/pull/926).

  • Lunar Lander version increased from v2 to v3 due to two bug fixes. The first fixes the environment's determinism: the world object was not completely destroyed on reset, causing non-determinism in particular cases (https://github.com/Farama-Foundation/Gymnasium/pull/979). The second fixes the wind generation (turned off by default), which was not randomly re-generated on each reset; we have updated this to gain statistical independence between episodes (https://github.com/Farama-Foundation/Gymnasium/pull/959).

  • CarRacing version increased from v2 to v3 to change how the environment ends, such that when the agent completes the track the environment terminates rather than truncates.

  • We have removed pip install "gymnasium[accept-rom-license]" as ale-py>=0.9 now comes packaged with the ROMs, meaning that users no longer need to install the Atari ROMs separately using AutoROM.

Additional Bug Fixes

Additional new features

Deprecation

Documentation changes

Full Changelog: Farama-Foundation/Gymnasium@v0.29.1...v1.0.0


Configuration

📅 Schedule: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined).

🚦 Automerge: Disabled by config. Please merge this manually once you are satisfied.

Rebasing: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.

🔕 Ignore: Close this PR and you won't be reminded about this update again.


  • If you want to rebase/retry this PR, check this box

This PR was generated by Mend Renovate. View the repository job log.

Contributor Author

renovate bot commented Oct 8, 2024

⚠️ Artifact update problem

Renovate failed to update an artifact related to this branch. You probably do not want to merge this PR as-is.

♻ Renovate will retry this branch, including artifacts, only when one of the following happens:

  • any of the package files in this branch needs updating, or
  • the branch becomes conflicted, or
  • you click the rebase/retry checkbox if found above, or
  • you rename this PR's title to start with "rebase!" to trigger it manually

The artifact failure details are included below:

File name: uv.lock
Command failed: uv lock --upgrade-package gymnasium
Using CPython 3.11.10
  × No solution found when resolving dependencies for split
  │ (python_full_version == '3.11.*' and platform_system == 'Darwin'):
  ╰─▶ Because stable-baselines3==2.3.2 depends on gymnasium>=0.28.1,<0.30
      and your project depends on gymnasium==1.0.0, we can conclude that your
      project and stable-baselines3==2.3.2 are incompatible.
      And because your project depends on stable-baselines3==2.3.2, we can
      conclude that your project's requirements are unsatisfiable.
