This PR contains the following updates:
gymnasium ==0.29.1 -> ==1.0.0
Release Notes
Farama-Foundation/Gymnasium (gymnasium)
v1.0.0
v1.0.0 release notes
Over the last few years, the volunteer team behind Gym and Gymnasium has worked to fix bugs, improve the documentation, add new features, and change the API where appropriate so that the benefits outweigh the costs. This is the complete release of v1.0.0, which marks the end of this road of changes to the project's central API (Env, Space, VectorEnv). In addition, the release includes over 200 PRs since 0.29.1, with many bug fixes, new features, and improved documentation. So, thank you to all the volunteers for their hard work that has made this possible. The rest of these release notes cover the core API changes first, followed by the additional new features, bug fixes, deprecations and documentation changes.
Finally, we have published a paper on Gymnasium, discussing its overall design decisions and more, at https://arxiv.org/abs/2407.17032.
Removing The Plugin System
Gym v0.23+ and Gymnasium v0.26 to v0.29 had an undocumented feature that registered external environments behind the scenes; it has now been removed. With that feature, users of Atari (ALE), Minigrid or HighwayEnv could create an environment without importing the package that provides it.
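A sketch of that earlier usage, alongside the two v1.0.0 forms described in the rest of this section, might look as follows. The function names are hypothetical and exist only for illustration; the environment ID ALE/Pong-v5 and the calls gymnasium.make and gymnasium.register_envs are taken from these notes. Imports are placed inside the functions so the definitions load even where the optional ale_py package is not installed.

```python
def make_pong_before_v1():
    """Pre-v1.0.0 usage: relied on the removed plugin system (no import of ale_py)."""
    import gymnasium as gym

    # According to the notes above, this is expected to fail on v1.0.0
    # unless ale_py has been imported somewhere else first.
    return gym.make("ALE/Pong-v5")


def make_pong_with_explicit_import():
    """v1.0.0 usage: import the providing package before creating the environment."""
    import gymnasium as gym
    import ale_py  # importing the package is what makes its environments available

    # Documented as a no-op; it only stops IDEs and linters from removing the import.
    gym.register_envs(ale_py)
    return gym.make("ALE/Pong-v5")


def make_pong_with_module_prefix():
    """v1.0.0 alternative: 'module_name:env_id' imports the module first."""
    import gymnasium as gym

    return gym.make("ale_py:ALE/Pong-v5")
```

Calling any of these functions requires gymnasium and ale-py to be installed; nothing is imported until a function is called.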
Despite Atari never being imported (i.e., import ale_py), users could still create an Atari environment. This feature has been removed in v1.0.0, which will require users to update their code to import the package explicitly (e.g., import ale_py) before creating the environment. Alternatively, users can use the structure module_name:env_id so that the module is imported first before the environment is created, e.g., ale_py:ALE/Pong-v5.
To help users with IDEs (e.g., VSCode, PyCharm): importing a module only to register its environments (e.g., import ale_py) can cause the IDE (and pre-commit isort / black / flake8) to believe that the import is pointless and should be removed. Therefore, we have introduced gymnasium.register_envs as a no-op function (the function literally does nothing) to make the IDE believe that something is happening and the import statement is required.
Vector Environments
To increase the sample speed of an environment, vectorizing is one of the easiest ways to sample multiple instances of the same environment simultaneously. Gym and Gymnasium provide the VectorEnv as a base class for this, but one of its issues has been that it inherited Env. This can cause particular issues with type checking (the return type of step is different for Env and VectorEnv), testing the environment type (isinstance(env, Env) can be true for vector environments despite the two acting differently) and finally wrappers (some Gym and Gymnasium wrappers supported vector environments, but there is no clear or consistent API for determining which do or don't). Therefore, we have separated out Env and VectorEnv to not inherit from each other.
In implementing the new separate VectorEnv class, we have tried to minimize the difference between code using Env and VectorEnv along with making it more generic in places. The class contains the same attributes and methods as Env in addition to the attributes num_envs: int, single_action_space: gymnasium.Space and single_observation_space: gymnasium.Space. Further, we have removed several functions from VectorEnv that are not needed for all vector implementations: step_async, step_wait, reset_async, reset_wait, call_async and call_wait. This change now allows users to write their own custom vector environments; v1.0.0 includes an example vector CartPole environment, written solely with NumPy, that runs thousands of times faster than using Gymnasium's Sync vector environment.
To allow users to create vectorized environments easily, we provide gymnasium.make_vec as a vectorized equivalent of gymnasium.make. As there are multiple different vectorization options ("sync", "async", and a custom class referred to as "vector_entry_point"), the argument vectorization_mode selects how the environment is vectorized. This defaults to None such that if the environment has a vector entry point for a custom vector environment implementation, this will be utilized first (currently, Cartpole is the only environment with a vector entry point built into Gymnasium). Otherwise, the synchronous vectorizer is used (previously, the Gym and Gymnasium vector.make used the asynchronous vectorizer as default). For more information, see the function docstring. We are excited to see other projects utilize this option to make creating their environments easier.
Due to this split of Env and VectorEnv, there are now Env-only wrappers and VectorEnv-only wrappers in gymnasium.wrappers and gymnasium.wrappers.vector respectively. Furthermore, we updated the name of the base vector wrapper from VectorEnvWrapper to VectorWrapper and added VectorObservationWrapper, VectorRewardWrapper and VectorActionWrapper classes. See the vector wrapper page for new information.
To increase the efficiency of vector environments, autoreset is a common feature that allows sub-environments to reset without requiring all sub-environments to finish before resetting them all. Previously in Gym and Gymnasium, auto-resetting was done on the same step as the environment episode ends, such that the final observation and info would be stored in the step's info, i.e., info["final_observation"] and info["final_info"], with the standard obs and info containing the sub-environment's reset observation and info. Thus, accurately sampling observations from a vector environment required extracting the final observation from the step's info whenever a sub-environment was terminated or truncated. Additionally, on-policy algorithms that use rollouts would require an additional forward pass to compute the correct next observation (this is often skipped as an optimization, on the assumption that environments only terminate and never truncate).
However, over time, the development team has recognized the inefficiency of this approach (primarily due to the extensive use of a Python dictionary) and the annoyance of having to extract the final observation to train agents correctly, for example. Therefore, in v1.0.0, we are modifying autoreset to align with specialized vector-only projects like EnvPool and SampleFactory, where the sub-environment doesn't reset until the next step. As a result, the following changes are required when sampling:
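One way to picture the new sampling loop is the minimal sketch below. ToyCounterVectorEnv is a hypothetical stand-in written for this note, not a Gymnasium class: it only mimics the next-step autoreset behaviour described above, using plain Python lists so the snippet runs without any dependencies. The idea it illustrates is that a sub-environment's step immediately after an episode end is only a reset, so that transition is skipped rather than stored.

```python
class ToyCounterVectorEnv:
    """Hypothetical vector env: each sub-env counts up and terminates at a fixed length."""

    def __init__(self, num_envs, episode_lengths):
        self.num_envs = num_envs
        self.episode_lengths = episode_lengths
        self.state = [0] * num_envs
        self.needs_reset = [False] * num_envs

    def reset(self):
        self.state = [0] * self.num_envs
        self.needs_reset = [False] * self.num_envs
        return list(self.state), {}

    def step(self, actions):
        rewards, terminations, truncations = [], [], []
        for j in range(self.num_envs):
            if self.needs_reset[j]:
                # Next-step autoreset: the episode ended on the previous step,
                # so this step ignores the action and only resets the sub-env.
                self.state[j] = 0
                self.needs_reset[j] = False
                rewards.append(0.0)
                terminations.append(False)
            else:
                self.state[j] += 1
                done = self.state[j] >= self.episode_lengths[j]
                self.needs_reset[j] = done
                rewards.append(1.0)
                terminations.append(done)
            truncations.append(False)
        return list(self.state), rewards, terminations, truncations, {}


def collect_transitions(envs, total_steps):
    """Store (obs, reward, terminated, truncated, next_obs), skipping reset-only steps."""
    replay_buffer = []
    obs, _ = envs.reset()
    autoreset = [False] * envs.num_envs
    for _ in range(total_steps):
        actions = [0] * envs.num_envs  # placeholder actions; the toy env ignores them
        next_obs, rewards, terminations, truncations, _ = envs.step(actions)
        for j in range(envs.num_envs):
            if not autoreset[j]:
                replay_buffer.append(
                    (obs[j], rewards[j], terminations[j], truncations[j], next_obs[j])
                )
        obs = next_obs
        # Sub-envs that just finished will reset on the next step, so skip them then.
        autoreset = [t or tr for t, tr in zip(terminations, truncations)]
    return replay_buffer


buffer = collect_transitions(ToyCounterVectorEnv(2, [2, 3]), total_steps=7)
```

With a real vector environment the same mask would typically be built from the terminations and truncations arrays returned by step; no stored transition then pairs the last observation of one episode with the first observation of the next.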
For on-policy rollouts, accounting for the autoreset requires masking the error for the first observation in a new episode (done[t+1]) to prevent computing the error between the last and first observations of episodes.
Finally, we have improved the AsyncVectorEnv.set_attr and SyncVectorEnv.set_attr functions to use Wrapper.set_wrapper_attr, allowing users to set a variable anywhere in the environment stack if it already exists. Previously, this was not possible and users could only modify the variable in the "top" wrapper on the environment stack, importantly not the actual environment itself.
Wrappers
Previously, some wrappers could support both environments and vector environments; however, this was not standardized, and it was unclear which wrappers did and didn't support vector environments. For v1.0.0, with Env and VectorEnv separated so that they no longer inherit from each other (read more in the vector section), the wrappers in gymnasium.wrappers will only support standard environments, and gymnasium.wrappers.vector contains the provided specialized vector wrappers (most but not all wrappers are supported; please raise a feature request if you require one).
In v0.29, we deprecated the Wrapper.__getattr__ function in favor of Wrapper.get_wrapper_attr, which provides access to variables anywhere in the environment stack. In v1.0.0, we have added Wrapper.set_wrapper_attr as an equivalent function for setting a variable anywhere in the environment stack if it already exists; otherwise the variable is assigned to the top wrapper.
Most significantly, we have removed, renamed, and added several wrappers listed below.
Removed wrappers:
monitoring.VideoRecorder - The replacement wrapper is RecordVideo
StepAPICompatibility - We expect all Gymnasium environments to use the terminated / truncated step API; therefore, users shouldn't need the StepAPICompatibility wrapper. Shimmy includes a compatibility environment to convert gym-api environments for gymnasium.
Renamed wrappers:
AutoResetWrapper -> Autoreset
FrameStack -> FrameStackObservation
PixelObservationWrapper -> AddRenderObservation
VectorListInfo -> vector.DictInfoToList (vector wrappers are in gymnasium.wrappers.vector)
Added wrappers:
DelayObservation - Adds a delay to the next observation and reward
DtypeObservation - Modifies the dtype of an environment's observation space
MaxAndSkipObservation - Will skip n observations and will max over the last 2 observations, inspired by the Atari environment heuristic for other environments
StickyAction - Randomly repeats actions with a probability for a step, returning the final observation and the sum of rewards over steps. Inspired by Atari environment heuristics
JaxToNumpy - Converts a Jax-based environment to use Numpy-based input and output data for reset, step, etc
JaxToTorch - Converts a Jax-based environment to use PyTorch-based input and output data for reset, step, etc
NumpyToTorch - Converts a Numpy-based environment to use PyTorch-based input and output data for reset, step, etc
For all wrappers, we have added example code documentation and a changelog to help future researchers understand any changes made. See the following page for an example.
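The Wrapper.get_wrapper_attr and Wrapper.set_wrapper_attr rules described earlier in this section can be pictured with a toy stack. ToyEnv and ToyWrapper below are hypothetical stand-ins written for this note; they are not Gymnasium's implementation and only mimic the behaviour stated in the text: a lookup searches the whole stack, a set updates the attribute where it already exists, and otherwise the value is assigned on the top wrapper.

```python
class ToyEnv:
    """Hypothetical base environment holding a single attribute."""

    def __init__(self):
        self.gravity = 9.8


class ToyWrapper:
    """Hypothetical wrapper mimicking the lookup rules described in the text."""

    def __init__(self, env):
        self.env = env

    def get_wrapper_attr(self, name):
        layer = self
        while True:
            if name in vars(layer):
                return vars(layer)[name]
            if not isinstance(layer, ToyWrapper):
                # Reached the base environment without finding the attribute.
                raise AttributeError(name)
            layer = layer.env

    def set_wrapper_attr(self, name, value):
        layer = self
        while True:
            if name in vars(layer):
                # The attribute already exists somewhere in the stack: update it there.
                setattr(layer, name, value)
                return
            if not isinstance(layer, ToyWrapper):
                break
            layer = layer.env
        # Not found anywhere in the stack: assign it on the top wrapper.
        setattr(self, name, value)


stack = ToyWrapper(ToyWrapper(ToyEnv()))
stack.set_wrapper_attr("gravity", 1.6)  # exists on the base env, so it is updated there
stack.set_wrapper_attr("debug", True)   # exists nowhere, so it lands on the top wrapper
```

In this sketch the first call changes the value on the innermost ToyEnv rather than shadowing it on the outer wrapper, which is the difference from the earlier "top wrapper only" behaviour mentioned for set_attr.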
Functional Environments
One of the substantial advantages of Gymnasium's Env is that it generally requires minimal information about the underlying environment specifications; however, this can make applying such environments to planning, search algorithms, and theoretical investigations more difficult. We are proposing FuncEnv as an alternative definition to Env which is closer to a Markov Decision Process definition, exposing more functions to the user, including the observation, reward, and termination functions along with the environment's raw state as a single object.
FuncEnv requires that the initial and transition functions return a new state given their inputs, as a partial implementation of Env.step and Env.reset. As a result, users can sample (and save) the next state for a range of inputs to use with planning, searching, etc. Given a state, observation, reward, and terminal provide users explicit definitions to understand how each can affect the environment's output.
Collecting Seeding Values
It was possible to seed both environments and spaces with None to use a random initial seed value; however, it wasn't possible to know what these initial seed values were. We have addressed this for Space.seed and reset.seed in https://github.com/Farama-Foundation/Gymnasium/pull/1033 and https://github.com/Farama-Foundation/Gymnasium/pull/889. Additionally, for Space.seed, we have changed the return type to be specialized for each space so that the same seeding code works for all spaces (for example, a returned seed value can be passed back to Space.seed).
Additionally, for environments, we have added a new np_random_seed attribute that will store the most recent np_random seed value from reset(seed=seed).
Environment Version Changes
It was discovered recently that the MuJoCo-based Pusher was not compatible with mujoco>=3, as the model's density for the block that the agent had to push was lighter than air. This obviously began to cause issues for users with mujoco>=3 and Pusher. Therefore, we have disabled the v4 environment with mujoco>=3 and updated the model in the MuJoCo v5 environment so that it produces more expected behavior, like v4 with mujoco<3 (https://github.com/Farama-Foundation/Gymnasium/pull/1019).
New v5 MuJoCo environments have been added as a follow-up to the v4 environments added two years ago, fixing inconsistencies, adding new features and updating the documentation (https://github.com/Farama-Foundation/Gymnasium/pull/572). Additionally, we have decided to mark the mujoco-py based (v2 and v3) environments as deprecated and plan to remove them from Gymnasium in future (https://github.com/Farama-Foundation/Gymnasium/pull/926).
Lunar Lander version increased from v2 to v3 due to two bug fixes. The first fixes the determinism of the environment: the world object was not completely destroyed on reset, causing non-determinism in particular cases (https://github.com/Farama-Foundation/Gymnasium/pull/979). Second, the wind generation (turned off by default) was not randomly generated on each reset; we have updated this to gain statistical independence between episodes (https://github.com/Farama-Foundation/Gymnasium/pull/959).
CarRacing version increased from v2 to v3 to change how the environment ends: when the agent completes the track, the environment will now terminate, not truncate.
We have removed pip install "gymnasium[accept-rom-license]" as ale-py>=0.9 now comes packaged with the roms, meaning that users don't need to install the Atari roms separately with autoroms.
Additional Bug Fixes
spaces.Box would allow low and high values outside the dtype's range, which could result in some very strange edge cases that were very difficult to detect by @pseudo-rnd-thoughts (https://github.com/Farama-Foundation/Gymnasium/pull/774)
gymnasium[mujoco-py] due to cython==3 issues by @pseudo-rnd-thoughts (https://github.com/Farama-Foundation/Gymnasium/pull/616)
register(kwargs) from **kwargs to kwargs: dict | None = None by @younik (https://github.com/Farama-Foundation/Gymnasium/pull/788)
AsyncVectorEnv for custom environments by @RedTachyon (https://github.com/Farama-Foundation/Gymnasium/pull/810)
mujoco-py import error for v4+ MuJoCo environments by @MischaPanch (https://github.com/Farama-Foundation/Gymnasium/pull/934)
Tuple and Dict spaces (https://github.com/Farama-Foundation/Gymnasium/pull/941)
Multidiscrete.from_jsonable on windows (https://github.com/Farama-Foundation/Gymnasium/pull/932)
play rendering normalization (https://github.com/Farama-Foundation/Gymnasium/pull/956)
to_torch conversion by @mantasu (https://github.com/Farama-Foundation/Gymnasium/pull/1107)
Additional new features
AsyncVectorEnv by @pseudo-rnd-thoughts in https://github.com/Farama-Foundation/Gymnasium/pull/1119
NamedTuples in JaxToNumpy, JaxToTorch and NumpyToTorch by @RogerJL (https://github.com/Farama-Foundation/Gymnasium/pull/789) and @pseudo-rnd-thoughts (https://github.com/Farama-Foundation/Gymnasium/pull/811)
padding_type parameter to FrameSkipObservation to select the padding observation by @jamartinh (https://github.com/Farama-Foundation/Gymnasium/pull/830)
check_environments_match by @Kallinteris-Andreas (https://github.com/Farama-Foundation/Gymnasium/pull/748)
OneOf space that provides exclusive unions of spaces by @RedTachyon and @pseudo-rnd-thoughts (https://github.com/Farama-Foundation/Gymnasium/pull/812)
Dict.sample to use standard Python dicts rather than OrderedDict due to dropping Python 3.7 support by @pseudo-rnd-thoughts (https://github.com/Farama-Foundation/Gymnasium/pull/977)
wrappers.vector.HumanRendering and remove human rendering from CartPoleVectorEnv by @pseudo-rnd-thoughts and @TimSchneider42 (https://github.com/Farama-Foundation/Gymnasium/pull/1013)
sutton_barto_reward argument for CartPole that changes the reward function to not return 1 on terminating states by @Kallinteris-Andreas (https://github.com/Farama-Foundation/Gymnasium/pull/958)
visual_options rendering argument for MuJoCo environments by @Kallinteris-Andreas (https://github.com/Farama-Foundation/Gymnasium/pull/965)
exact argument to utils.env_checker.data_equivalence by @Kallinteris-Andreas (https://github.com/Farama-Foundation/Gymnasium/pull/924)
wrapper.NormalizeObservation observation space and change observation to float32 by @pseudo-rnd-thoughts (https://github.com/Farama-Foundation/Gymnasium/pull/978)
env.spec if kwarg is unpickleable by @pseudo-rnd-thoughts (https://github.com/Farama-Foundation/Gymnasium/pull/982)
is_slippery option for cliffwalking environment by @CloseChoice (https://github.com/Farama-Foundation/Gymnasium/pull/1087)
RescaleAction and RescaleObservation to support np.inf bounds by @TimSchneider42 (https://github.com/Farama-Foundation/Gymnasium/pull/1095)
env.reset(seed=42); env.reset() by @qgallouedec (https://github.com/Farama-Foundation/Gymnasium/pull/1086)
BaseMujocoEnv class by @Kallinteris-Andreas (https://github.com/Farama-Foundation/Gymnasium/pull/1075)
Deprecation
gymnasium.envs.mujoco by @Kallinteris-Andreas (https://github.com/Farama-Foundation/Gymnasium/pull/827)
Documentation changes
Gymnasium/MuJoCo/Ant-v5 framework by @Kallinteris-Andreas (https://github.com/Farama-Foundation/Gymnasium/pull/838)
__init__ and reset arguments by @pseudo-rnd-thoughts (https://github.com/Farama-Foundation/Gymnasium/pull/898)
Full Changelog: Farama-Foundation/Gymnasium@v0.29.1...v1.0.0
Configuration
📅 Schedule: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined).
🚦 Automerge: Disabled by config. Please merge this manually once you are satisfied.
♻ Rebasing: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.
🔕 Ignore: Close this PR and you won't be reminded about this update again.
This PR was generated by Mend Renovate. View the repository job log.