Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consider allowing integer reward in trajectories #783

Open
PavelCz opened this issue Sep 12, 2023 · 0 comments
Open

Consider allowing integer reward in trajectories #783

PavelCz opened this issue Sep 12, 2023 · 0 comments
Labels
enhancement New feature or request

Comments

@PavelCz
Copy link
Member

PavelCz commented Sep 12, 2023

Problem

Due to this validation environments returning integer rewards will throw an exception, e.g. when I try to collect rollouts from an expert policy.

This seems a bit overzealous. Here are some things to consider:

  • gym did somewhat specify float (see), but it doesn't seem like this was enforced anywhere.
    • e.g. gym-minigrid can return 0 (as an integer) in some situations
  • gymnasium now specifies typing.SupportsFloat (see), which allows integers (and envs such as MiniGrid still use ints).
  • sb3 vec_envs are documented as returning float (see)
    • However, this is not enforced in either the VecEnvWrapper or make_vec_env, AFAICT. That means I can wrap a valid gymnasium env in a VecEnv and my venv will return integers.
  • The imitation doc does say the dtype of a reward in TrajectoryWithRew is float (see).

Solution

I suggest changing the validation to np.number or typing.SupportsFloat (exists since python3.5) or np.floating or np.integer and updating the doc appropriately. Granted, this is all a bit awkward, since there is no exact equivalent of typing.SupportsFloat in numpy.
If this is a desired change, I could open a PR on this.

Possible alternative solutions

If you want to support gym envs that return integer rewards but keep using floats internally, maybe we could convert integers to float at some point in the pipeline. This could for example be done in the RolloutInfoWrapper.

Currently, my workaround is a wrapper that converts all rewards to floats. Obviously not a big deal, but seems kind of unnecessary.

@PavelCz PavelCz added the enhancement New feature or request label Sep 12, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

1 participant