Different values for is_done compared to Python library after certain amount of steps #33

Open
GeckoEidechse opened this issue Sep 9, 2020 · 1 comment

@GeckoEidechse

While working with the MountainCar environment, I noticed that, unlike my Python code, the Rust version always ends an episode after 200 steps because the OpenAI Gym library reports the episode as done.

Interestingly enough,

import gym
gym.make("MountainCar-v0")._max_episode_steps

returns 200, so the Rust library is correct to return is_done as true after 200 steps. However, I do not see the same result when using the original Python library. Since this Rust library is meant to be a frontend to the Python library, I'd argue it should mimic the Python results, even if they are incorrect.

Minimal working examples showing the difference:

Python

import gym
env = gym.make("MountainCar-v0")
env.env.reset()
for i in range(300):
    observation, reward, is_done, info = env.env.step(1)
    if is_done:
        print("is_done == true at step:", i)
        break

Rust

extern crate gym;

fn main() {
    // Create the client and the environment.
    let gym = gym::GymClient::default();
    let env = gym.make("MountainCar-v0");
    env.reset();
    for i in 0..300 {
        // Always take discrete action 1, as in the Python example.
        let gym::State {observation, reward, is_done} = env.step(&gym::Action::DISCRETE(1)).unwrap();
        if is_done {
            println!("is_done == true at step: {}", i);
            break;
        }
    }
}
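
A possible explanation, offered as an assumption rather than a confirmed diagnosis: as far as I understand, gym.make wraps environments that register a maximum episode length in a step-limit wrapper, and env.env refers to the inner environment. If so, the Python example above steps the inner environment directly and never reaches the step counter. The sketch below illustrates the effect with hypothetical stand-in classes (ToyEnv and ToyTimeLimit are made up for this example and are not gym's actual code).

```python
class ToyEnv:
    """Hypothetical stand-in for an environment that never terminates on its own."""

    def reset(self):
        return 0.0

    def step(self, action):
        # Returns (observation, reward, is_done, info); is_done is always False here.
        return 0.0, -1.0, False, {}


class ToyTimeLimit:
    """Hypothetical stand-in for a step-limit wrapper around an inner environment."""

    def __init__(self, env, max_episode_steps):
        self.env = env  # the wrapped (inner) environment
        self._max_episode_steps = max_episode_steps
        self._elapsed_steps = 0

    def reset(self):
        self._elapsed_steps = 0
        return self.env.reset()

    def step(self, action):
        observation, reward, is_done, info = self.env.step(action)
        self._elapsed_steps += 1
        if self._elapsed_steps >= self._max_episode_steps:
            # The wrapper, not the inner environment, ends the episode.
            is_done = True
        return observation, reward, is_done, info


def first_done_step(step_fn, max_steps=300):
    """Return the index of the first step that reports is_done, or None."""
    for i in range(max_steps):
        _, _, is_done, _ = step_fn(1)
        if is_done:
            return i
    return None


env = ToyTimeLimit(ToyEnv(), max_episode_steps=200)

env.reset()
through_wrapper = first_done_step(env.step)        # counts steps in the wrapper

env.reset()
bypassing_wrapper = first_done_step(env.env.step)  # skips the wrapper entirely

print("through wrapper:", through_wrapper)
print("bypassing wrapper:", bypassing_wrapper)
```

In this toy setup, stepping through the wrapper reports is_done at loop index 199 (the 200th step), while stepping the inner environment never does. If the real library behaves the same way, calling env.step(1) instead of env.env.step(1) in the Python example should match the Rust output.
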
@hemaolong

I encountered the same problem. It would be nice to have the same behavior as gym-python.
