How can be modified the max steps parameter in CartPole env? #397

Closed
fede72bari opened this issue Mar 20, 2023 · 5 comments
Labels
question Further information is requested

Comments

@fede72bari

Question

I need to increase the maximum number of steps per episode in the CartPole environment. I looked around and found some proposals written for Gym rather than Gymnasium, such as this:

env = gym.make("CartPole-v0")
env._max_episode_steps = 500

found here openai/gym#463

but in my case it does not seem to work:

env = suite_gym.load('CartPole-v1')
env = tf_py_environment.TFPyEnvironment(env)
env._max_episode_steps = 10000

Is there any other way? Thank you.

@fede72bari fede72bari added the question Further information is requested label Mar 20, 2023
@Kallinteris-Andreas
Collaborator

@fede72bari
Author

fede72bari commented Mar 20, 2023

Thank you, @Kallinteris-Andreas.

I tried to run your suggested code as follows:


env = suite_gym.load('CartPole-v1')
env = TimeLimit(env, max_episode_steps=10000)
env = tf_py_environment.TFPyEnvironment(env)

but this raises the following error, which I do not fully understand. It probably conflicts with the subsequent TensorFlow call tf_py_environment.TFPyEnvironment:


---------------------------------------------------------------------------
Exception encountered at "In [5]":
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
/tmp/ipykernel_19/134893757.py in <module>
      1 env = suite_gym.load('CartPole-v1')
      2 env = TimeLimit(env, max_episode_steps=10000)
----> 3 env = tf_py_environment.TFPyEnvironment(env)
      4
      5

/opt/conda/lib/python3.7/site-packages/gin/config.py in gin_wrapper(*args, **kwargs)
   1603       scope_info = " in scope '{}'".format(scope_str) if scope_str else ''
   1604       err_str = err_str.format(name, fn_or_cls, scope_info)
-> 1605       utils.augment_exception_message_and_reraise(e, err_str)
   1606
   1607   return gin_wrapper

/opt/conda/lib/python3.7/site-packages/gin/utils.py in augment_exception_message_and_reraise(exception, message)
     39   proxy = ExceptionProxy()
     40   ExceptionProxy.__qualname__ = type(exception).__qualname__
---> 41   raise proxy.with_traceback(exception.__traceback__) from None
     42
     43

/opt/conda/lib/python3.7/site-packages/gin/config.py in gin_wrapper(*args, **kwargs)
   1580
   1581     try:
-> 1582       return fn(*new_args, **new_kwargs)
   1583     except Exception as e:  # pylint: disable=broad-except
   1584       err_str = ''

/opt/conda/lib/python3.7/site-packages/tf_agents/environments/tf_py_environment.py in __init__(self, environment, check_dims, isolation)
    139     if not isinstance(environment, py_environment.PyEnvironment):
    140       raise TypeError(
--> 141           'Environment should implement py_environment.PyEnvironment')
    142
    143     if not environment.batched:

TypeError: Environment should implement py_environment.PyEnvironment
In call to configurable 'TFPyEnvironment' (<class 'tf_agents.environments.tf_py_environment.TFPyEnvironment'>)

@pseudo-rnd-thoughts
Member

gym.make("CartPole-v0", max_episode_steps=X) is the easiest way of modifying the number of steps used by the TimeLimit wrapper.
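
A minimal sketch of this approach, assuming Gymnasium is installed. The helper name run_until_truncated is made up for illustration, and a tiny limit of 3 is used only so that the effect shows up immediately:

```python
def run_until_truncated(max_episode_steps: int) -> int:
    """Return the step at which CartPole-v1 reports truncation."""
    # Imported lazily so this file still loads where Gymnasium is not installed.
    import gymnasium as gym

    # The keyword overrides the default limit registered for the environment.
    env = gym.make("CartPole-v1", max_episode_steps=max_episode_steps)
    env.reset(seed=0)

    steps = 0
    truncated = False
    while not truncated:
        # Alternate left/right pushes so the pole stays up for the first few steps.
        _, _, terminated, truncated, _ = env.step(steps % 2)
        steps += 1
        if terminated:
            raise RuntimeError("pole fell before the time limit was reached")

    env.close()
    return steps
```

Calling run_until_truncated(3) should return 3 as long as the pole stays up for those steps. Passing a larger value to the same keyword raises the cap instead of lowering it.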

@fede72bari
Author

fede72bari commented Mar 20, 2023

@pseudo-rnd-thoughts thank you. The error was caused by the TensorFlow TFPyEnvironment, which I am not very familiar with, but you and @Kallinteris-Andreas "inspired" me. Looking at tf_agents.environments.suite_gym.load, I found


tf_agents.environments.suite_gym.load(
    environment_name: Text,
    discount: tf_agents.typing.types.Float = 1.0,
    max_episode_steps: Optional[types.Int] = None,
    gym_env_wrappers: Sequence[tf_agents.typing.types.GymEnvWrapper] = (),
    env_wrappers: Sequence[tf_agents.typing.types.PyEnvWrapper] = (),
    spec_dtype_map: Optional[Dict[gym.Space, np.dtype]] = None,
    gym_kwargs: Optional[Dict[str, Any]] = None,
    render_kwargs: Optional[Dict[str, Any]] = None
) -> tf_agents.environments.PyEnvironment

so load should accept the max_episode_steps parameter and pass it on when it creates the environment. The following code

env = suite_gym.load('CartPole-v1', max_episode_steps=10000)
env = tf_py_environment.TFPyEnvironment(env)

no longer raises an error, but when I run the training code, episodes are still capped at 500 steps. In the training code, I reset the environment when

    time_step = environment.current_time_step()    
    if time_step.is_last():
        time_step = environment.reset()

I hope that last part is coded correctly. So the question is: have you ever tried to train on a CartPole environment with a maximum of more than 500 steps, for instance 5000? Or fewer? Did it change anything?

@pseudo-rnd-thoughts
Member

Sorry, I just saw this again.
CartPole-v0 had the time limit set to 200, which I believe was too easy. That is why CartPole-v1 was added, purely changing the time limit to 500.
As for TF-Agents, I have no idea why this doesn't work. I suspect you might need to pass the parameter to gym_kwargs as well.
You can check by inspecting the TimeLimit wrapper or the environment spec to see what the max episode steps value is.
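
To see why an outer limit alone might not be enough, here is a toy sketch in plain Python. It is not Gymnasium or TF-Agents code. It only assumes, without confirmation in this thread, that a 500-step limit is already applied inside the environment before the 10000-step one is added. EndlessEnv, ToyTimeLimit and episode_length are hypothetical names used for illustration.

```python
class EndlessEnv:
    """Toy environment that never ends an episode on its own."""

    def reset(self):
        return 0

    def step(self, action):
        # Returns (observation, reward, terminated, truncated).
        return 0, 1.0, False, False


class ToyTimeLimit:
    """Simplified stand-in for a time-limit wrapper (assumption, not library code)."""

    def __init__(self, env, max_episode_steps):
        self.env = env
        self.max_episode_steps = max_episode_steps
        self._elapsed = 0

    def reset(self):
        self._elapsed = 0
        return self.env.reset()

    def step(self, action):
        obs, reward, terminated, truncated = self.env.step(action)
        self._elapsed += 1
        # Flag truncation once this wrapper's own budget is used up.
        if self._elapsed >= self.max_episode_steps:
            truncated = True
        return obs, reward, terminated, truncated


def episode_length(env):
    """Step until the episode ends and return how many steps it took."""
    env.reset()
    steps = 0
    while True:
        _, _, terminated, truncated = env.step(0)
        steps += 1
        if terminated or truncated:
            return steps


# A 10000-step limit wrapped around an existing 500-step limit.
inner = ToyTimeLimit(EndlessEnv(), max_episode_steps=500)
nested_length = episode_length(ToyTimeLimit(inner, max_episode_steps=10000))

# The same 10000-step limit with no inner limit present.
direct_length = episode_length(ToyTimeLimit(EndlessEnv(), max_episode_steps=10000))
```

In this toy setup nested_length comes out as 500 and direct_length as 10000: whichever limit is smaller ends the episode first. If something similar happens in the real setup, the inner limit is the one that would need changing, which is in line with the gym_kwargs suggestion above.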
