How can the max steps parameter be modified in the CartPole env? #397

fede72bari · 2023-03-20T10:32:38Z

Question

I need to increase the maximum number of steps per episode in the CartPole environment. I looked around and found some proposals for Gym, rather than Gymnasium, along these lines:

env = gym.make("CartPole-v0")
env._max_episode_steps = 500

(found in openai/gym#463)

but it does not seem to work in my case:

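from tf_agents.environments import suite_gym, tf_py_environment

env = suite_gym.load('CartPole-v1')
env = tf_py_environment.TFPyEnvironment(env)
env._max_episode_steps = 10000  # appears to have no effect on the episode length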

Is there any other way? Thank you.

Kallinteris-Andreas · 2023-03-20T10:59:56Z

Time limit wrapper
https://gymnasium.farama.org/api/wrappers/misc_wrappers/#gymnasium.wrappers.TimeLimit
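
For readers unfamiliar with the idea, here is a minimal sketch of what a time-limit wrapper does. It is not Gymnasium's implementation: `ToyBalanceEnv` and `ToyTimeLimit` are hypothetical stand-ins written in plain Python, and the only assumption borrowed from Gymnasium is the five-value `step` return with separate `terminated` and `truncated` flags.

```python
import random


class ToyBalanceEnv:
    """Hypothetical stand-in for an environment with no built-in step limit."""

    def reset(self, seed=None):
        self._rng = random.Random(seed)
        return 0.0, {}  # (observation, info)

    def step(self, action):
        observation = self._rng.uniform(-1.0, 1.0)
        reward = 1.0
        terminated = False  # this toy task never fails on its own
        truncated = False
        return observation, reward, terminated, truncated, {}


class ToyTimeLimit:
    """Simplified illustration of a time-limit wrapper (not Gymnasium's code)."""

    def __init__(self, env, max_episode_steps):
        self.env = env
        self._max_episode_steps = max_episode_steps
        self._elapsed_steps = 0

    def reset(self, seed=None):
        self._elapsed_steps = 0
        return self.env.reset(seed=seed)

    def step(self, action):
        observation, reward, terminated, truncated, info = self.env.step(action)
        self._elapsed_steps += 1
        if self._elapsed_steps >= self._max_episode_steps:
            truncated = True  # the episode is cut off by the limit, not failed
        return observation, reward, terminated, truncated, info


def run_episode(env):
    """Step until the episode ends and return how many steps it lasted."""
    env.reset(seed=0)
    steps = 0
    while True:
        _, _, terminated, truncated, _ = env.step(0)
        steps += 1
        if terminated or truncated:
            return steps


episode_length = run_episode(ToyTimeLimit(ToyBalanceEnv(), max_episode_steps=10000))
print(episode_length)
```

Because the toy environment never terminates by itself, the episode length is decided entirely by the `max_episode_steps` value given to the wrapper.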

fede72bari · 2023-03-20T11:20:54Z

Thank you, @Kallinteris-Andreas.

I tried to apply your suggestion as follows:


env = suite_gym.load('CartPole-v1')
env = TimeLimit(env, max_episode_steps=10000)
env = tf_py_environment.TFPyEnvironment(env)

but this raises the following error, which I do not fully understand. It probably conflicts with the subsequent TensorFlow call tf_py_environment.TFPyEnvironment:


---------------------------------------------------------------------------
Exception encountered at "In [5]":
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
/tmp/ipykernel_19/134893757.py in <module>
      1 env = suite_gym.load('CartPole-v1')
      2 env = TimeLimit(env, max_episode_steps=10000)
----> 3 env = tf_py_environment.TFPyEnvironment(env)
      4
      5

/opt/conda/lib/python3.7/site-packages/gin/config.py in gin_wrapper(*args, **kwargs)
   1603       scope_info = " in scope '{}'".format(scope_str) if scope_str else ''
   1604       err_str = err_str.format(name, fn_or_cls, scope_info)
-> 1605       utils.augment_exception_message_and_reraise(e, err_str)
   1606
   1607   return gin_wrapper

/opt/conda/lib/python3.7/site-packages/gin/utils.py in augment_exception_message_and_reraise(exception, message)
     39   proxy = ExceptionProxy()
     40   ExceptionProxy.__qualname__ = type(exception).__qualname__
---> 41   raise proxy.with_traceback(exception.__traceback__) from None
     42
     43

/opt/conda/lib/python3.7/site-packages/gin/config.py in gin_wrapper(*args, **kwargs)
   1580
   1581     try:
-> 1582       return fn(*new_args, **new_kwargs)
   1583     except Exception as e:  # pylint: disable=broad-except
   1584       err_str = ''

/opt/conda/lib/python3.7/site-packages/tf_agents/environments/tf_py_environment.py in __init__(self, environment, check_dims, isolation)
    139     if not isinstance(environment, py_environment.PyEnvironment):
    140       raise TypeError(
--> 141           'Environment should implement py_environment.PyEnvironment')
    142
    143     if not environment.batched:

TypeError: Environment should implement py_environment.PyEnvironment
In call to configurable 'TFPyEnvironment' (<class 'tf_agents.environments.tf_py_environment.TFPyEnvironment'>)

pseudo-rnd-thoughts · 2023-03-20T11:41:35Z

gym.make("CartPole-v0", max_episode_steps=X) is the easiest way of modifying the number of steps used by the TimeLimit wrapper.

fede72bari · 2023-03-20T16:27:37Z

@pseudo-rnd-thoughts thank you. The error was due to the TensorFlow TFPyEnvironment, which I have not mastered, but you and @Kallinteris-Andreas "inspired" me. Taking a look at tf_agents.environments.suite_gym.load, I found


tf_agents.environments.suite_gym.load(
    environment_name: Text,
    discount: tf_agents.typing.types.Float = 1.0,
    max_episode_steps: Optional[types.Int] = None,
    gym_env_wrappers: Sequence[tf_agents.typing.types.GymEnvWrapper] = (),
    env_wrappers: Sequence[tf_agents.typing.types.PyEnvWrapper] = (),
    spec_dtype_map: Optional[Dict[gym.Space, np.dtype]] = None,
    gym_kwargs: Optional[Dict[str, Any]] = None,
    render_kwargs: Optional[Dict[str, Any]] = None
) -> tf_agents.environments.PyEnvironment

so load should accept the max_episode_steps parameter and pass it on when the environment is created. The following code

env = suite_gym.load('CartPole-v1', max_episode_steps=10000)
env = tf_py_environment.TFPyEnvironment(env)

no longer raises an error, but when I run the training code, episodes are still capped at 500 steps. In the training code, I reset the environment when the current time step is the last one:

    time_step = environment.current_time_step()    
    if time_step.is_last():
        time_step = environment.reset()

I hope this last part is coded correctly. So my question is: have you ever tried to train on a CartPole environment with a maximum other than 500 steps, for instance 5000, or fewer? Did it change anything?

pseudo-rnd-thoughts · 2023-05-02T11:12:47Z

Sorry, I just saw this again.
CartPole-v0 had its time limit set to 200, which I believe was too easy, which is why CartPole-v1 was added, changing only the time limit to 500.
As for TF-Agents, I have no idea why this doesn't work. I suspect you might need to pass the parameter through gym_kwargs as well.
You can check by accessing the TimeLimit wrapper or the environment spec to see what the max episode steps value is.
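
A minimal sketch of that kind of check follows. The helper name `collect_step_limits` is hypothetical. It assumes each wrapper exposes the wrapped environment as `.env` and that a time-limit wrapper stores its limit in `_max_episode_steps`, as in the Gym snippet quoted at the top of this issue. TF-Agents wrappers may use different attribute names, so the helper would need adapting there. The toy chain also shows one possibility worth ruling out (an assumption, not a confirmed diagnosis of the TF-Agents case): if two limits end up stacked, the smaller one ends the episode first.

```python
def collect_step_limits(env, max_depth=50):
    """Walk a wrapper chain via `.env` and list every `_max_episode_steps` found."""
    limits = []
    current = env
    for _ in range(max_depth):
        if current is None:
            break
        limit = getattr(current, "_max_episode_steps", None)
        if limit is not None:
            limits.append((type(current).__name__, limit))
        current = getattr(current, "env", None)  # None once the base env is reached
    return limits


class BaseEnv:
    """Hypothetical innermost environment with no limit of its own."""


class StepLimit:
    """Hypothetical wrapper that only records a step limit."""

    def __init__(self, env, max_episode_steps):
        self.env = env
        self._max_episode_steps = max_episode_steps


# Outer limit of 10000 wrapped around an inner limit of 500.
chain = StepLimit(StepLimit(BaseEnv(), 500), 10000)
limits = collect_step_limits(chain)
effective = min(limit for _, limit in limits)
print(limits, effective)
```

If more than one limit shows up in a real environment, the smallest value is the one to look at first.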

fede72bari added the question label Mar 20, 2023

pseudo-rnd-thoughts closed this as completed Mar 20, 2023

pseudo-rnd-thoughts reopened this Mar 20, 2023

pseudo-rnd-thoughts closed this as completed Jun 21, 2023
