New Step API with terminated, truncated bools instead of done #2752

arjun-kg · 2022-04-14T18:36:59Z

Description

step method is changed to return five items instead of four.

Old API - done=True if episode ends in any way.

New API -
terminated=True if environment terminates (eg. due to task completion, failure etc.)
truncated=True if episode truncates due to a time limit or a reason that is not defined as part of the task MDP
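As a rough illustration of what this means for calling code, the sketch below uses a hypothetical `CountdownEnv` (a toy class, not part of gym) whose `step` returns the five-item tuple. The names `CountdownEnv` and `run_episode` are assumptions made for this example only.

```python
from typing import Dict, Tuple


class CountdownEnv:
    """Hypothetical toy env (not part of gym) that follows the new step API."""

    def __init__(self, start: int = 3, max_steps: int = 10) -> None:
        self.start = start
        self.max_steps = max_steps
        self.state = start
        self.elapsed = 0

    def reset(self) -> int:
        self.state = self.start
        self.elapsed = 0
        return self.state

    def step(self, action: int) -> Tuple[int, float, bool, bool, Dict]:
        self.state -= action
        self.elapsed += 1
        # Reaching zero is part of the task definition (the MDP).
        terminated = self.state <= 0
        # The step budget is an external time limit, not part of the MDP.
        truncated = self.elapsed >= self.max_steps
        return self.state, -1.0, terminated, truncated, {}


def run_episode(env: CountdownEnv, action: int) -> Tuple[int, bool, bool]:
    """Step until either flag is set; return (steps, terminated, truncated)."""
    env.reset()
    steps = 0
    terminated = truncated = False
    while not (terminated or truncated):
        _, _, terminated, truncated, _ = env.step(action)
        steps += 1
    return steps, terminated, truncated
```

Under the old API both endings would collapse into a single `done = terminated or truncated`, so the caller could not tell the two episodes apart.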

Link to docs - Farama-Foundation/gym-docs#115 (To be updated with latest changes)

Changes

All existing environment implementations are changed to new API without direct support for old API. However gym.make for any environment will default to old API through a compatibility wrapper.
Vector env implementations are changed to new API, with backward compatibility for old API, defaulting to old API. New API can be set by a newly added argument new_step_api=True in constructor.
All wrapper implementations are changed to new API, and have backward compatibility and default to old API (can be switched to new API with new_step_api=True).
Some changes in phrasing - terminal_reward, terminal_observation etc. are replaced with final_reward, final_observation etc. The intention is to reserve the word 'termination' for cases where terminated=True. (For some motivation, Sutton and Barto use terminal states to specifically refer to special states whose values are 0, states at the end of the MDP. This is not true for a truncation, where the value of the final state need not be 0. So the current usage of terminal_obs etc. would be incorrect if we adopt this definition.)
All tests continue to be run against the old API (since the default is old API for now). The single exception is when the test env is unwrapped, so the compatibility wrapper doesn't apply. Also, special tests are added just for testing new API.
new_step_api argument is used in different places. Its meaning is taken to be "whether this function / class should output step values in new API or not". Eg. self.new_step_api in a wrapper signifies whether the wrapper's step method outputs items in new API (the wrapper itself might have been written in new or old API, but through compatibility code it will output according to self.new_step_api)
play.py alone is retained in old API due to the difficulty in having it be compatible for both APIs simultaneously, and being slightly lower priority.
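The note on terminal states in the list above can be made concrete with a one-step TD target. This is a minimal sketch, not code from the PR: `td_target` and its parameters are hypothetical names, and the only point is that bootstrapping is skipped when `terminated` is set, not when the episode was merely truncated.

```python
def td_target(
    reward: float, next_value: float, terminated: bool, gamma: float = 0.99
) -> float:
    """One-step TD target that bootstraps unless the episode truly terminated.

    A terminal state has value 0 by definition, so nothing is added after it.
    A truncated episode ends in an ordinary state whose value still counts.
    """
    if terminated:
        return reward
    return reward + gamma * next_value
```

With `reward=1.0`, `next_value=10.0` and `gamma=0.5`, a truncated final step gives a target of 6.0, while treating the same step as terminated (which is what a single `done` flag would force) gives 1.0.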

StepAPICompatibility Wrapper

This wrapper is added to support conversion from old to new API and vice versa.
Takes new_step_api argument in __init__. False (old API) by default.
Wrapper applied at make with new_step_api=False by default. It can be changed during make like gym.make("CartPole-v1", new_step_api=True). The order of wrappers applied at make is as follows - core env -> PassiveEnvChecker -> StepAPICompatibility -> other wrappers

step_api_compatibility function

This function is similar to the wrapper. It is used for backward compatibility in wrappers and vector envs, at interfaces between env / wrapper / vector / outside code. Example usage:

# wrapper's step method
def step(self, action):

    # here self.env.step is made to return in new API, since the wrapper is written in new API
    obs, rew, terminated, truncated, info = step_api_compatibility(self.env.step(action), new_step_api=True) 

    if terminated or truncated:
        print("Episode end")
    ### more wrapper code

    # here the wrapper is made to return in API specified by self.new_step_api, that is set to False by default, and can be changed according to the situation
    return step_api_compatibility((obs, rew, terminated, truncated, info), new_step_api=self.new_step_api)

TimeLimit

In the current implementation of the timelimit wrapper, existence of 'TimeLimit.truncated' key in info means that truncation has occurred. The boolean value it is set to refers to whether the core environment has already ended. So, info['TimeLimit.truncated']=False, means the core environment has already terminated. We can infer terminated=True, truncated=True from this case.
To change old API to new, the compatibility function first checks info. If there is nothing in info, it returns terminated=done and truncated=False as there is no better information available. If TimeLimit info is available, it accordingly sets the two bools.
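The three cases described above can be sketched as a small function. This is an illustration of the rule as stated in this description, for a single (non-vector) env only; `old_to_new` is a hypothetical name and not the function added by the PR.

```python
from typing import Any, Dict, Tuple


def old_to_new(
    obs: Any, reward: float, done: bool, info: Dict[str, Any]
) -> Tuple[Any, float, bool, bool, Dict[str, Any]]:
    """Convert an old-API step result to (obs, reward, terminated, truncated, info)."""
    if "TimeLimit.truncated" not in info:
        # No better information available: treat done as termination.
        return obs, reward, done, False, info
    if info["TimeLimit.truncated"]:
        # Time limit hit while the core env had not ended.
        return obs, reward, False, True, info
    # Key present but False: the core env ended on the same step as the limit.
    return obs, reward, True, True, info
```

For example, `old_to_new(0, 1.0, True, {"TimeLimit.truncated": True})` yields `terminated=False, truncated=True`, while an empty `info` yields `terminated=True, truncated=False`.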

Backward Compatibility

The PR attempts to achieve almost complete backward compatibility. However, there are cases which haven't been included. Environments directly imported eg. from gym.envs.classic_control import CartPoleEnv would not be backward compatible as these are rewritten in new API. StepAPICompatibility wrapper would need to be used manually in this case. Envs made through gym.make all default to old API. Vector and wrappers also default to old API. These should all continue to work without problems. But due to the scale of the change, bugs are expected.

Warning Details

Warnings are raised at the following locations:

gym.Wrapper constructor - warning raised if self.new_step_api==False. This means any wrapper that does not explicitly pass new_step_api=True into super() will raise the warning since self.new_step_api=False by default. This is taken care of by wrappers written inside gym. Third party wrappers will face a problem in a specific situation - if the wrapper is not impacted by step API. eg. a wrapper subclassing ActionWrapper. This would work without any change for both APIs, however to avoid the warning, they still need to pass new_step_api=True into super(). The thinking is - "If your wrapper supports new step API, you need to pass new_step_api=True to avoid the warning".
PassiveEnvChecker, passive_env_step_check function - if step return has 4 items a warning is raised. This happens only once since this function is only run once after env initialization. Since PassiveEnvChecker is wrapped first before step compatibility in make, this will raise a warning based on the core env implementation's API.
gym.VectorEnv constructor - warning raised if self.new_step_api==False.
StepAPICompatibility wrapper constructor - the wrapper that is applied by default at make. If new_step_api=False, a warning is raised. This is independent of whether the core env is implemented in new or old api and only depends on the new_step_api argument.

Breaking change (fix or feature that would cause existing functionality to not work as expected)
This change requires a documentation update

Checklist:

I have run the pre-commit checks with pre-commit run --all-files
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation (need to update with latest changes)
My changes generate no new warnings (only intentional warnings)
I have added tests that prove my fix is effective or that my feature works (added two tests but maybe needs more to be comprehensive)
New and existing unit tests pass locally with my changes
Locally runs with atari / pybullet envs

Old API - `done=True` if episode ends in any way.

New API -
`terminated=True` if environment terminates (eg. due to task completion, failure etc.)
`truncated=True` if episode truncates due to a time limit

Changes
1. All existing environments are changed to new API without direct support for old API
2. Vector envs are changed to new API without direct support for old API
3. All wrappers (Except TimeLimit, OrderEnforcing) are changed to new API without direct support for old API
4. TimeLimit, OrderEnforcing and wrappers which don't change the step function support both APIs.

StepCompatibility Wrapper
1. This wrapper is added to support conversion from old to new API and vice versa.
2. Takes `return_two_dones` argument in __init__. `True` (new API) by default.
3. Wrapper applied at make with `return_two_dones=True` by default. It can be changed during make like `gym.make("CartPole-v1", return_two_dones=False)`

StepCompatibilityVector Wrapper - Transforms vector environment to old API. Set `return_two_dones=False`.

Misc
1. Autoreset bug fixed (hacky) by setting local variable instead of editing self.autoreset in Env.Spec.

gym/core.py

pseudo-rnd-thoughts · 2022-04-17T14:36:10Z

I'm thinking about the backward compatibility of this PR, in particular for online tutorials and uni courses, etc.
As return_two_dones=True is the default for all environments, will this not break a huge amount of code without a warning being raised, as it is the expected value now?
Is it not better to have return_two_dones=False for all environments until 1.0, with warnings being raised explaining to people to change across and that all wrappers have been modified to expect this now?

arjun-kg · 2022-04-17T15:53:18Z

@pseudo-rnd-thoughts Yes it will break code. But my point to making new API the default is just to make the code a lot simpler, with old API still supported for some (but not all) features. Most wrappers and vector code only support new API. It would get way too convoluted for all that to support both APIs. We'd need to support any code that could've arbitrarily applied any wrapper or vector code.

And for the code breaking, the fix is to apply the compatibility wrappers manually (in places where it can't be toggled with return_two_dones), which is a little better than immediately having to rewrite the broken code, but yes, definitely not ideal. Maybe it's worth the complicated code to retain complete compatibility. At this point I'm not sure which way is better and wouldn't mind either way based on a consensus.

gym/wrappers/step_compatibility.py

pseudo-rnd-thoughts · 2022-04-17T16:26:08Z

@arjun-kg So I understand your perspective that we should try to not support one done returns and I agree with the idea.

But this is a massively backward compatibility breaking change that I think we should avoid until gym 1.0
Personally, I think that the default should be one done return with a warning being raised that all wrappers have been modified to expect environments with two done returns. Plus details that this will be changed to two done returns at gym 1.0 plus the compatibility wrappers to change environments from one to two and vice versa.

The additional issue I see with having two done returns as default is that as no warning is raised to alert users to the change then code will start failing with no obvious reason without them looking at the change logs which I suspect a majority of people are not going to do. Therefore, we are going to get a massive increase in issues around this PR.

arjun-kg · 2022-04-17T16:38:13Z

@pseudo-rnd-thoughts Oh that makes sense. We could have old api as default with warning saying wrappers and vector api have been upgraded and will fail unless new api is used, and maybe guide them to a doc to see how to switch api (or use compatibility wrapper) for whatever is not available in the old api. So yes like you said, it won't be an error without warning.

pseudo-rnd-thoughts · 2022-04-17T16:46:11Z

Thanks, could you add a test that checks that without return_two_dones=True then the warning is raised.

arjun-kg · 2022-04-20T19:17:44Z

@pseudo-rnd-thoughts I've made the changes. All the tests assume new api, so argument return_two_dones=True is passed for all env making in tests.

pseudo-rnd-thoughts · 2022-04-20T20:00:32Z

@arjun-kg Thanks for doing that, the workflows haven't happened but I imagine that we are going to get a massive number of deprecation warnings.
If true, would you be able to silence these warnings in the CI? You can pass a parameter to pytest to ignore certain warnings or add a @pytest.mark.filterwarnings (I believe) to all of the test functions. The first option is probably best.
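A minimal sketch of the filtering idea, using only the standard-library `warnings` module. It assumes the step-API deprecation is emitted as a `DeprecationWarning` through `warnings.warn`, which is an assumption about gym's logger rather than something confirmed in this thread. `old_api_make` is a hypothetical stand-in for the call that triggers the warning.

```python
import warnings


def old_api_make() -> str:
    """Hypothetical stand-in for a call that emits the step-API deprecation."""
    warnings.warn(
        "Initializing environment in old step API which returns one bool "
        "instead of two.",
        DeprecationWarning,
    )
    return "env"


def count_warnings(silence: bool) -> int:
    """Return how many warnings reach the caller, with or without the filter."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        if silence:
            # Ignore only warnings whose message matches this regex, so
            # unrelated deprecations are still reported.
            warnings.filterwarnings(
                "ignore",
                message=".*old step API.*",
                category=DeprecationWarning,
            )
        old_api_make()
    return len(caught)
```

pytest accepts the same `action:message:category` filter syntax through its `-W` command-line option and its `filterwarnings` ini setting, so a message-specific filter like this one can be applied to the whole test run instead of per function.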

RedTachyon

My single biggest concern is the approach taken to backwards compatibility, which makes a lot of the code needlessly complex. I'm of the moderately strong opinion that all the code inside of gym after this PR is merged, should explicitly only support the new step semantic.

Since we definitely want environments written with obs, reward, done, info to still work, we can have a wrapper doing something like

class OldToNew:
    ...
    def step(self, action: ActType) -> Tuple[ObsType, float, bool, bool, dict]:
        outputs = self.env.step(action)
        if len(outputs) == 4:
            obs, reward, done, info = outputs
            return obs, reward, done, False, info
        elif len(outputs) == 5:
            # We applied the wrapper needlessly
            return outputs
        else:
            raise ValueError("You messed something up real hard")
            
            
class NewToOld:
    ...
    def step(self, action: ActType) -> Tuple[ObsType, float, bool, dict]:
        outputs = self.env.step(action)
        if len(outputs) == 4:
            # We applied the wrapper needlessly
            return outputs
        elif len(outputs) == 5:
            obs, reward, terminated, truncated, info = outputs
            return obs, reward, terminated or truncated, info
        else:
            raise ValueError("You messed something up real hard")

Then we apply this as appropriate, giving the users some control over what will be applied, and all the code should be compatible both ways as long as environments weren't trying to do their own truncations and just used TimeLimitWrapper instead.

A bunch of individual code comments are about this exact issue before I realized that's like the only big thing I have to comment about here.

RedTachyon · 2022-04-24T17:20:08Z

gym/core.py

+    def step(
+        self, action: ActType
+    ) -> Union[
+        Tuple[ObsType, float, bool, bool, dict], Tuple[ObsType, float, bool, dict]


I don't know if I like this approach to backwards compatibility. If this is the official state of (for example) 0.24.0, then you can't reliably write an algorithm that will work for all valid 0.24.0 environments. I think we should just say that an environment should have the signature of (ObsType, float, bool, bool, dict), and then provide a wrapper-like compatibility layer that can convert an old-style environment to a new-style environment.

RedTachyon · 2022-04-24T17:20:50Z

gym/core.py

@@ -76,13 +81,17 @@ def step(self, action: ActType) -> Tuple[ObsType, float, bool, dict]:
        Returns:
            observation (object): agent's observation of the current environment. This will be an element of the environment's :attr:`observation_space`. This may, for instance, be a numpy array containing the positions and velocities of certain objects.
            reward (float) : amount of reward returned after previous action
-            done (bool): whether the episode has ended, in which case further :meth:`step` calls will return undefined results. A done signal may be emitted for different reasons: Maybe the task underlying the environment was solved successfully, a certain timelimit was exceeded, or the physics simulation has entered an invalid state. ``info`` may contain additional information regarding the reason for a ``done`` signal.
+            terminated (bool): whether the episode has ended due to a termination, in which case further step() calls will return undefined results


I would rephrase "termination" to something like "reaching a terminal state", or otherwise to indicate that it's about the intrinsic properties of the environment

RedTachyon · 2022-04-24T17:34:27Z

gym/envs/registration.py

@@ -96,6 +96,7 @@ class EnvSpec:
    max_episode_steps: Optional[int] = field(default=None)
    order_enforce: bool = field(default=True)
    autoreset: bool = field(default=False)
+    return_two_dones: bool = field(default=False)


I don't think this will actually be useful. It only really has any value if people are explicitly setting it, which is a new (and very temporary) feature, and we explicitly deprecate one of the settings the moment it's introduced. This goes back to the general comment that I think the library should only support one step semantic, and compatibilities can be done through a wrapper-like thing

Do you mean a wrapper that's always applied at make is preferable to an argument? Or no wrapper at make? My point was to have this to allow backward compatibility and, if users so chose to use the new api, they could just switch it to True at make.

I think I just find it confusing as to what this is supposed to do. Does this indicate what the environment intrinsically returns? Does it decide whether a wrapper should be applied?

I'd prefer a wrapper that's always applied, unless explicitly turned off, which defaults to a no-op if the environment already follows the new API

RedTachyon · 2022-04-24T17:36:19Z

gym/wrappers/step_compatibility.py

+from gym import logger
+
+
+class StepCompatibility(gym.Wrapper):


This desperately needs some comments and docstrings

RedTachyon · 2022-04-24T17:37:06Z

gym/wrappers/step_compatibility.py

+        self._return_two_dones = return_two_dones
+        if not self._return_two_dones:
+            logger.deprecation(
+                "[StepAPI] Initializing environment in old step API which returns one bool instead of two. "


The [StepAPI] thing implies there's a syntax like that in gym overall, which is not the case as far as I know, so I think it can be removed

pseudo-rnd-thoughts · 2022-04-24T17:54:43Z

@RedTachyon I believe the StepCompatibility wrappers in gym/wrappers/step_compatibility and gym/vector/step_compatibility are exactly what you describe as OldToNew and NewToOld.
Currently, all environments created using .make use the backward compatibility wrapper with one done, but raise a warning to users about the two done API and the changes made.
To use the two done API, users just add return_two_dones=True as a kwarg for .make

RedTachyon · 2022-04-24T18:00:49Z

Good point. The thing is that all the internal code supports both old and new APIs, and I think we should stick to just one, and old-style environments will be "projected" into the new API and processed as such. I don't think there should be code in gym (outside of the explicit compatibility layer, and maybe some tests) which checks if len(outputs) == 4: ... else: ...

arjun-kg · 2022-04-24T18:53:42Z

I don't think there should be code in gym (outside of the explicit compatibility layer, and maybe some tests) which checks if len(outputs) == 4: ... else: ...

Right now, only core and TimeLimit have this kind of check. The only other place where there are details about compatibility is in registration and vector init, applying the compatibility wrapper, and giving the user an argument to toggle between new and old api at make. Everything else is switched cleanly to the new API.

RedTachyon · 2022-04-24T22:11:30Z

I'd remove it from core at least in that case. Probably also from TimeLimit, and then the compatibility wrapper would be applied as the very first wrapper, so that TimeLimit only has to ever operate on new-API envs

RedTachyon · 2022-04-24T22:14:45Z

Or can you maybe explain what is the exact workflow in these situations? The name return_two_dones is very unclear to me.

I'm calling gym.make("AntBulletEnv-v0") on an env written with the old API, and want it to follow the new API
I'm calling the same thing, want it to follow the old API (is this possible?)
I'm creating a brand new, well-written env, I want it to follow the new API
I'm creating the same thing, I want it to follow the old API (is this possible?)

arjun-kg · 2022-04-25T04:47:58Z

@RedTachyon
So the compatibility wrapper is now always applied at make, and switches to old api by default no matter what the style of the core env. return_two_dones essentially means whether the wrapped env is in new API or not (irrespective of the input env). Since return_two_dones=False by default, no matter what env is sent in, it switches to old API unless return_two_dones=True is specified. For your questions, all four scenarios are possible -

gym.make("AntBulletEnv-v0", return_two_dones=True) - the old api is explicitly converted to new api through the wrapper
gym.make("AntBulletEnv-v0") - the wrapper is present but passes through without changes. This also ensures old api envs are backward compatible.
gym.make("BrandNewAPIEnv-v0", return_two_dones=True) - the wrapper is present but passes through without changes
gym.make("BrandNewAPIEnv-v0") - the new api is explicitly converted to old api through the wrapper. This ensures all the envs in gym that are in new api (eg. CartPole) are backward compatible. Code already using CartPole will switch to the old API with a warning. Of course, the minor inconvenience is that people who completely switch to new API will still have to send the return_two_dones=True argument till 1.0.

The reason I had TimeLimit be compatible with both APIs was for it to work smoothly in all the above scenarios. The above envs would first be wrapped by StepCompatibility wrapper, then wrapped by order enforcing and then timelimit and work smoothly irrespective of how the environments were transformed.

Would it be clearer if the name of the argument was new_step_api instead of return_two_dones?
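The four scenarios above can be sketched with toy classes. This is an illustration only, not the wrapper from this PR: `OldEnv`, `NewEnv` and `CompatSketch` are hypothetical names, and the old-to-new branch ignores `TimeLimit` info for brevity.

```python
from typing import Any, Dict, Tuple


class OldEnv:
    """Hypothetical env written against the old API (four items)."""

    def step(self, action: Any) -> Tuple[int, float, bool, Dict]:
        return 0, 1.0, True, {}


class NewEnv:
    """Hypothetical env written against the new API (five items)."""

    def step(self, action: Any) -> Tuple[int, float, bool, bool, Dict]:
        return 0, 1.0, True, False, {}


class CompatSketch:
    """Toy stand-in for the compatibility wrapper that is applied at make."""

    def __init__(self, env: Any, return_two_dones: bool = False) -> None:
        self.env = env
        self.return_two_dones = return_two_dones

    def step(self, action: Any) -> tuple:
        out = self.env.step(action)
        if self.return_two_dones and len(out) == 4:
            # Old env, new API requested: split done (no truncation info here).
            obs, reward, done, info = out
            return obs, reward, done, False, info
        if not self.return_two_dones and len(out) == 5:
            # New env, old API requested: merge the two flags.
            obs, reward, terminated, truncated, info = out
            return obs, reward, terminated or truncated, info
        # The env already matches the requested API: pass through unchanged.
        return out
```

Wrapping `OldEnv` or `NewEnv` with the default `return_two_dones=False` always yields four items, and with `return_two_dones=True` always yields five, regardless of how the underlying env was written.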

RedTachyon · 2022-07-04T16:35:14Z

From what I'm seeing now, the deprecation warning pops up even if I create an env with the new API, it seems to be due to some wrapper issue
https://colab.research.google.com/drive/16TdzbKkGCRusKGSNSTUrnpRmqWsFoKBn?usp=sharing

RedTachyon · 2022-07-06T23:00:21Z

Just two questions because I think I misunderstood one thing during a call. If there's a third-party wrapper written in the old API (i.e. no mentions of new_step_api whatsoever), and it is applied to an old-API environment, everything will work alright? (except for a warning)

If a third-party wrapper is written in old API, can we do:

env = gym.make(something, new_step_api=True)
env = OldStyleWrapper(env)

? Is there an automatic way to make it work? Or does it require an update to the wrapper?

If the answer to the first question is "yes", then after the merge conflicts are resolved, I think this can be merged. I still have... many problems, but they all seem to be necessary for now, so let's hope we can rip it out soon-ish for 1.0

(the second question doesn't affect the approval, it's just for me to understand where we're standing because there will be issues raised about it)

arjun-kg · 2022-07-09T04:16:15Z

@RedTachyon

If there's a third-party wrapper written in the old API (i.e. no mentions of new_step_api whatsoever), and it is applied to an old-API environment, everything will work alright?

Yes. This was one of the reasons behind introducing new_step_api attribute in the base wrapper and having it default to False, to have this kind of backward compatibility.

? Is there an automatic way to make it work? Or does it require an update to the wrapper?

No, this will not work (unless the wrapper doesn't need to unpack step returns). There's no 'automatic way' to do this. But the change to the wrapper is not too bad to get it to work:

class OldStyleWrapper:
    ...
    def step(self, action):
        o, r, d, i = step_api_compatibility(self.env.step(action), new_step_api=False)
        ...
        return o, r, d, i

pseudo-rnd-thoughts · 2022-07-09T20:41:13Z

LGTM

I came across this bug in gym: openai/gym#2752
When the time limit is triggered, the output is a bool instead of a dict like {'player_0': False, 'player_1': False}.
This produces the following error randomly. It appears sooner or later during the training.

~/.conda/envs/lux_ai_s2/lib/python3.7/site-packages/gym/wrappers/time_limit.py in step(self, action)
     16             self._elapsed_steps is not None
     17         ), "Cannot call env.step() before calling reset()"
---> 18         observation, reward, done, info = self.env.step(action)
     19         self._elapsed_steps += 1
     20         if self._elapsed_steps >= self._max_episode_steps:

/tmp/ipykernel_13436/3060603165.py in step(self, action)
     57         obs, _, done, info = self.env.step(action)
     58         obs = obs[agent]
---> 59         done = done[agent]
     60         # if type(done) == type({}): done = done[agent]
     61         # elif type(done) == type(True): done = {agent: done, opp_agent: False}

TypeError: 'bool' object is not subscriptable

I suggest this patch to avoid that.

arjun-kg mentioned this pull request Apr 14, 2022

New step API with terminated, truncated bools instead of done Farama-Foundation/gym-docs#115

Closed

XuehaiPan reviewed Apr 15, 2022

gym/core.py

XuehaiPan reviewed Apr 15, 2022

gym/core.py

araffin mentioned this pull request Apr 15, 2022

[Proposal] Formal API handling of truncation vs termination #2510

Closed

pseudo-rnd-thoughts mentioned this pull request Apr 17, 2022

Use mujoco bindings instead of mujoco_py #2595

Closed

XuehaiPan reviewed Apr 17, 2022

gym/wrappers/step_compatibility.py Outdated

arjun-kg added 3 commits April 20, 2022 20:54

Merge branch 'master' of https://github.com/openai/gym into done_term…

6618da5

…_trunc_v3

Setting return_two_dones=False as default

a0c4475

update warnings

2aabc30

arjun-kg added 4 commits April 21, 2022 15:36

pytest - ignore deprecation warnings

1babe4e

Only ignore step api deprecation warnings

c9c6add

fix duplicate wrapping bug in vector envs

c5fe53c

Merge branch 'master' of https://github.com/openai/gym into done_term…

f88927d

…_trunc_v3

RedTachyon reviewed Apr 24, 2022

arjun-kg added 2 commits April 25, 2022 11:59

Merge branch 'master' of https://github.com/openai/gym into done_term…

7c1e9c7

…_trunc_v3

edit docstrings, comments, warnings

6af7182

arjun-kg added 4 commits July 3, 2022 14:47

Merge branch 'master' of https://github.com/openai/gym into done_term…

bffa257

…_trunc_v3

update tests

d7dff2c

fix pattern

b2c10a4

restructure warnings

6553bed

arjun-kg added 5 commits July 4, 2022 22:26

fix incorrect warning

50d367e

fix incorrect warnings (properly)

d71836f

Merge branch 'master' of https://github.com/openai/gym into done_term…

78a507e

…_trunc_v3

add warning to env checker

a747625

Merge branch 'master' of https://github.com/openai/gym into done_term…

28c7b36

…_trunc_v3

Merge branch 'master' of https://github.com/openai/gym into done_term…

d65d21b

…_trunc_v3

jkterry1 merged commit 907b1b2 into openai:master Jul 9, 2022

Trinkle23897 mentioned this pull request Jul 15, 2022

[BUG] TypeError: metaclass conflict: the metaclass of a derived class must be a (non-strict) subclass of the metaclasses of all its bases sail-sg/envpool#166

Closed

ZhiqingXiao added a commit to ZhiqingXiao/gym that referenced this pull request Jul 17, 2022

Bug fix: revert an incorrect edition

2547c2e

openai#2752 incorrectly removes the obligated positional parameter. This bug fix corrected it.

ZhiqingXiao mentioned this pull request Jul 17, 2022

Bug fix: Revert an incorrect edition on wrapper.FrameStack #2973

Merged

6 tasks

pseudo-rnd-thoughts mentioned this pull request Aug 1, 2022

[Blog post] New Step API pseudo-rnd-thoughts/gym#1

Open

arjun-kg mentioned this pull request Aug 5, 2022

Support only new step API (while retaining compatibility functions) #3019

Merged

12 tasks

br0kej mentioned this pull request Aug 21, 2022

Add support for the new OpenAI Gym step API interface dstl/YAWNING-TITAN#4

Open

tlpss mentioned this pull request Sep 7, 2022

[Feature Request] Support for Truncated Gym API from Gym>= 0.25 DLR-RM/stable-baselines3#1053

Closed

araffin mentioned this pull request Nov 2, 2022

Re-add timelimit truncation information Farama-Foundation/Gymnasium#101

Closed

10 tasks

axb2035 mentioned this pull request Nov 16, 2022

[Question] Why does env.step in the HalfCheetah environment return 5 values if one of them if always False? Farama-Foundation/Gymnasium#134

Closed

This was referenced Feb 27, 2023

Patch for gym version 0.21.0 bug jxtrbtk/Lux-Design-S2#1

Merged

error in train.py - bool object is not subscriptable Lux-AI-Challenge/Lux-Design-S2#223

Closed

Patch for gym version 0.21.0 bug Lux-AI-Challenge/Lux-Design-S2#224

Closed

XuehaiPan mentioned this pull request Mar 2, 2023

[BUG] Incorrect reset handling in collectors pytorch/rl#937

Closed

3 tasks

pimpale mentioned this pull request May 29, 2023

Move from Gym 0.22 to Gymnasium 0.28 metadriverse/metadrive#445

Merged

2 tasks
