Add `reward` and `observation` arguments to env.reset() #451

ChrisCummins · 2021-10-06T10:34:14Z

🚀 Feature

The CompilerEnv constructor accepts a pair of arguments, reward_space and observation_space. We should add those to env.reset() as well, just as the benchmark can already be specified there.

Motivation

Because this feels like a clumsy API:

env.reward_space = "Foobar"
env.reset(benchmark="benchmark://foo-v0/abc")

Pitch

Allow:

env.reset(benchmark="benchmark://foo-v0/abc", reward_space="Foobar")


uduse · 2022-01-28T02:47:02Z

I would prefer that env.reset() take no arguments at all. If the user wants to change the benchmark, they can do so with an extra line of assignment.

To quote the Zen of Python:

There should be one-- and preferably only one --obvious way to do it.

It's already unclear to me if these two are strictly equivalent:

env.benchmark = "benchmark://foo-v0/abc"
env.reset()

vs

env.reset(benchmark="benchmark://foo-v0/abc")

Making env.reset() take no arguments, while still handling any pending benchmark change, would yield a simpler and cleaner API. 🤔

ChrisCummins · 2022-02-01T22:50:35Z

Tbh I think that the solution may be to remove the env.benchmark setter entirely and roll everything into reset(). The reason is that using the setter without immediately calling reset() can cause unexpected behavior as nothing actually gets changed until reset() is called:

>>> env.benchmark = "benchmark://foo-v0/abc"
>>> print(env.benchmark)
"benchmark://bar-v0/def"      # wtf!
>>> env.reset()
>>> print(env.benchmark)
"benchmark://foo-v0/abc"

Another problem with the setters is that they make it harder to debug subtle typos in your code:

>>> env.benchmrak = "benchmark://foo-v0/abc"
>>> env.reset()
>>> print(env.benchmark)
"benchmark://bar-v0/def"      # wtf!

whereas a typo in an argument name will raise an error.
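
The difference between the two failure modes can be reproduced without CompilerGym. The sketch below uses a hypothetical ToyEnv class (a stand-in, not the real CompilerEnv) to show that Python silently creates a new attribute on a misspelled assignment, whereas a misspelled keyword argument raises a TypeError:

```python
class ToyEnv:
    """Hypothetical stand-in for an environment; not the CompilerGym API."""

    def __init__(self, benchmark):
        self.benchmark = benchmark

    def reset(self, benchmark=None):
        # Only switch benchmark when one is explicitly passed.
        if benchmark is not None:
            self.benchmark = benchmark
        return self.benchmark


env = ToyEnv("benchmark://bar-v0/def")

# Misspelled attribute: Python creates a new attribute and raises nothing.
env.benchmrak = "benchmark://foo-v0/abc"
env.reset()
silent_result = env.benchmark  # still the old benchmark

# Misspelled keyword argument: rejected immediately with a TypeError.
try:
    env.reset(benchmrak="benchmark://foo-v0/abc")
    typo_detected = False
except TypeError:
    typo_detected = True
```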

I suppose this could be mitigated by having the env.benchmark setter implicitly call reset(), but this might lead to a different kind of surprising behavior.

Same goes for env.observation_space and env.reward_space.

What do you think?

Cheers,
Chris

uduse · 2022-02-02T21:41:46Z


Yes, I actually agree with removing the setters for env.benchmark, env.reward_space, env.action_space, and env.observation_space entirely. This way there are only three ways left to mutate the state (as in the in-memory state of the object) of the environment: env.reset, env.apply, and env.step (I assume env.raw_step is not part of the user-facing API). Seems cleaner to me.

Additionally, we want to better differentiate "don't change something on reset" from "remove something on reset". For example, env.reset(..., reward_space=None, ...) could be ambiguous: does the user want to remove the reward space, or to reset without changing it? I would say this means the user wants to remove the reward space, so the default keyword parameter value should be something other than None. It could be an empty tuple or some special string; I'm not sure.
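
To make the ambiguity concrete, here is a minimal sketch that uses a module-level sentinel object as the default, so that None stays available to mean "remove". The names _UNCHANGED and ToyEnv are hypothetical and only illustrate the idea; they are not part of CompilerGym:

```python
_UNCHANGED = object()  # hypothetical sentinel meaning "keep the current value"


class ToyEnv:
    """Hypothetical stand-in for an environment; not the CompilerGym API."""

    def __init__(self, reward_space=None):
        self.reward_space = reward_space

    def reset(self, reward_space=_UNCHANGED):
        # Identity check: None is a legitimate value meaning "no reward space".
        if reward_space is not _UNCHANGED:
            self.reward_space = reward_space


env = ToyEnv(reward_space="Foobar")

env.reset()  # argument omitted: the reward space is kept
kept = env.reward_space

env.reset(reward_space=None)  # explicit None: the reward space is removed
removed = env.reward_space
```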

ChrisCummins · 2022-02-03T10:45:34Z

Yes, I actually agree with removing the setters for env.benchmark, env.reward_space, env.action_space, and env.observation_space entirely. This way there are only three ways left to mutate the state (as in the in-memory state of the object) of the environment: env.reset, env.apply, and env.step (I assume env.raw_step is not part of the user-facing API). Seems cleaner to me.

👍

Additionally, we want to better differentiate "don't change something on reset" from "remove something on reset". For example, env.reset(..., reward_space=None, ...) could be ambiguous: does the user want to remove the reward space, or to reset without changing it? I would say this means the user wants to remove the reward space, so the default keyword parameter value should be something other than None. It could be an empty tuple or some special string; I'm not sure.

That's a very good point! I think it may be worth defining a custom type specifically for this purpose to denote "use whatever value is already set". That would make the default values easier to understand than None, and could allow users to be explicit about it, something like:

env.reset(
    benchmark=compiler_gym.ValueNotChanged,
    reward_space="some_new_space",
    observation_space=None,  # no observations
)

Cheers,
Chris

uduse · 2022-02-04T22:24:12Z

env.reset(
    benchmark=compiler_gym.ValueNotChanged,
    reward_space="some_new_space",
    observation_space=None,  # no observations
)

This seems good to me. Alternatively, we could use Python's Ellipsis (...) as the special placeholder, or just the string "same". Personally I prefer "same" because it's simple, explicit, easy to understand, and immutable, which makes it a good choice for a default value.

SoumyajitKarmakar · 2022-03-31T20:14:28Z

Hi @ChrisCummins,
I would like to take a shot at fixing this problem.
I have a question and a trivial solution, which might be very silly: if this

env.reward_space = "Foobar"
env.reset(benchmark="benchmark://foo-v0/abc")

already works, then why not just add a new argument to the function and put the first line in the function body? Something like:

def reset(..., reward_space: str = "same", ...):
    if reward_space != "same":
        self.reward_space = reward_space

and then keep the rest of the function below it, letting it run its normal course.

ChrisCummins · 2022-04-01T17:38:22Z

Hi @SoumyajitKarmakar,

I would like to take a shot at fixing this problem.

Great, thanks!

why not just add a new argument in the function and put the first line in the function body? Something like,

def reset(..., reward_space: str = "same", ...):
    if reward_space != "same":
        self.reward_space = reward_space

and then copy the rest of the function below it, and let it run its normal course of actions.

Yes, your suggestion should work. A few other things to consider though:

The reset() method of every wrapper and subclass needs to be extended to support the new arguments.
The new arguments need to be documented and tested.
The "setter" properties should be marked as deprecated, though that can be done in a follow-up PR.
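
As a rough illustration of the first point, a wrapper can forward the new arguments to the environment it wraps. Everything below is a hypothetical sketch: ToyEnv, ToyWrapper, and the UNCHANGED sentinel are stand-ins, not CompilerGym classes, and the real wrappers may need a different approach.

```python
UNCHANGED = object()  # hypothetical sentinel: "keep the current value"


class ToyEnv:
    """Hypothetical stand-in for an environment; not the CompilerGym API."""

    def __init__(self):
        self.benchmark = "benchmark://bar-v0/def"
        self.reward_space = None
        self.observation_space = None

    def reset(self, benchmark=UNCHANGED, reward_space=UNCHANGED,
              observation_space=UNCHANGED):
        if benchmark is not UNCHANGED:
            self.benchmark = benchmark
        if reward_space is not UNCHANGED:
            self.reward_space = reward_space
        if observation_space is not UNCHANGED:
            self.observation_space = observation_space


class ToyWrapper:
    """Counts resets, then delegates to the wrapped environment."""

    def __init__(self, env):
        self.env = env
        self.reset_count = 0

    def reset(self, *args, **kwargs):
        # Forward everything so that new reset() arguments keep working
        # without the wrapper having to list them one by one.
        self.reset_count += 1
        return self.env.reset(*args, **kwargs)


wrapped = ToyWrapper(ToyEnv())
wrapped.reset(benchmark="benchmark://foo-v0/abc", reward_space="Foobar")
```

Forwarding *args and **kwargs is only one option; a wrapper that needs to inspect or override the new arguments would have to name them explicitly in its own signature.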

Cheers,
Chris

ChrisCummins · 2022-04-01T17:47:12Z

I'm also not sure I like the default being the string "same", as it doesn't feel particularly self-documenting, and prevents the (unlikely) use of the name "same" with observation/reward spaces.

I would suggest adding a new enum to compiler_gym/util/gym_type_hints.py:

from enum import Enum

class OptionalArgumentValue(Enum):
    UNCHANGED = 1

The OptionalArgumentValue.UNCHANGED could be used as the default value and checked for using:

if reward_space != OptionalArgumentValue.UNCHANGED:
    ...
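
Putting these pieces together, a reset() that uses the enum as its default might look like the sketch below. ToyEnv and the space name "Baz" are hypothetical, and the enum is redefined locally so the example is self-contained; the actual signature in CompilerGym may differ.

```python
from enum import Enum
from typing import Optional, Union


class OptionalArgumentValue(Enum):
    UNCHANGED = 1


class ToyEnv:
    """Hypothetical stand-in for an environment; not the CompilerGym API."""

    def __init__(self):
        self.reward_space: Optional[str] = "Foobar"
        self.observation_space: Optional[str] = "Baz"  # hypothetical name

    def reset(
        self,
        reward_space: Union[
            OptionalArgumentValue, str, None
        ] = OptionalArgumentValue.UNCHANGED,
        observation_space: Union[
            OptionalArgumentValue, str, None
        ] = OptionalArgumentValue.UNCHANGED,
    ) -> None:
        # Only overwrite a space when the caller passed something explicit.
        if reward_space != OptionalArgumentValue.UNCHANGED:
            self.reward_space = reward_space
        if observation_space != OptionalArgumentValue.UNCHANGED:
            self.observation_space = observation_space


env = ToyEnv()
# A space literally named "same" is still usable, and None removes a space.
env.reset(reward_space="same", observation_space=None)
after_explicit = (env.reward_space, env.observation_space)

# Omitting the arguments leaves both values as they were.
env.reset()
after_default = (env.reward_space, env.observation_space)
```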

Cheers,
Chris

Modified the env.reset() function to add the reward_space and observation_space parameters. Modified the reset() function of the subclasses. Added an OptionalArgumentValue class in gym_type_hints.py for the default value.


ChrisCummins added the Enhancement label Oct 6, 2021

ChrisCummins added this to the v0.2.1 milestone Oct 6, 2021

ChrisCummins self-assigned this Nov 10, 2021

ChrisCummins modified the milestones: v0.2.1, v0.2.2 Nov 17, 2021

ChrisCummins added the good first issue and Help wanted labels Dec 15, 2021

ChrisCummins modified the milestones: v0.2.2, v0.2.3 Jan 19, 2022

ChrisCummins modified the milestones: v0.2.3, v0.2.4 Mar 18, 2022

SoumyajitKarmakar mentioned this issue Apr 7, 2022

attempt for #451 #650

Closed

ChrisCummins added a commit to ChrisCummins/CompilerGym that referenced this issue Apr 21, 2022

[core] Add docstrings for new reset() args.

b94410f

This adds docstrings to cover the new reward_space and observation_space arguments to reset(). Fixes facebookresearch#451.

ChrisCummins closed this as completed in 1515b43 Apr 28, 2022
