Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add reward and observation arguments to env.reset() #451

Closed
ChrisCummins opened this issue Oct 6, 2021 · 8 comments
Closed

Add reward and observation arguments to env.reset() #451

ChrisCummins opened this issue Oct 6, 2021 · 8 comments
Assignees
Labels
Enhancement New feature or request good first issue Good for newcomers Help wanted Extra attention is needed
Milestone

Comments

@ChrisCummins
Copy link
Contributor

🚀 Feature

The CompilerEnv constructor accepts a pair of arguments reward_space and observation_space. We should add those to env.reset(), same as specifying the benchmark.

Motivation

Because this feels like a clumsy API:

env.reward_space = "Foobar"
env.reset(benchmark="benchmark://foo-v0/abc")

Pitch

Allow:

env.reset(benchmark="benchmark://foo-v0/abc", reward_space="Foobar")
@ChrisCummins ChrisCummins added the Enhancement New feature or request label Oct 6, 2021
@ChrisCummins ChrisCummins added this to the v0.2.1 milestone Oct 6, 2021
@ChrisCummins ChrisCummins self-assigned this Nov 10, 2021
@ChrisCummins ChrisCummins modified the milestones: v0.2.1, v0.2.2 Nov 17, 2021
@ChrisCummins ChrisCummins added good first issue Good for newcomers Help wanted Extra attention is needed labels Dec 15, 2021
@ChrisCummins ChrisCummins modified the milestones: v0.2.2, v0.2.3 Jan 19, 2022
@uduse
Copy link
Contributor

uduse commented Jan 28, 2022

I prefer env.reset() doesn't take anything at all. If the user wants to change benchmark, just do an extra line of assignment.

Quote Zen of Python:

There should be one-- and preferably only one --obvious way to do it.

It's already unclear to me if these two are strictly equivalent:

env.benchmark = "benchmark://foo-v0/abc"
env.reset()

vs

env.reset(benchmark="benchmark://foo-v0/abc")

Make env.reset() takes nothing but handle potential benchmark changes would yield a simpler and cleaner API. 🤔

@ChrisCummins
Copy link
Contributor Author

Tbh I think that the solution may be to remove the env.benchmark setter entirely and roll everything into reset(). The reason is that using the setter without immediately calling reset() can cause unexpected behavior as nothing actually gets changed until reset() is called:

>>> env.benchmark = "benchmark://foo-v0/abc"
>>> print(env.benchmark)
"benchmark://bar-v0/def"      # wtf!
>>> env.reset()
>>> print(env.benchmark)
"benchmark://foo-v0/abc"

Another problem with the setters is that they make it harder to debug subtle typos in your code:

>>> env.benchmrak = "benchmark://foo-v0/abc"
>>> env.reset()
>>> print(env.benchmark)
"benchmark://bar-v0/def"      # wtf!

whereas a typo in an argument name will raise an error.

I suppose this could be mitigated by having the env.benchmark setter implicitly call reset(), but this might lead to a different kind of surprising behavior.

Same goes for env.observation_space and env.reward_space.

What do you think?

Cheers,
Chris

@uduse
Copy link
Contributor

uduse commented Feb 2, 2022

Tbh I think that the solution may be to remove the env.benchmark setter entirely and roll everything into reset(). The reason is that using the setter without immediately calling reset() can cause unexpected behavior as nothing actually gets changed until reset() is called:

>>> env.benchmark = "benchmark://foo-v0/abc"
>>> print(env.benchmark)
"benchmark://bar-v0/def"      # wtf!
>>> env.reset()
>>> print(env.benchmark)
"benchmark://foo-v0/abc"

Another problem with the setters is that they make it harder to debug subtle typos in your code:

>>> env.benchmrak = "benchmark://foo-v0/abc"
>>> env.reset()
>>> print(env.benchmark)
"benchmark://bar-v0/def"      # wtf!

whereas a typo in an argument name will raise an error.

I suppose this could be mitigated by having the env.benchmark setter implicitly call reset(), but this might lead to a different kind of surprising behavior.

Same goes for env.observation_space and env.reward_space.

What do you think?

Cheers, Chris

Yes, I actually agree with removing setters for env.benchmark, env.reward_space, env.action_space, and env.observation_space entirely. This way there are only three ways left to mutate the state (as in memory state of an object) of the environment: env.reset, env.apply, and env.step (I assume env.raw_step is not user API). Seems cleaner to me.

Additionally we want to better differentiate "don't change something when reset" vs "remove something when reset'". e.g. env.reset(..., reward_space=None, ...) could be ambiguous––does the user wants to remove the reward space, or reset but not change the reward space? I would say this means the user wants to remove the reward space. This means the default keyword parameter value should be something other than None. Could be an empty tuple or some special string, I'm not sure.

@ChrisCummins
Copy link
Contributor Author

Yes, I actually agree with removing setters for env.benchmark, env.reward_space, env.action_space, and env.observation_space entirely. This way there are only three ways left to mutate the state (as in memory state of an object) of the environment: env.reset, env.apply, and env.step (I assume env.raw_step is not user API). Seems cleaner to me.

👍

Additionally we want to better differentiate "don't change something when reset" vs "remove something when reset'". e.g. env.reset(..., reward_space=None, ...) could be ambiguous––does the user wants to remove the reward space, or reset but not change the reward space? I would say this means the user wants to remove the reward space. This means the default keyword parameter value should be something other than None. Could be an empty tuple or some special string, I'm not sure.

That's a very good point! I think it may be worth defining a custom type specifically for this purpose to denote "use whatever value is already set". That would make the default values easier to understand than None, and could allow users to be explicit about it, something like:

env.reset(
    benchmark=compiler_gym.ValueNotChanged,
    reward_space="some_new_space",
    observation_space=None,  # no observations
)

Cheers,
Chris

@uduse
Copy link
Contributor

uduse commented Feb 4, 2022

env.reset(
    benchmark=compiler_gym.ValueNotChanged,
    reward_space="some_new_space",
    observation_space=None,  # no observations
)

This seems good to me. Alternatively, we could use python's ellipsis ... as this special placeholder, or just the string "same". Personally I prefer just "same" because it's simple, explicit, easy to understand, and immutable which makes it a good choice as the default value.

@ChrisCummins ChrisCummins modified the milestones: v0.2.3, v0.2.4 Mar 18, 2022
@SoumyajitKarmakar
Copy link
Contributor

Hi @ChrisCummins,
I would like to take a shot at fixing this problem.
I have a question and a trivial solution which might be very silly, but if this

env.reward_space = "Foobar"
env.reset(benchmark="benchmark://foo-v0/abc")

already works, then why not just add a new argument in the function and put the first line in the function body ? Something like,

def reset(..., reward_space: str = "same",...):
  if reward_space != "same":
    self.reward_space = reward_space

and then copy the rest of the function below it, and let it run its normal course of actions.

@ChrisCummins
Copy link
Contributor Author

Hi @SoumyajitKarmakar,

I would like to take a shot at fixing this problem.

Great, thanks!

why not just add a new argument in the function and put the first line in the function body ? Something like,

def reset(..., reward_space: str = "same",...):
  if reward_space != "same":
    self.reward_space = reward_space

and then copy the rest of the function below it, and let it run its normal course of actions.

Yes, your suggestion should work. A few other things to consider though:

  1. The reset() method of all wrappers / subclasses need to be extended to support the new arguments.
  2. The new args need to be documented + tested
  3. The "setter" properties should be marked as deprecated, though can be done in a follow-up PR.

Cheers,
Chris

@ChrisCummins
Copy link
Contributor Author

I'm also not sure I like the default being the string "same", as it doesn't feel particularly self-documenting, and prevents the (unlikely) use of the name "same" with observation/reward spaces.

I would suggest adding a new enum to compiler_gym/util/gym_type_hints.py

from enum import Enum

class OptionalArgumentValue(Enum):
    UNCHANGED = 1

The OptionalArgumentValue.UNCHANGED could be used as the default value and checked for using:

if reward_space != OptionalArgumentValue.UNCHANGED:
    ...

Cheers,
Chris

SoumyajitKarmakar added a commit to SoumyajitKarmakar/CompilerGym that referenced this issue Apr 2, 2022
Modified the env.reset() function to add the reward_space and observation_space parameters. Modified the reset() func of the subclasses. Added a OptionalArgumentValue class in gym_type_hints.py for the default value.
ChrisCummins pushed a commit to ChrisCummins/CompilerGym that referenced this issue Apr 20, 2022
Modified the env.reset() function to add the reward_space and observation_space parameters. Modified the reset() func of the subclasses. Added a OptionalArgumentValue class in gym_type_hints.py for the default value.
ChrisCummins added a commit to ChrisCummins/CompilerGym that referenced this issue Apr 21, 2022
This adds docstrings to cover the new reward_space and
observation_space arguments to reset().

Fixes facebookresearch#451.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Enhancement New feature or request good first issue Good for newcomers Help wanted Extra attention is needed
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants