-
Notifications
You must be signed in to change notification settings - Fork 130
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Split CompilerEnv.step() into two methods for singular or lists of actions #611
Split CompilerEnv.step() into two methods for singular or lists of actions #611
Conversation
CompilerEnv.step() currently accepts two types for the "action" argument: a scalar action, or an iterable of actions. This kind of type overloading does not work for list types. This adds a new method, CompilerEnv.multistep(), that explicitly takes takes an iterable sequence of actions. If you want to run multiple actions in a single step, call this new method. Calling CompilerEnv.step() with a list of actions still works, though with a deprecation warning. In the v0.2.4 release support for lists of actions in CompilerEnv.step() will be removed. Fixes facebookresearch#610.
cc @sogartar, @mostafaelhoushi, @hughleat Still WIP but looking for feedback on the approach |
Codecov Report
@@ Coverage Diff @@
## development #611 +/- ##
================================================
- Coverage 88.19% 38.26% -49.93%
================================================
Files 114 114
Lines 6708 6708
================================================
- Hits 5916 2567 -3349
- Misses 792 4141 +3349
Continue to review full report at Codecov.
|
aea65bf
to
0a0747a
Compare
This makes the following changes: - Changes env.step() `action` to accept only a single action, with a deprecation warning if a list of actions are provided. - Renames env.step() `observations` to `observation_spaces`. The old parameter name is still accepted with a deprecation warning. - Renames env.step() `rewards` to `reward_spaces`. The old parameter name is still accepted with a deprecation warning.
0a0747a
to
b1f9227
Compare
@ChrisCummins, it looks like you have a lot on your hands. Would you like me to finish this? |
That would be great, thanks! 🙂 Couple of heads up: Lemme know how you get on. Once this lands I'll make release v0.2.3. That'll unblock you guys from making the breaking change to Cheers, |
@ChrisCummins, isn't |
I may have misunderstood the intention, but isn't the main responsibility of |
Hi @sogartar,
Yep that's right.
The idea behind op = Action(env)
op.add_action(1)
op.add_action(2)
op.add_action(3)
result = op.eval() But in general, we try to make
The reason I refactored the wrappers to overload class MyWrapper(CompilerEnvWrapper):
def step(self, action, *args, **kwargs):
print(action) # do something interesting
return self.env.step(action, *args, **kwargs)
env = compiler_gym.make("llvm-v0")
env = MyWrapper(env)
env.step(1) # works as expected
env.multistep([1, 2, 3]) # oh dear :(
I'm not sure I follow. What is the advantage of having an additional Cheers, |
For a user of an environment the communication with the service is a detail that may change to use another mechanism. You can have multiple such mechanism coexisting for different environments. For example running an environment in the same process thread or using some other form of inter-process communication like memory-mapped files. Another network protocol is also possible. CompilerGym's environment interface is a superset over OpenAI Gym interface and it would be good to be cleanly separated from any implementations. |
def step(action, **kwargs):
return multistep([action], **kwargs) Then you only have to override |
You're right, Cheers, |
Superseded by #627. |
CompilerEnv.step()
currently accepts two types for the "action" argument:(1) a scalar action:
(2) an iterable of actions:
This PR splits this overloaded behavior into two methods:
CompilerEnv.step()
only takes a single action, andCompilerEnv.multistep()
only takes an iterable sequence of actions:For now, calling
CompilerEnv.step()
with a list of actions still works, though with a deprecation warning. In the v0.2.4 release support for lists of actions inCompilerEnv.step()
will be removed.Benefits
Passing a list of actions to execute in a single step enables them to be executed in a single RPC invocation, significantly reducing the overhead of round trips to the backend, and removing the need to calculate observation/rewards for each individual step. We measured speedups of ~3x on typical LLVM workloads using this (more details here).
Drawbacks
Adding a new method means we probably need to refactor all step wrappers to overload the
multistep()
wrappers, as otherwise this could be missed:Instead, we can require that wrappers that wish to change the behavior of an environments step function override the
raw_step()
method, as that method is the common denominator betweenstep()
andmultistep()
, and also has a tighter function signature with less room for mixing and matching different arg types.Fixes #610.