[RLlib] - Add env and agent steps in custom evaluation function for conformity with metrics logger. #45652
Conversation
…fter the 'env.reset' instead of before. The docstring in the callback clearly states it should come before the reset. Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
… conform to metrics logger in 'Algorithm.evaluate'. Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
rllib/env/multi_agent_env_runner.py
Outdated
# Create a new multi-agent episode.
_episode = self._new_episode()
self._make_on_episode_callback("on_episode_created", _episode)
_shared_data = {
    "agent_to_module_mapping_fn": self.config.policy_mapping_fn,
}

# Reset the environment.
Remove from this PR. I think it belongs to another PR?
What exactly do you mean? The comment was there before.
) = self.config.custom_evaluation_function(self, self.evaluation_workers)
if not env_steps or not agent_steps:
    raise ValueError(
        "Custom eval function must return "
nit: Can we give the user the exact expected return signature here?
Tuple[ResultDict, int, int]?
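One way to address this nit, as a rough sketch (the exact message wording below is an assumption, not necessarily the final text of the PR):

if not env_steps or not agent_steps:
    raise ValueError(
        "Custom eval function must return "
        "`Tuple[ResultDict, int, int]`, i.e. `(metrics, env_steps, agent_steps)`, "
        f"but got env_steps={env_steps}, agent_steps={agent_steps}."
    )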
Looks good to me. Thanks @simonsays1980 for the fix.
Some thoughts on the entire problem (10,000-foot view):
In the medium term, we should probably move to a world in which we don't return anything from any "core" method anymore and instead just log everything inside these methods to the metrics logger:
- algo.training_step() -> the Algorithm should call metrics.reduce() later itself; this should not be visible to the user.
- Learner.update() -> already done and hidden from the user.
- Learner.additional_update() -> this method should probably disappear entirely.
- EnvRunner.sample()/get_metrics() -> should probably be combined, and the metrics.reduce() call should be automated under the hood (hidden from the user).
This way, the user would never have to deal with when to call reduce() and would simply log their metrics, plain and simple (see the sketch below).
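For illustration, a minimal sketch of this "log inside, reduce under the hood" idea, assuming the new-stack MetricsLogger with its log_value()/reduce() methods (key names and values are made up):

from ray.rllib.utils.metrics.metrics_logger import MetricsLogger

# A standalone MetricsLogger, standing in for the one owned by the Algorithm.
metrics = MetricsLogger()

# Inside a "core" method (training_step, a custom evaluation function, ...),
# the user would only log values.
metrics.log_value("num_env_steps_sampled_lifetime", 2000, reduce="sum")
metrics.log_value("episode_return_mean", 123.4, reduce="mean")

# The framework, not the user, would then call reduce() once per iteration
# under the hood and merge the result into the algorithm's results.
results = metrics.reduce()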
Also, we need to unify our timestep structures, e.g.:

env_steps_lifetime: [some count]
agent_steps_lifetime/
    [agent 1]: [some count]
    [agent 2]: [some count]
module_steps_lifetime/
    [module 1]: [some count]
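Written out as the nested result dict this layout would reduce to (counts and agent/module ids are placeholders):

{
    "env_steps_lifetime": 100_000,
    "agent_steps_lifetime": {
        "agent_1": 50_000,
        "agent_2": 50_000,
    },
    "module_steps_lifetime": {
        "module_1": 100_000,
    },
}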
Hiding the …
… is already defined in the 'test_utils'. Signed-off-by: simonsays1980 <simon.zehnder@gmail.com>
…ictionaries. Signed-off-by: simonsays1980 <simon.zehnder@gmail.com>
Why are these changes needed?
Algorithm.evaluate() evaluates a custom evaluation function in the new stack, but then adds env_steps=agent_steps=0 to NUM_ENV_STEPS_SAMPLED_LIFETIME and NUM_AGENT_STEPS_SAMPLED_LIFETIME, because all other evaluation methods return env_steps and agent_steps together with their results. This PR modifies the return type of the custom evaluation function to (metrics, env_steps, agent_steps) in the new API stack (i.e. enable_env_runner_and_connector_v2=True).
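A minimal sketch of a custom evaluation function under this contract (the body, names, and numbers are illustrative and not taken from the PR; only the return shape reflects the change):

def custom_eval_function(algorithm, eval_workers):
    # Illustrative body: run your own evaluation logic here, e.g. by sampling
    # episodes from `eval_workers` (the evaluation EnvRunnerGroup that
    # Algorithm.evaluate() passes in), and track the step counts yourself.
    metrics = {"episode_return_mean": 0.0}  # placeholder ResultDict
    env_steps = 1000    # env steps sampled during this evaluation round
    agent_steps = 2000  # agent steps (>= env_steps in multi-agent setups)

    # New API stack contract of this PR: return the step counts together
    # with the metrics instead of a bare ResultDict.
    return metrics, env_steps, agent_steps

# The function is plugged in via the existing config option, e.g.:
# config.evaluation(custom_evaluation_function=custom_eval_function)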
Related issue number
Closes #44595
Checks
- I've signed off every commit (git commit -s) in this PR.
- I've run scripts/format.sh to lint the changes in this PR.
- I've added any new APIs to the API Reference: for example, if I added a method in Tune, I've added it in doc/source/tune/api/ under the corresponding .rst file.