[RLlib] - Add env and agent steps in custom evaluation function for conformity with metrics logger. #45652

Merged
Commits
24 commits
c748df8
Changed comment.
simonsays1980 May 10, 2024
6409007
Merge branch 'master' of https://github.com/ray-project/ray
simonsays1980 May 13, 2024
d2f9030
Merge branch 'master' of https://github.com/ray-project/ray
simonsays1980 May 14, 2024
a3416a8
Merge branch 'master' of https://github.com/ray-project/ray
simonsays1980 May 15, 2024
8582ad9
Merge branch 'master' of https://github.com/ray-project/ray
simonsays1980 May 16, 2024
b565f34
Merge branch 'master' of https://github.com/ray-project/ray
simonsays1980 May 21, 2024
c0eed1f
Merge branch 'master' of https://github.com/ray-project/ray
simonsays1980 May 22, 2024
341cb95
Merge branch 'master' of https://github.com/ray-project/ray
simonsays1980 May 22, 2024
b76807f
Merge branch 'master' of https://github.com/ray-project/ray
simonsays1980 May 24, 2024
af9c9e9
Merge branch 'master' of https://github.com/ray-project/ray
simonsays1980 May 27, 2024
e422c42
Merge branch 'master' of https://github.com/ray-project/ray
simonsays1980 May 28, 2024
26e0926
Merge branch 'master' of https://github.com/ray-project/ray
simonsays1980 May 29, 2024
562f586
Merge branch 'master' of https://github.com/ray-project/ray
simonsays1980 May 31, 2024
b95848c
Fixed a minor bug that was calling the callback 'on_episode_created' a…
simonsays1980 May 31, 2024
ca98704
Modified callback order in 'MultiAgentEnvRunner'.
simonsays1980 May 31, 2024
faffb1d
Added 'env_steps' and 'agent_steps' to new-stack custom evaluation to…
simonsays1980 May 31, 2024
8a73127
Fixed small bug to conform to old API stack.
simonsays1980 May 31, 2024
01d0e2a
Added @sven1977's review.
simonsays1980 Jun 24, 2024
6e7bae1
Merge branch 'master' into fix-env-and-agent-steps-in-custom-evaluati…
simonsays1980 Jun 24, 2024
a70bf7a
Adapted custom evaluation example to new return values.
simonsays1980 Jun 24, 2024
e75031a
Removed the parser argument '--evaluation-parallel-to-training' as it…
simonsays1980 Jun 25, 2024
29af6aa
Fixed a bug in 'custom_evaluation.py' due to new metrics logging of d…
simonsays1980 Jun 25, 2024
c56302b
Merge branch 'master' into fix-env-and-agent-steps-in-custom-evaluati…
simonsays1980 Jun 25, 2024
54fcbf0
Readded parser argument.
simonsays1980 Jun 25, 2024
31 changes: 25 additions & 6 deletions rllib/algorithms/algorithm.py
@@ -1006,7 +1006,14 @@ def evaluate(
 
         # We will use a user provided evaluation function.
         if self.config.custom_evaluation_function:
-            eval_results = self._evaluate_with_custom_eval_function()
+            if self.config.enable_env_runner_and_connector_v2:
+                (
+                    eval_results,
+                    env_steps,
+                    agent_steps,
+                ) = self._evaluate_with_custom_eval_function()
+            else:
+                eval_results = self._evaluate_with_custom_eval_function()
         # There is no eval EnvRunnerGroup -> Run on local EnvRunner.
         elif self.evaluation_workers is None:
             (
@@ -1103,20 +1110,32 @@ def evaluate(
         # Also return the results here for convenience.
         return eval_results
 
-    def _evaluate_with_custom_eval_function(self):
+    def _evaluate_with_custom_eval_function(self) -> Tuple[ResultDict, int, int]:
         logger.info(
             f"Evaluating current state of {self} using the custom eval function "
             f"{self.config.custom_evaluation_function}"
         )
-        eval_results = self.config.custom_evaluation_function(
-            self, self.evaluation_workers
-        )
-        if not eval_results or not isinstance(eval_results, dict):
-            raise ValueError(
-                "Custom eval function must return "
-                f"dict of metrics! Got {eval_results}."
-            )
-        return eval_results
+        if self.config.enable_env_runner_and_connector_v2:
+            (
+                eval_results,
+                env_steps,
+                agent_steps,
+            ) = self.config.custom_evaluation_function(self, self.evaluation_workers)
+            if not env_steps or not agent_steps:
+                raise ValueError(
+                    "Custom eval function must return "
+                    "`Tuple[ResultDict, int, int]` with `int, int` being "
+                    f"`env_steps` and `agent_steps`! Got {env_steps}, {agent_steps}."
+                )
+        else:
+            eval_results = self.config.custom_evaluation_function(
+                self, self.evaluation_workers
+            )
+            if not eval_results or not isinstance(eval_results, dict):
+                raise ValueError(
+                    "Custom eval function must return "
+                    f"dict of metrics! Got {eval_results}."
+                )
+            return eval_results
+
+        return eval_results, env_steps, agent_steps
 
     def _evaluate_on_local_env_runner(self, env_runner):
         if hasattr(env_runner, "input_reader") and env_runner.input_reader is None:

Review comment (Contributor, on the new `raise ValueError` block):
nit: Can we give the user the exact expected return signature here? `Tuple[ResultDict, int, int]`?
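For illustration, a minimal sketch (not part of this PR) of a custom evaluation function that satisfies the new-stack contract enforced above. The function name is hypothetical; the `foreach_worker()` pattern mirrors the example file changed below, and the `merge_and_log_n_dicts()`/`reduce()` calls and import paths assume the MetricsLogger API of the Ray version this PR targets.

from typing import Tuple

from ray.rllib.algorithms.algorithm import Algorithm
from ray.rllib.env.env_runner_group import EnvRunnerGroup
from ray.rllib.utils.metrics import ENV_RUNNER_RESULTS, EVALUATION_RESULTS
from ray.rllib.utils.typing import ResultDict


def my_simple_eval(
    algorithm: Algorithm,
    eval_workers: EnvRunnerGroup,
) -> Tuple[ResultDict, int, int]:
    # Sample once on every remote eval EnvRunner; fetch episodes AND metrics.
    episodes_and_metrics = eval_workers.foreach_worker(
        func=lambda worker: (worker.sample(), worker.get_metrics()),
        local_worker=False,
    )
    episodes = [eps for eps_list, _ in episodes_and_metrics for eps in eps_list]
    # Merge the per-EnvRunner metrics into the algo's MetricsLogger and reduce.
    algorithm.metrics.merge_and_log_n_dicts(
        [metrics for _, metrics in episodes_and_metrics],
        key=(EVALUATION_RESULTS, ENV_RUNNER_RESULTS),
    )
    eval_results = algorithm.metrics.reduce(
        key=(EVALUATION_RESULTS, ENV_RUNNER_RESULTS)
    )
    # The two step counts required by the new return contract.
    env_steps = sum(eps.env_steps() for eps in episodes)
    agent_steps = sum(eps.agent_steps() for eps in episodes)
    return eval_results, env_steps, agent_steps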
5 changes: 4 additions & 1 deletion rllib/algorithms/algorithm_config.py
@@ -2297,7 +2297,10 @@ def evaluation(
             for training.
             custom_evaluation_function: Customize the evaluation method. This must be a
                 function of signature (algo: Algorithm, eval_workers: EnvRunnerGroup) ->
-                metrics: dict. See the Algorithm.evaluate() method to see the default
+                (metrics: dict, env_steps: int, agent_steps: int) (metrics: dict, if
+                `enable_env_runner_and_connector_v2=False`), where `env_steps` and
+                `agent_steps` define the number of sampled steps during the evaluation
+                iteration. See the Algorithm.evaluate() method to see the default
                 implementation. The Algorithm guarantees all eval workers have the
                 latest policy state before this function is called.
             always_attach_evaluation_results: Make sure the latest available evaluation
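For illustration, a hedged sketch of wiring such a function into a config through this setting. Not part of the PR: PPO and CartPole are arbitrary choices, `my_simple_eval` is the hypothetical function sketched above, and `api_stack()` as well as `evaluation_num_env_runners` assume a Ray version that already exposes them.

from ray.rllib.algorithms.ppo import PPOConfig

config = (
    PPOConfig()
    .environment("CartPole-v1")
    # The new API stack must be enabled for the 3-tuple return contract.
    .api_stack(
        enable_rl_module_and_learner=True,
        enable_env_runner_and_connector_v2=True,
    )
    .evaluation(
        evaluation_interval=1,
        evaluation_num_env_runners=2,
        custom_evaluation_function=my_simple_eval,
    )
)
algo = config.build()
eval_results = algo.evaluate()  # Invokes my_simple_eval under the hood.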
28 changes: 20 additions & 8 deletions rllib/examples/evaluation/custom_evaluation.py
@@ -63,6 +63,8 @@
 | 26.1973 | 16000 | 0.872034 | 13.7966 |
 +------------------+-------+----------+--------------------+
 """
+from typing import Tuple
+
 from ray.air.constants import TRAINING_ITERATION
 from ray.rllib.algorithms.algorithm import Algorithm
 from ray.rllib.algorithms.algorithm_config import AlgorithmConfig

@@ -94,7 +96,7 @@
 def custom_eval_function(
     algorithm: Algorithm,
     eval_workers: EnvRunnerGroup,
-) -> ResultDict:
+) -> Tuple[ResultDict, int, int]:
     """Example of a custom evaluation function.
 
     Args:

@@ -122,7 +124,7 @@ def custom_eval_function
     # Collect the metrics results that the eval workers return in this list for
     # later processing.
     env_runner_metrics = []
-
+    sampled_episodes = []
     # For demonstration purposes, run through some number of evaluation
     # rounds within this one call. Note that this function is called once per
     # training iteration (`Algorithm.train()` call) OR once per `Algorithm.evaluate()`

@@ -131,13 +133,20 @@ def custom_eval_function
         print(f"Training iteration {algorithm.iteration} -> evaluation round {i}")
-        # Sample episodes from the EnvRunners AND have them return only the thus
-        # collected metrics.
-        metrics_all_env_runners = eval_workers.foreach_worker(
-            # Return only the metrics, NOT the sampled episodes (we don't need them
-            # anymore).
-            func=lambda worker: (worker.sample(), worker.get_metrics())[1],
+        # Sample episodes from the EnvRunners AND have them return the collected
+        # metrics along with the episodes.
+        episodes_and_metrics_all_env_runners = eval_workers.foreach_worker(
+            # Return the sampled episodes AND the metrics (we need the episodes
+            # to count env and agent steps).
+            func=lambda worker: (worker.sample(), worker.get_metrics()),
             local_worker=False,
         )
-        env_runner_metrics.extend(metrics_all_env_runners)
+        sampled_episodes.extend(
+            eps
+            for eps_and_mtrcs in episodes_and_metrics_all_env_runners
+            for eps in eps_and_mtrcs[0]
+        )
+        env_runner_metrics.extend(
+            eps_and_mtrcs[1] for eps_and_mtrcs in episodes_and_metrics_all_env_runners
+        )

     # You can compute metrics from the episodes manually, or use the Algorithm's
     # convenient MetricsLogger to store all evaluation metrics inside the main

@@ -148,17 +157,20 @@ def custom_eval_function
     eval_results = algorithm.metrics.reduce(
         key=(EVALUATION_RESULTS, ENV_RUNNER_RESULTS)
     )
 
     # Alternatively, you could manually reduce over the n returned
     # `env_runner_metrics` dicts, but this would be much harder, as you might
     # not know which metrics to sum up, which ones to average over, etc.
 
-    return eval_results
+    # Compute env and agent steps from the sampled episodes.
+    env_steps = sum(eps.env_steps() for eps in sampled_episodes)
+    agent_steps = sum(eps.agent_steps() for eps in sampled_episodes)
+
+    return eval_results, env_steps, agent_steps


 if __name__ == "__main__":
     args = parser.parse_args()
 
+    args.local_mode = True
     base_config = (
         get_trainable_cls(args.algo)
         .get_default_config()
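On the distinction between the two counts the example now returns: `env_steps()` counts environment timesteps, while `agent_steps()` counts per-agent timesteps, so the two diverge in multi-agent episodes. A tiny self-contained illustration with a hypothetical stand-in episode class (not an RLlib API):

from dataclasses import dataclass
from typing import List


@dataclass
class FakeEpisode:
    """Hypothetical stand-in for an RLlib episode (illustration only)."""

    num_env_steps: int
    num_agents: int

    def env_steps(self) -> int:
        return self.num_env_steps

    def agent_steps(self) -> int:
        # Assume all agents act on every env step.
        return self.num_env_steps * self.num_agents


episodes: List[FakeEpisode] = [FakeEpisode(100, 1), FakeEpisode(50, 2)]
env_steps = sum(eps.env_steps() for eps in episodes)      # 100 + 50 = 150
agent_steps = sum(eps.agent_steps() for eps in episodes)  # 100 + 100 = 200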