[RLlib] Moving sampling coordination for batch_mode=complete_episodes to synchronous_parallel_sample. #46321
Conversation
…ot reducing workload when scaled and b) was using 'train_batch_size' neglecting 'train_batch_size_per_learner'. Signed-off-by: simonsays1980 <simon.zehnder@gmail.com>
Old comment:
# For complete episodes mode, sample as long as the number of timesteps
# done is smaller than the `train_batch_size`.
New comment:
# For complete episodes mode, sample a single episode and
# leave coordination of sampling to `synchronous_parallel_sample`.
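A minimal sketch of the behavioral change this comment describes, assuming batch_mode="complete_episodes" (run_one_episode and the two sample_* functions below are illustrative stand-ins, not RLlib's actual EnvRunner code):

```python
import random

def run_one_episode():
    # Stand-in for a full env rollout; returns an episode as a list of timesteps.
    return [None] * random.randint(5, 20)

def sample_before(train_batch_size):
    # Before: each EnvRunner looped on its own until it alone had collected
    # `train_batch_size` timesteps worth of complete episodes.
    episodes, timesteps = [], 0
    while timesteps < train_batch_size:
        episode = run_one_episode()
        episodes.append(episode)
        timesteps += len(episode)
    return episodes

def sample_after():
    # After: each EnvRunner returns a single complete episode; deciding how many
    # episodes are needed in total is left to `synchronous_parallel_sample`.
    return [run_one_episode()]
```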
I very much like this!
Can we add a small TODO comment here that this logic, currently handled by synchronous_parallel_sample, will eventually be moved fully into EnvRunnerGroup? So from the algo, you would do:

if self.config.batch_mode == "complete_episodes":
    self.env_runner_group.sample(num_timesteps=[batch size], complete_episodes=True)

something like this ^. Don't have to do this in this PR!
Awesome. I would love this move!
Nice, elegant cleanup PR. Thanks @simonsays1980!
Just one comment line to be added.
Signed-off-by: simonsays1980 <simon.zehnder@gmail.com>
…th 'complete_episodes' sampling happens multiple times until the number of timesteps for the 'train_batch_size' is reached. Signed-off-by: simonsays1980 <simon.zehnder@gmail.com>
Signed-off-by: simonsays1980 <simon.zehnder@gmail.com>
Why are these changes needed?

When sampling complete episodes, each EnvRunner sampled train_batch_size timesteps before returning. This made sampling inefficient and led to long waiting times when slow environments were used. Furthermore, scaling could not reduce the sampling workload. This PR changes this and moves coordination of the sampling, when complete_episodes are needed, fully to synchronous_parallel_sample, which can coordinate better across all EnvRunners. This should reduce sampling duration linearly with the number of EnvRunners chosen.
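A minimal sketch of the coordination described above, assuming the coordinating utility keeps requesting single complete episodes from all EnvRunners until the timestep target is reached. The class and function names below are illustrative stand-ins, not RLlib's actual implementation; the real utility queries remote EnvRunner actors in parallel and also accounts for train_batch_size_per_learner.

```python
import random

class DummyEnvRunner:
    """Stand-in for an RLlib EnvRunner: returns one complete episode per sample()."""

    def sample(self):
        # An episode is represented here as a list of timesteps of random length.
        return [None] * random.randint(5, 20)

def coordinate_complete_episodes(env_runners, train_batch_size):
    """Collect single complete episodes from all runners until the timestep target is met."""
    episodes, total_timesteps = [], 0
    while total_timesteps < train_batch_size:
        # Round-robin over the runners (the real utility queries its remote
        # EnvRunner actors in parallel); each call returns one complete episode,
        # so no single runner has to fill the whole batch on its own.
        for runner in env_runners:
            episode = runner.sample()
            episodes.append(episode)
            total_timesteps += len(episode)
            if total_timesteps >= train_batch_size:
                break
    return episodes

# Example: four runners share the work of filling a 200-timestep batch.
batch = coordinate_complete_episodes([DummyEnvRunner() for _ in range(4)], 200)
print(f"{len(batch)} episodes, {sum(len(e) for e in batch)} timesteps")
```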
Related issue number
Closes #45826
Checks
- I've signed off every commit (git commit -s) in this PR.
- I've run scripts/format.sh to lint the changes in this PR.
- If I've added a method in Tune, I've added it in doc/source/tune/api/ under the corresponding .rst file.