[Feature Request] (V1.1+) Unified framework for multi-goal algorithms #297
Comments
Hello, thank you for raising this issue.
This would be part of the buffer, no? The rest would look the same from the algorithm's perspective (as we are only relabeling transitions).
Could you be more explicit about what is included in …?
We should also include the …
I'm not sure if …
See the current re-labeling here: https://github.com/DLR-RM/stable-baselines3/blob/master/stable_baselines3/her/her_replay_buffer.py#L265
Pinging @megan-klaiber, who worked on the first HER implementation, in case there is something that I missed ;)
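For readers following along, here is a minimal sketch of what "relabeling a transition" amounts to, assuming a gym-style goal environment. This is not the actual buffer code; `transition`, `future_achieved_goal`, and the dict keys are illustrative.

```python
import numpy as np


def relabel_transition(transition: dict, future_achieved_goal: np.ndarray, env) -> dict:
    """Replace the desired goal of a stored transition with a goal that was
    actually achieved later in the episode, then recompute the reward."""
    relabeled = dict(transition)
    relabeled["desired_goal"] = future_achieved_goal
    # Goal-conditioned gym environments expose compute_reward(achieved_goal, desired_goal, info),
    # which is what HER reuses to recompute rewards after relabeling.
    relabeled["reward"] = env.compute_reward(
        relabeled["achieved_goal"], relabeled["desired_goal"], {}
    )
    return relabeled
```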
Thank YOU! I love SB, happy to assist in any way.
Yes. I suspect the differences would occur only in the …
Unfortunately, the goal selected (and returned by the function) can be any goal in the goal space, so there are a number of ways to select it.
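To illustrate how open-ended the selection is, here is a hedged sketch of a few common strategies (mirroring the "final", "future", "episode", and "random" variants described in the HER paper); the array names are hypothetical.

```python
import numpy as np


def select_relabel_goal(strategy: str, t: int,
                        episode_achieved_goals: np.ndarray,
                        buffer_achieved_goals: np.ndarray) -> np.ndarray:
    """Pick a goal to relabel the transition at timestep t with."""
    if strategy == "final":    # the goal achieved at the end of the episode
        return episode_achieved_goals[-1]
    if strategy == "future":   # a goal achieved at or after timestep t in the same episode
        idx = np.random.randint(t, len(episode_achieved_goals))
        return episode_achieved_goals[idx]
    if strategy == "episode":  # any goal achieved in the same episode
        return episode_achieved_goals[np.random.randint(len(episode_achieved_goals))]
    if strategy == "random":   # any goal achieved anywhere in the replay buffer
        return buffer_achieved_goals[np.random.randint(len(buffer_achieved_goals))]
    raise ValueError(f"Unknown strategy: {strategy}")
```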
Yes, that should be better.
I think these two changes are fine! One idea is to implement the HER algorithm as a child of …
I see, I remember seeing such a paper (I think using a VAE), so you can select a goal that may not exist, or that is not present in the replay buffer.
With the new implementation, …
Linking the PR of the HER refactor here: #351
Closing this, as #351 (comment) is now merged with master.
Original issue description
I saw that `dict` support is almost ready, so I think it's worth considering this framework for V1.1+, before a new `HER` implementation appears. All credits go to @spitis, @Takonan, et al., for proposing this framework in the awesome paper Maximum Entropy Gain Exploration for Long Horizon Multi-goal Reinforcement Learning.
🚀 Feature
Most of the algorithms that emerged after HER differ in 3 functions:
1. Which `desired_goal` will be used by the agent in each episode (HER uses the `desired_goal` given by the environment).
2. How the reward of a relabeled transition is computed (HER uses the environment's `compute_reward` method directly).
3. Which goals are used to relabel past transitions (HER samples from the episode's `achieved_goals`).
Each of these components has massive consequences on the algorithm's performance (see figure attached), especially for long-horizon tasks.
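To make these three functions concrete, here is a hedged sketch of what a unified interface could look like. The class and method names are hypothetical, not an existing SB3 API.

```python
from abc import ABC, abstractmethod

import numpy as np


class MultiGoalStrategy(ABC):
    """The three functions in which HER-style multi-goal algorithms typically differ."""

    @abstractmethod
    def sample_behavior_goal(self, env_desired_goal: np.ndarray) -> np.ndarray:
        """Select which desired_goal the agent pursues in this episode.
        HER would simply return the environment's desired_goal."""

    @abstractmethod
    def compute_reward(self, achieved_goal: np.ndarray, desired_goal: np.ndarray, info: dict) -> float:
        """Compute the reward of a (possibly relabeled) transition.
        HER would delegate to the environment's compute_reward."""

    @abstractmethod
    def sample_relabel_goal(self, episode_achieved_goals: np.ndarray, t: int) -> np.ndarray:
        """Select the goal used to relabel the transition at timestep t.
        HER would sample it from the episode's achieved_goals."""
```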
Motivation
This would make it easy to implement new multi-goal algorithms just by changing these 3 functions.
Pitch
Implement the API through functions or methods.
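As a possible usage example (again hypothetical, building on the `MultiGoalStrategy` sketch above), a new multi-goal algorithm would only override the functions it changes:

```python
import numpy as np


class UniformGoalStrategy(MultiGoalStrategy):
    """Pursue a goal sampled uniformly from the goal space instead of the
    environment's desired_goal; keep HER-style reward and relabeling."""

    def __init__(self, goal_low: np.ndarray, goal_high: np.ndarray, env):
        self.goal_low, self.goal_high = goal_low, goal_high
        self.env = env

    def sample_behavior_goal(self, env_desired_goal: np.ndarray) -> np.ndarray:
        # Ignore the environment's goal and draw one uniformly from the goal space.
        return np.random.uniform(self.goal_low, self.goal_high)

    def compute_reward(self, achieved_goal, desired_goal, info):
        # Same as HER: reuse the environment's reward function.
        return self.env.compute_reward(achieved_goal, desired_goal, info)

    def sample_relabel_goal(self, episode_achieved_goals, t):
        # Same as HER's "future" strategy: a goal achieved later in the episode.
        idx = np.random.randint(t, len(episode_achieved_goals))
        return episode_achieved_goals[idx]
```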
Alternatives
Additional context
The figure attached to the issue shows an experiment (source: Maximum Entropy Gain Exploration for Long Horizon Multi-goal Reinforcement Learning) comparing different algorithms on a maze navigation task.