
[Feature Request] (V1.1+) Unified framework for multi-goal algorithms #297

Closed

ferreirajoaouerj opened this issue Jan 19, 2021 · 5 comments

@ferreirajoaouerj
I saw that dict support is almost ready; I think it's worth considering this framework for V1.1+, before a new HER implementation appears.

All credit goes to @spitis, @Takonan, et al. for proposing this framework in the awesome paper Maximum Entropy Gain Exploration for Long Horizon Multi-goal Reinforcement Learning.

🚀 Feature

Most of the algorithms that emerged after HER differ in 3 functions:

  1. select(*args): chooses which desired_goal will be used by the agent in each episode (HER uses the desired_goal given by the environment).
  2. reward(state, action, achieved_goal): how to calculate the reward (HER uses the compute_reward method directly).
  3. relabel(buffer, *args): how data/trajectory augmentation will occur (HER's best strategy uses future achieved_goals).

Each of these components has a massive impact on the algorithm's performance (see the attached figure), especially for long-horizon tasks.

Motivation

This would make it easy to implement new multi-goal algorithms just by changing these 3 functions.

Pitch

Implement the API through functions or methods.
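For illustration, here is a minimal sketch of what such an API could look like (all names here are hypothetical, not existing SB3 classes):

```python
from abc import ABC, abstractmethod

import numpy as np


class MultiGoalStrategy(ABC):
    """Hypothetical interface for the three functions that distinguish
    most HER-style multi-goal algorithms."""

    @abstractmethod
    def select(self, obs: dict) -> np.ndarray:
        """Choose the desired_goal the agent pursues this episode.
        Plain HER would simply return obs["desired_goal"]."""

    @abstractmethod
    def reward(self, obs: dict, action: np.ndarray, info: dict) -> float:
        """Compute the reward. Plain HER delegates to env.compute_reward()."""

    @abstractmethod
    def relabel(self, transitions: dict) -> dict:
        """Augment stored transitions, e.g. replace desired_goal with a future
        achieved_goal and recompute the rewards (HER's 'future' strategy)."""
```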

Alternatives

  1. Keep it as it is.
  2. Implement only a subset (e.g. don't touch reward).

Additional context

The figure below is an experiment (source: Maximum Entropy Gain Exploration for Long Horizon Multi-goal Reinforcement Learning) comparing different algorithms for a maze navigation task.

[Figure: comparison of multi-goal algorithm performance on a maze navigation task]

### Checklist

  • I have checked that there is no similar issue in the repo (required)
@ferreirajoaouerj ferreirajoaouerj added the enhancement New feature or request label Jan 19, 2021
@araffin araffin self-assigned this Jan 20, 2021
@araffin
Member

araffin commented Jan 20, 2021

Hello,

thank you for raising this issue.
I'm aware of the MEGA paper (although I did not read it thoroughly) and that would be a good addition for the contrib repo ;)

Most of the algorithms that emerged after HER differ in 3 functions:

this would be part of the buffer no? As the rest would look the same from the algorithm perspective (as we are only relabeling transitions).

select(*args): chooses which desired_goal will be used by the agent in each episode (HER uses the desired_goal given by the environment).

could you be more explicit about what is included in *args? (and what could be returned, I assume an existing goal)

reward(state, action, achieved_goal): how to calculate the reward (HER uses the compute_reward method directly).

we should also include the info dict. achieved_goal is included in the state (by definition of a GoalEnv)

relabel(buffer, *args): how data/trajectory augmentation will occur (HER's best strategy uses future achieved_goals).

I'm not sure if buffer should be the argument, and what do you want to change in addition to the desired_goal and reward?

See current re-labeling here: https://github.com/DLR-RM/stable-baselines3/blob/master/stable_baselines3/her/her_replay_buffer.py#L265

Pinging @megan-klaiber who worked on the first HER implementation, in case there is something that I missed ;)

@ferreirajoaouerj
Author

Thank YOU! I love SB, happy to assist in any way.

this would be part of the buffer no? As the rest would look the same from the algorithm perspective (as we are only relabeling transitions).

Yes. I suspect the differences would occur only in the collect_rollouts method, where we would process the initial episode observation, replacing the received desired_goal (if the algorithm is not HER) with the goal returned by select. Later we would relabel the transitions. The train and learn functions should be the same as HER's for all other algorithms.

could you be more explicit about what is included in *args? (and what could be returned, I assume an existing goal)

Unfortunately, the selected goal (returned by the function) can be anywhere in the goal space, so there are many ways to select this desired_goal. Some algorithms use GANs to propose them, others learn density models or distance functions. Many of them use past achieved_goals, but those would already be available in the replay buffer (if select is a method), so *args would be empty.
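As a purely illustrative example, a select that reuses past achieved_goals could be as simple as sampling one uniformly from the buffer; the names below are assumptions, not SB3 API:

```python
import numpy as np


def select_from_achieved(achieved_goals: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Illustrative select(): pick a past achieved_goal uniformly at random.

    `achieved_goals` is assumed to be an array of goals already stored in the
    replay buffer; real strategies (GANs, density models, distance functions)
    would replace this uniform choice.
    """
    idx = rng.integers(len(achieved_goals))
    return achieved_goals[idx].copy()
```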

we should also include the info dict. achieved_goal is included in the state (by definition of a GoalEnv)

Yes, that would be better.

I'm not sure if buffer should be the argument, and what do you want to change in addition to the desired_goal and reward?

I think these two changes are fine!

One idea is to implement the HER algorithm as a child of OffPolicyAlgorithm and add the proposed methods. Other algorithms could then inherit from HER and override the select, reward, and relabel methods, keeping the rest the same.
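A rough sketch of that class hierarchy (hypothetical; it does not match the actual SB3 implementation, and the hooks are heavily simplified):

```python
from stable_baselines3.common.off_policy_algorithm import OffPolicyAlgorithm


class HER(OffPolicyAlgorithm):
    """Sketch: HER as an off-policy algorithm exposing the three hooks."""

    def select(self, obs):
        # Plain HER keeps the goal provided by the environment.
        return obs["desired_goal"]

    def reward(self, obs, action, info):
        # Delegate to the GoalEnv reward function (conceptually; with a VecEnv
        # this would go through env_method instead of a direct call).
        return self.env.compute_reward(obs["achieved_goal"], obs["desired_goal"], info)

    def relabel(self, transitions):
        # HER 'future' strategy: swap desired_goal for a future achieved_goal
        # and recompute the reward for the relabeled transitions.
        raise NotImplementedError


class MEGA(HER):
    """Sketch: a new algorithm only overrides the hooks it changes."""

    def select(self, obs):
        # e.g. pick a rarely achieved goal according to a learned density model
        # (self.goal_density is hypothetical).
        return self.goal_density.lowest_density_goal()
```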

@araffin
Member

araffin commented Jan 22, 2021

Some algorithms use GANs to propose them,

I see, I remember seeing such a paper (I think using a VAE), so you can select a goal that may not exist yet or is not present in the replay buffer.

One idea is to implement the HER algorithm as child of OffPolicyAlgorithm and add the methods proposed.

With the new implementation, HER will be a child of BaseAlgorithm (we tried in the past to make it a child of OffPolicyAlgorithm but ran into various issues because of the nature of HER, which is just a wrapper around an existing off-policy algorithm such as SAC). However, other algorithms may still inherit from it and from its replay buffer (which is the main thing that is going to change).

@araffin araffin added this to the v1.1 milestone Mar 5, 2021
@araffin
Member

araffin commented Mar 17, 2021

Linking the PR of the HER refactor here: #351
This would make things much simpler as HER would be just a special replay buffer.
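To illustrate how that could simplify things: a new multi-goal algorithm might then just subclass the HER replay buffer and change where the relabeling goals come from. The override point below (`_sample_goals`) is an assumption about the internal API, not a documented extension hook:

```python
from stable_baselines3.her.her_replay_buffer import HerReplayBuffer


class DensityWeightedGoalBuffer(HerReplayBuffer):
    """Sketch of a MEGA-style buffer: only the goal relabeling changes.

    NOTE: `_sample_goals` is assumed here as the override point; the actual
    internal method name and signature depend on the SB3 version.
    """

    def _sample_goals(self, *args, **kwargs):
        goals = super()._sample_goals(*args, **kwargs)
        # Replace / re-weight the sampled goals, e.g. favoring goals with low
        # density under a learned density model instead of uniform 'future' goals.
        return goals
```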

@araffin araffin mentioned this issue Mar 24, 2021
@araffin
Member

araffin commented May 11, 2021

Closing this as #351 is now merged into master.

@araffin araffin closed this as completed May 11, 2021