
[Feature Request] (V1.1+) Unified framework for multi-goal algorithms #297

Closed

ferreirajoaouerj opened this issue Jan 19, 2021 · 5 comments

@ferreirajoaouerj
I saw that dict support is almost ready; I think it's worth considering this framework for V1.1+, before a new HER implementation appears.

All credit goes to @spitis, @Takonan, et al. for proposing this framework in the awesome paper Maximum Entropy Gain Exploration for Long Horizon Multi-goal Reinforcement Learning.

🚀 Feature

Most of the algorithms that emerged after HER differ in 3 functions:

  1. select(*args): chooses which desired_goal will be used by the agent in each episode (HER uses the desired_goal given by the environment).
  2. reward(state, action, achieved_goal): how to calculate the reward (HER uses the compute_reward method directly).
  3. relabel(buffer, *args): how data/trajectory augmentation will occur (HER's best strategy uses future achieved_goals).

Each of these components has a massive impact on the algorithm's performance (see the attached figure), especially for long-horizon tasks.

Motivation

This would make it easy to implement new multi-goal algorithms just by changing these 3 functions.

Pitch

Implement the API through functions or methods.
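For illustration, here is a minimal sketch of what such an API could look like (all names here are hypothetical, not existing SB3 classes):

```python
from abc import ABC, abstractmethod

import numpy as np


class MultiGoalStrategy(ABC):
    """Hypothetical interface for the three functions that distinguish
    most HER-style multi-goal algorithms."""

    @abstractmethod
    def select(self, obs: dict) -> np.ndarray:
        """Choose the desired_goal the agent pursues this episode.
        Plain HER would simply return obs["desired_goal"]."""

    @abstractmethod
    def reward(self, obs: dict, action: np.ndarray, info: dict) -> float:
        """Compute the reward. Plain HER delegates to env.compute_reward()."""

    @abstractmethod
    def relabel(self, transitions: dict) -> dict:
        """Augment stored transitions, e.g. replace desired_goal with a future
        achieved_goal and recompute the rewards (HER's 'future' strategy)."""
```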

Alternatives

  1. Keep it as it is.
  2. Implement only a subset (e.g. don't touch reward).

Additional context

The figure below is an experiment (source: Maximum Entropy Gain Exploration for Long Horizon Multi-goal Reinforcement Learning) comparing different algorithms for a maze navigation task.

[Figure: comparison of multi-goal algorithm performance on a maze navigation task]

### Checklist

  • I have checked that there is no similar issue in the repo (required)
@ferreirajoaouerj ferreirajoaouerj added the enhancement New feature or request label Jan 19, 2021
@araffin araffin self-assigned this Jan 20, 2021
@araffin
Member

araffin commented Jan 20, 2021

Hello,

thank you for raising this issue.
I'm aware of the MEGA paper (although I did not read it thoroughly) and that would be a good addition for the contrib repo ;)

Most of the algorithms that emerged after HER differ in 3 functions:

this would be part of the buffer no? As the rest would look the same from the algorithm perspective (as we are only relabeling transitions).

select(*args): chooses which desired_goal will be used by the agent in each episode (HER uses the desired_goal given by the environment).

could you be more explicit about what is included in *args? (and what could be returned, I assume an existing goal)

reward(state, action, achieved_goal): how to calculate the reward (HER uses the compute_reward method directly).

we should also include the info dict. achieved_goal is included in the state (by definition of a GoalEnv)

relabel(buffer, *args): how data/trajectory augmentation will occur (HER's best strategy uses future achieved_goals).

I'm not sure if buffer should be the argument, and what do you want to change in addition to the desired_goal and reward?

See current re-labeling here: https://github.com/DLR-RM/stable-baselines3/blob/master/stable_baselines3/her/her_replay_buffer.py#L265

Pinging @megan-klaiber who worked on the first HER implementation, in case there is something that I missed ;)

@ferreirajoaouerj
Author

Thank YOU! I love SB, happy to assist in any way.

this would be part of the buffer no? As the rest would look the same from the algorithm perspective (as we are only relabeling transitions).

Yes. I suspect the differences would occur only in the collect_rollouts method, where we would process the initial episode observation, replacing the received desired_goal (if the algorithm is not HER) with the goal returned by select. Later we would relabel the transitions. The train and learn functions should be the same as HER's for all other algorithms.

could you be more explicit about what is included in *args? (and what could be returned, I assume an existing goal)

Unfortunately, the selected goal (returned by the function) can be anywhere in the goal space, so there are many ways to select this desired_goal. Some algorithms use GANs to propose them, others learn density models or distance functions. Many of them use past achieved_goals, but those would already be available in the replay buffer (if select is a method), so *args would be empty.
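As a purely illustrative example, a select that reuses past achieved_goals could be as simple as sampling one uniformly from the buffer; the names below are assumptions, not SB3 API:

```python
import numpy as np


def select_from_achieved(achieved_goals: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Illustrative select(): pick a past achieved_goal uniformly at random.

    `achieved_goals` is assumed to be an array of goals already stored in the
    replay buffer; real strategies (GANs, density models, distance functions)
    would replace this uniform choice.
    """
    idx = rng.integers(len(achieved_goals))
    return achieved_goals[idx].copy()
```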

we should also include the info dict. achieved_goal is included in the state (by definition of a GoalEnv)

Yes, that would be better.

I'm not sure if buffer should be the argument, and what do you want to change in addition to the desired_goal and reward?

I think these two changes are fine!

One idea is to implement the HER algorithm as a child of OffPolicyAlgorithm and add the proposed methods. Other algorithms could then inherit from HER and override the select, reward, and relabel methods, keeping the rest the same.
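A rough sketch of that class hierarchy (hypothetical; it does not match the actual SB3 implementation, and the hooks are heavily simplified):

```python
from stable_baselines3.common.off_policy_algorithm import OffPolicyAlgorithm


class HER(OffPolicyAlgorithm):
    """Sketch: HER as an off-policy algorithm exposing the three hooks."""

    def select(self, obs):
        # Plain HER keeps the goal provided by the environment.
        return obs["desired_goal"]

    def reward(self, obs, action, info):
        # Delegate to the GoalEnv reward function (conceptually; with a VecEnv
        # this would go through env_method instead of a direct call).
        return self.env.compute_reward(obs["achieved_goal"], obs["desired_goal"], info)

    def relabel(self, transitions):
        # HER 'future' strategy: swap desired_goal for a future achieved_goal
        # and recompute the reward for the relabeled transitions.
        raise NotImplementedError


class MEGA(HER):
    """Sketch: a new algorithm only overrides the hooks it changes."""

    def select(self, obs):
        # e.g. pick a rarely achieved goal according to a learned density model
        # (self.goal_density is hypothetical).
        return self.goal_density.lowest_density_goal()
```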

@araffin
Member

araffin commented Jan 22, 2021

Some algorithms use GANs to propose them,

I see, I remember seeing such a paper (I think using a VAE), so you can select a goal that may not exist yet or is not present in the replay buffer.

One idea is to implement the HER algorithm as child of OffPolicyAlgorithm and add the methods proposed.

With the new implementation, HER will be a child of BaseAlgorithm (we tried in the past to make it a child of OffPolicyAlgorithm but ran into various issues because of the nature of HER, which is just a wrapper around an existing off-policy algorithm such as SAC). However, other algorithms may still inherit from it and from its replay buffer (which is the main thing that is going to change).

@araffin araffin added this to the v1.1 milestone Mar 5, 2021
@araffin
Member

araffin commented Mar 17, 2021

Linking the PR of the HER refactor here: #351
This would make things much simpler as HER would be just a special replay buffer.
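To illustrate how that could simplify things: a new multi-goal algorithm might then just subclass the HER replay buffer and change where the relabeling goals come from. The override point below (`_sample_goals`) is an assumption about the internal API, not a documented extension hook:

```python
from stable_baselines3.her.her_replay_buffer import HerReplayBuffer


class DensityWeightedGoalBuffer(HerReplayBuffer):
    """Sketch of a MEGA-style buffer: only the goal relabeling changes.

    NOTE: `_sample_goals` is assumed here as the override point; the actual
    internal method name and signature depend on the SB3 version.
    """

    def _sample_goals(self, *args, **kwargs):
        goals = super()._sample_goals(*args, **kwargs)
        # Replace / re-weight the sampled goals, e.g. favoring goals with low
        # density under a learned density model instead of uniform 'future' goals.
        return goals
```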

@araffin araffin mentioned this issue Mar 24, 2021
@araffin
Member

araffin commented May 11, 2021

Closing this as #351 is now merged into master.

@araffin araffin closed this as completed May 11, 2021