
[Feature] Dispatch for DDPG loss module #1215

Merged · 9 commits · Jun 4, 2023

Conversation

@Blonck (Contributor) commented Jun 1, 2023

Description

Enable dispatching arguments for the .forward() method of the DDPG loss module.
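Concretely, `@dispatch` lets the loss be called either with a single tensordict or with plain tensors as keyword arguments, where nested keys are passed as underscore-joined names. Below is a minimal runnable sketch of the mechanism on a toy stand-in module, not the actual `DDPGLoss`; key names and shapes are illustrative:

```python
import torch
from tensordict import TensorDict, TensorDictBase
from tensordict.nn import TensorDictModuleBase, dispatch

# Toy stand-in for a loss module, only to show what @dispatch enables.
class ToyLoss(TensorDictModuleBase):
    in_keys = ["observation", "action", ("next", "reward"), ("next", "done")]
    out_keys = ["loss"]

    @dispatch
    def forward(self, tensordict: TensorDictBase) -> TensorDictBase:
        # Dummy "loss" just to show the data flow.
        loss = tensordict.get(("next", "reward")).mean()
        return TensorDict({"loss": loss}, batch_size=[])

module = ToyLoss()
td = TensorDict(
    {
        "observation": torch.randn(8, 4),
        "action": torch.randn(8, 2),
        "next": {
            "reward": torch.randn(8, 1),
            "done": torch.zeros(8, 1, dtype=torch.bool),
        },
    },
    batch_size=[8],
)

module(td)      # the usual tensordict call
loss = module(  # the dispatched call: kwargs, nested keys joined with "_"
    observation=td["observation"],
    action=td["action"],
    next_reward=td["next", "reward"],
    next_done=td["next", "done"],
)
```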

Types of changes

What types of changes does your code introduce? Remove all that do not apply:

  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation (update in the documentation)

Checklist

Go over all the following points, and put an x in all the boxes that apply.
If you are unsure about any of these, don't hesitate to ask. We are here to help!

  • I have read the CONTRIBUTION guide (required)
  • I have updated the tests accordingly (required for a bug fix or a new feature).
  • I have updated the documentation accordingly.

@facebook-github-bot added the CLA Signed label Jun 1, 2023
@Blonck (Contributor, Author) commented Jun 1, 2023

Open Questions

This pull request raises some open questions regarding the patch. Some of these questions may also apply to other loss modules.

  1. Currently, the PR breaks backward compatibility by renaming the argument input_tensordict to tensordict. This renaming is necessary for @dispatch to function properly.
     Potential solutions:
     • Accept the change and standardize the argument name. It is unlikely that anyone is using keyword arguments to call .forward() (my preferred option).
     • Enhance the functionality of @dispatch to allow alternative input names. This could involve adding a parameter that, when set, skips the name validation.
  2. The underlying value estimator relies on optional input keys, such as steps_to_next_obs, which may be used by .value_estimate() if not None. The current DQNLoss implementation ignores these keys:

```python
class DQNLoss(LossModule):
    ...

    @dispatch(
        source=[
            "observation",
            ("next", "observation"),
            "action",
            ("next", "reward"),
            ("next", "done"),
        ],
        dest=["loss"],
    )
    def forward(self, tensordict: TensorDictBase) -> TensorDict:
        ...
```

Possible solutions include:

  • Simply ignore optional arguments.
  • Add optional tensor dict keys to the dispatch argument list, requiring users to provide the argument (which can be set to None).
  • Extend @dispatch to support optional arguments.
  3. Some tensordict keys used by the value estimator, like ("next", "reward") and ("next", "done"), have fixed values despite the advantage module allowing configuration. These keys are also input keys for .forward().
     Should they be made configurable by adding them to _AcceptedKeys of the loss module in this PR?

  4. Sometimes input/output tensordict keys are dynamic and may depend on the configuration of the loss module. For DDPG loss this is not the case, but, for example, the output keys of the A2CLoss depend on its configuration.
     One solution would be to use a @property for that. In the case of A2CLoss it would look like:

```python
@property
def out_keys(self):
    outs = ["loss_objective"]
    if self.critic_coef:
        outs.append("loss_critic")
    if self.entropy_bonus:
        outs.append("entropy")
        outs.append("loss_entropy")
    return outs
```
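For illustration, here is how that property would behave on a hypothetical stand-in class (a sketch, not the real A2CLoss; only the flag names are taken from the example above):

```python
# Hypothetical stand-in class, only to show the dynamic out_keys property.
class ToyA2CLoss:
    def __init__(self, critic_coef=1.0, entropy_bonus=True):
        self.critic_coef = critic_coef
        self.entropy_bonus = entropy_bonus

    @property
    def out_keys(self):
        outs = ["loss_objective"]
        if self.critic_coef:
            outs.append("loss_critic")
        if self.entropy_bonus:
            outs.extend(["entropy", "loss_entropy"])
        return outs

print(ToyA2CLoss().out_keys)
# ['loss_objective', 'loss_critic', 'entropy', 'loss_entropy']
print(ToyA2CLoss(critic_coef=0.0, entropy_bonus=False).out_keys)
# ['loss_objective']
```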

Do you see any problems with that approach?

@Blonck requested a review from vmoens June 1, 2023 08:21
@Blonck (Contributor, Author) commented Jun 1, 2023

> The underlying value estimator relies on optional input keys, such as steps_to_next_obs, which may be used by .value_estimate() if not None. The current DQNLoss implementation ignores these keys:

You can ignore this one. @dispatch does not allow any optional arguments for now, so we need to go with the first solution and ignore optional arguments.

@vmoens (Contributor) commented Jun 1, 2023

> Open Questions
>
> This pull request raises some open questions regarding the patch. Some of these questions may also apply to other loss modules.
>
> 1. Currently, the PR breaks backward compatibility by renaming the argument input_tensordict to tensordict. This renaming is necessary for @dispatch to function properly.
>    Potential solutions:
>    • Accept the change and standardize the argument name. It is unlikely that anyone is using keyword arguments to call .forward() (my preferred option).
>    • Enhance the functionality of @dispatch to allow alternative input names. This could involve adding a parameter that, when set, skips the name validation.

I'm open to a bc-breaking change in this case.

@vmoens (Contributor) commented Jun 1, 2023

> • Simply ignore optional arguments.
> • Add optional tensor dict keys to the dispatch argument list, requiring users to provide the argument (which can be set to None).
> • Extend @dispatch to support optional arguments.

Two things here:

  • We can't assume that dispatch will have all the functionality. I would keep things small-scale if possible.
  • If there is a key whose presence or absence controls the flow, I think it's best to consider it absent for dispatch.

That being said, in some places in the code we do

```python
value = data.get(key, None)
if value is None:
    foo()
else:
    bar()
```

In this case, we could have the user call

    module(..., steps_to_next_obs=tensor)

which will populate the tensordict with that key; if it is not passed, the resulting value will be None.
That could require changing the function that reads "steps_to_next_obs".

In summary:
I would suggest keeping the behaviour simple and not including "steps_to_next_obs" in the in_keys, but keeping track of this in a TorchRL issue.
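A toy sketch of the pattern described above, using a hypothetical value_estimate helper (not TorchRL code): the optional key controls the flow, and its absence maps to None.

```python
import torch
from tensordict import TensorDict

def value_estimate(data: TensorDict) -> torch.Tensor:
    # Optional key: absent -> None -> single-step branch.
    steps = data.get("steps_to_next_obs", None)
    gamma = 0.99
    discount = gamma if steps is None else gamma**steps
    return data["reward"] + discount * data["next_value"]

td = TensorDict(
    {"reward": torch.ones(4), "next_value": torch.zeros(4)},
    batch_size=[4],
)
value_estimate(td)  # key absent: single-step branch
td["steps_to_next_obs"] = 3 * torch.ones(4)
value_estimate(td)  # key present: multi-step branch
```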

@vmoens (Contributor) commented Jun 1, 2023

> Some tensor dict keys used by the value estimator, like ("next", "reward") and ("next", "done"), have fixed values despite the advantage module allowing configuration. These keys are also input keys for .forward().
> Should they be made configurable by adding them to _AcceptedKeys of the loss module in this PR?

Yes, I think they should be part of the _AcceptedKeys. Do you have an example of a place where this happens?

@vmoens (Contributor) commented Jun 1, 2023

> Sometimes input/output tensordict keys are dynamic and may depend on the configuration of the loss module. For DDPG loss this is not the case, but, for example, the output keys of the A2CLoss depend on its configuration.
> One solution would be to use a @property for that. In the case of A2CLoss it would look like:

Yes we should be using a property!

@vmoens added the enhancement label Jun 1, 2023
@Blonck (Contributor, Author) commented Jun 1, 2023

> Yes, I think they should be part of the _AcceptedKeys. Do you have an example of a place where this happens?

For example, ("next", "reward"), or better ("next", self.tensor_keys.reward), is used in all advantage modules.
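A rough sketch of how the _AcceptedKeys pattern could make that configurable (simplified; not the actual TorchRL classes):

```python
from dataclasses import dataclass

@dataclass
class _AcceptedKeys:
    # Default tensordict key names; users can override them.
    reward: str = "reward"
    done: str = "done"

class SomeLoss:
    def __init__(self, **key_overrides):
        self.tensor_keys = _AcceptedKeys(**key_overrides)

    def reward_key(self):
        # Advantage code would read ("next", self.tensor_keys.reward)
        # instead of the hard-coded ("next", "reward").
        return ("next", self.tensor_keys.reward)

print(SomeLoss().reward_key())                    # ('next', 'reward')
print(SomeLoss(reward="my_reward").reward_key())  # ('next', 'my_reward')
```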

@vmoens (Contributor) commented Jun 1, 2023

That should be customisable

@vmoens added the bc breaking label Jun 1, 2023
@vmoens (Contributor) left a review comment

Fantastic, really fancy!
On a high level: do you think that the in_keys will be recyclable across modules or will we need to re-code it every time?


@Blonck (Contributor, Author) commented Jun 3, 2023

> On a high level: do you think that the in_keys will be recyclable across modules or will we need to re-code it every time?

Not sure if there is really something common to all loss modules here; it seems too dependent on what actually happens inside each loss for anything to be recycled. But I will think about it, maybe I'll come up with an idea.

@vmoens (Contributor) left a review comment

Final review

@vmoens merged commit 331f677 into pytorch:main Jun 4, 2023