Although it is now common to solve most problems with the discounted-reward objective, this does not always match the real problem (non-episodic, long-horizon tasks), where it is important to use algorithms that optimize the average reward instead.
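For concreteness, these are the two objectives (standard definitions from the average-reward RL literature, not taken from the paper itself): the discounted return versus the long-run average reward per step.

```math
J_\gamma(\pi) = \mathbb{E}_\pi\left[\sum_{t=0}^{\infty} \gamma^t r_t\right],
\qquad
J_{\mathrm{avg}}(\pi) = \lim_{T \to \infty} \frac{1}{T}\,\mathbb{E}_\pi\left[\sum_{t=0}^{T-1} r_t\right]
```

In long-horizon, non-episodic tasks the discount factor effectively truncates the horizon (roughly 1/(1-γ) steps), which is why the average-reward objective can be the more faithful one.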
There are only two adaptations of modern algorithms to the average-reward setting: A-TRPO and A-PPO. A-PPO is almost the same as regular PPO and much easier to implement than A-TRPO. Besides, it also addresses some problems [1, 2] of regular PPO (e.g. sampling from the undiscounted state distribution).
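To illustrate the kind of change involved, here is a minimal sketch of how the advantage computation might differ: PPO's discounted TD error is replaced by a differential (average-reward) TD error. This is a generic sketch based on standard average-reward RL, not the exact estimator from the paper; the function name and signature are hypothetical.

```python
import numpy as np

def differential_advantages(rewards, values, next_value, rho, lam=0.95):
    """GAE-style advantages for the average-reward setting (sketch).

    PPO's discounted TD error
        delta_t = r_t + gamma * V(s_{t+1}) - V(s_t)
    is replaced by the differential TD error
        delta_t = r_t - rho + V(s_{t+1}) - V(s_t),
    where rho is an estimate of the policy's average reward per step
    and V is the differential value function. Accumulation uses gamma = 1.
    """
    values = np.append(values, next_value)  # bootstrap with V(s_T)
    advantages = np.zeros(len(rewards))
    last_adv = 0.0
    for t in reversed(range(len(rewards))):
        delta = rewards[t] - rho + values[t + 1] - values[t]
        last_adv = delta + lam * last_adv
        advantages[t] = last_adv
    return advantages
```

Here `rho` could be estimated from the batch mean reward or tracked with an exponential moving average across updates; which choice the paper actually makes would need to be checked against [1, 2].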
In my free time I tried to reproduce the results of the paper, and it mostly worked. Nevertheless, I think it is important to compare it with the other algorithms, especially PPO, and that will be easier if they share the same code base and common implementation tricks. So it seems to me that adding it to CleanRL would help with this. In addition, I think the average-reward setting is very underrated, and this would help popularize it (if A-PPO really works).
This looks pretty interesting - APPO would be a new algorithm. I think it's great to have it. Would you be up to going through the new algorithm contribution checklist? See #186 as an example.
Yes, but it will take some time, especially documentation and testing. I think it would be reasonable to start from apo_continuous_action.py and compare it only with ppo_continuous_action.py as a test.