Adding Average Reward PPO proposal #210

Closed
Howuhh opened this issue Jun 20, 2022 · 3 comments
Comments

@Howuhh
Contributor

Howuhh commented Jun 20, 2022

Although it is now common to solve most problems using the discounted reward, this does not always correspond to the real problem (non-episodic, long-horizon settings), where it is important to use algorithms that optimize the average reward.
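
For concreteness, this is the standard contrast between the two criteria (textbook definitions, not taken from the A-PPO paper itself): the discounted objective versus the average-reward (gain) objective:

```math
J_\gamma(\pi) = \mathbb{E}_\pi\!\left[\sum_{t=0}^{\infty} \gamma^t r_t\right],
\qquad
\rho(\pi) = \lim_{T\to\infty} \frac{1}{T}\,\mathbb{E}_\pi\!\left[\sum_{t=0}^{T-1} r_t\right].
```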

There are only two adaptations of modern algorithms for the average-reward setting: A-TRPO and A-PPO. A-PPO is almost the same as regular PPO and much easier to implement than A-TRPO. Besides, it also addresses some known issues [1, 2] with regular PPO (e.g. sampling from the undiscounted state distribution).
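
To illustrate how small the change to PPO is, here is a minimal sketch of a differential (average-reward) GAE-style advantage computation. The function name, the `rho` argument, and the undiscounted accumulation are my own assumptions for illustration; this is not the exact formulation from the paper or from any existing CleanRL file:

```python
import numpy as np

def average_reward_advantages(rewards, values, next_values, rho, lam=0.95):
    """Differential (average-reward) GAE-style advantages.

    Instead of discounting, each TD error subtracts an estimate of the
    average reward per step `rho`:
        delta_t = r_t - rho + V(s_{t+1}) - V(s_t)
    """
    T = len(rewards)
    advantages = np.zeros(T)
    last_adv = 0.0
    for t in reversed(range(T)):
        delta = rewards[t] - rho + next_values[t] - values[t]
        last_adv = delta + lam * last_adv  # no gamma: undiscounted accumulation
        advantages[t] = last_adv
    return advantages
```

In practice `rho` would itself have to be estimated online, e.g. as a running average of observed rewards; in this sketch that estimate is the main extra piece of state compared to standard PPO.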

In my free time I tried to reproduce the results of the paper, and it mostly worked. Nevertheless, I think it is important to compare it with other algorithms, especially PPO. This will be easier to do if they share the same code base and common hacks, so adding it to CleanRL would help with this. In addition, I think the average-reward setting is very underrated, and this would help popularize it (if A-PPO really works).

@vwxyzjn
Owner

vwxyzjn commented Jun 20, 2022

This looks pretty interesting - APPO would be a new algorithm. I think it's great to have it. Would you be up to going through the new algorithm contribution checklist? See #186 as an example.

@Howuhh
Contributor Author

Howuhh commented Jun 20, 2022

Yes, but it will take some time, especially documentation and testing. I think it would be reasonable to start from apo_continuous_action.py and compare it only with ppo_continuous_action.py as a test.

@vwxyzjn
Owner

vwxyzjn commented Jun 20, 2022

That makes sense. Thank you!

Howuhh mentioned this issue Jun 21, 2022 (19 tasks)
vwxyzjn closed this as completed Nov 28, 2023