
Refactor PG algorithm and change behavior of compute_episodic_return #319

Merged · 12 commits into thu-ml:master · Mar 23, 2021

Conversation

ChenDRAG (Collaborator) commented:

See #307 and #317 for details. Note that I removed the 'rew_norm' option from compute_episodic_return(), because I will later redefine how value normalization is done (different algorithms may need different rew_norm schedules?).
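To illustrate the effect of dropping 'rew_norm', here is a minimal, hedged sketch of an episodic (discounted) return computation that returns raw, unnormalized returns and leaves any normalization to the caller. The function name, signature, and example data are illustrative only and are not tianshou's actual compute_episodic_return API.

```python
# Minimal sketch: backward discounted-return computation without the removed
# rew_norm step. Names and signature are hypothetical, not tianshou's API.
import numpy as np

def episodic_return(rew: np.ndarray, done: np.ndarray, gamma: float = 0.99) -> np.ndarray:
    """Backward pass over a flat trajectory buffer; resets at episode ends."""
    returns = np.zeros_like(rew, dtype=np.float64)
    running = 0.0
    for t in reversed(range(len(rew))):
        running = rew[t] + gamma * running * (1.0 - done[t])
        returns[t] = running
    return returns  # raw returns; any normalization is left to the caller

# Example: two episodes of length 2 and 1.
rew = np.array([1.0, 1.0, 2.0])
done = np.array([0.0, 1.0, 1.0])
print(episodic_return(rew, done, gamma=0.9))  # -> [1.9, 1.0, 2.0]
```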

Review comments (resolved): tianshou/policy/base.py, tianshou/policy/modelfree/a2c.py
@Trinkle23897 Trinkle23897 linked an issue Mar 23, 2021 that may be closed by this pull request
@Trinkle23897 Trinkle23897 requested a review from danagi March 23, 2021 08:56
@Trinkle23897 Trinkle23897 merged commit e27b5a2 into thu-ml:master Mar 23, 2021
@ChenDRAG ChenDRAG deleted the rename_vpg branch March 23, 2021 14:18
BFAnas pushed a commit to BFAnas/tianshou that referenced this pull request May 5, 2024
Refactor PG algorithm and change behavior of compute_episodic_return (thu-ml#319)

- simplify code
- apply value normalization (global) and advantage normalization (per-batch) in on-policy algorithms
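The commit message distinguishes two normalizations: a global running statistic for value/return targets that persists across training batches, and a per-batch standardization of advantages recomputed for every minibatch. The sketch below contrasts the two under stated assumptions; the class and function names are illustrative and not tianshou's implementation.

```python
# Hedged sketch: global value/return normalization vs. per-batch advantage
# normalization. All names are hypothetical, for illustration only.
import numpy as np

class RunningMeanStd:
    """Global running statistics, updated across all training batches."""
    def __init__(self) -> None:
        self.mean, self.var, self.count = 0.0, 1.0, 1e-4

    def update(self, x: np.ndarray) -> None:
        batch_mean, batch_var, batch_count = x.mean(), x.var(), len(x)
        delta = batch_mean - self.mean
        total = self.count + batch_count
        new_mean = self.mean + delta * batch_count / total
        m2 = (self.var * self.count + batch_var * batch_count
              + delta ** 2 * self.count * batch_count / total)
        self.mean, self.var, self.count = new_mean, m2 / total, total

ret_rms = RunningMeanStd()

def normalize_returns(returns: np.ndarray) -> np.ndarray:
    # Global: statistics accumulate over the whole training run.
    ret_rms.update(returns)
    return returns / np.sqrt(ret_rms.var + 1e-8)

def normalize_advantages(adv: np.ndarray) -> np.ndarray:
    # Per-batch: mean and std are recomputed from this minibatch alone.
    return (adv - adv.mean()) / (adv.std() + 1e-8)
```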
Labels: none · Projects: none · 2 participants