Refactor PG algorithm and change behavior of `compute_episodic_return` #319

ChenDRAG · 2021-03-22T14:00:49Z

See #307 and #317 for details. Note that I removed the `rew_norm` option from `compute_episodic_return()` because I will later redefine how value normalization is done (different algorithms may need different `rew_norm` schedules?).
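To illustrate the split described here, the following is a minimal, dependency-free sketch and not Tianshou's actual implementation. It assumes that returns are computed without any normalization, that value targets are scaled with statistics accumulated over all data seen so far ("global"), and that advantages are standardized within each batch ("per-batch"). The names `discounted_returns`, `RunningMeanStd`, and `normalize_advantages` are hypothetical helpers introduced only for this example.

```python
import math


def discounted_returns(rewards, dones, gamma=0.99):
    """Plain discounted returns; no normalization happens here (assumption)."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for i in reversed(range(len(rewards))):
        if dones[i]:
            # The episode ends at step i, so nothing is bootstrapped from later steps.
            running = 0.0
        running = rewards[i] + gamma * running
        returns[i] = running
    return returns


class RunningMeanStd:
    """Hypothetical helper: mean/variance over every value seen so far."""

    def __init__(self):
        self.mean = 0.0
        self.var = 1.0
        self.count = 0

    def update(self, values):
        n = len(values)
        if n == 0:
            return
        batch_mean = sum(values) / n
        batch_var = sum((v - batch_mean) ** 2 for v in values) / n
        total = self.count + n
        delta = batch_mean - self.mean
        # Merge the batch statistics into the running statistics.
        m2 = (self.var * self.count + batch_var * n
              + delta ** 2 * self.count * n / total)
        self.mean += delta * n / total
        self.var = m2 / total
        self.count = total

    def normalize(self, values, eps=1e-8):
        std = math.sqrt(self.var)
        return [(v - self.mean) / (std + eps) for v in values]


def normalize_advantages(adv, eps=1e-8):
    """Per-batch standardization: uses only the statistics of this batch."""
    n = len(adv)
    mean = sum(adv) / n
    std = math.sqrt(sum((a - mean) ** 2 for a in adv) / n)
    return [(a - mean) / (std + eps) for a in adv]


# Usage: returns stay raw; each policy chooses its own normalization.
rets = discounted_returns([1.0, 1.0, 1.0], [False, False, True], gamma=0.5)
ret_rms = RunningMeanStd()
ret_rms.update(rets)                    # global statistics, kept across batches
value_targets = ret_rms.normalize(rets)
adv = normalize_advantages(rets)        # per-batch statistics only
```

Keeping the return computation free of normalization leaves each algorithm free to pick its own schedule, which is the motivation given above for dropping the option.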

tianshou/policy/base.py

tianshou/policy/modelfree/a2c.py

minor

(thu-ml#319) - simplify code - apply value normalization (global) and adv norm (per-batch) in on-policy algorithms

ChenDRAG added 5 commits March 22, 2021 21:34

refactor vpg & caculate_episodic_returns behavior

8b74f82

fix bug

66c102d

change docstring

52203ce

pep8 fix

fa18832

pep8fix

b02c9d2

Trinkle23897 reviewed Mar 23, 2021


tianshou/policy/base.py (outdated, resolved)

tianshou/policy/modelfree/a2c.py (outdated, resolved)

tianshou/policy/modelfree/a2c.py (outdated, resolved)

ChenDRAG and others added 6 commits March 23, 2021 14:21

fix test

8902852

fix test & pep8

dc1312d

Update base.py

f278a27

minor

fix mypy

0b7d76f

fix test

b0e7f5c

fix bug

f11a76a

Trinkle23897 linked an issue Mar 23, 2021 that may be closed by this pull request

Suggestion: Abandon name 'vpg' but use REINFORCE to replace it #317

Closed

Trinkle23897 requested a review from danagi March 23, 2021 08:56

update

991b4e0

Trinkle23897 approved these changes Mar 23, 2021


Trinkle23897 merged commit e27b5a2 into thu-ml:master Mar 23, 2021

ChenDRAG deleted the rename_vpg branch March 23, 2021 14:18

Trinkle23897 linked an issue Apr 21, 2021 that may be closed by this pull request

Plans of releasing mujoco benchmark of onpolicy algorithms(VPG, A2C, PPO) #307

Closed

BFAnas pushed a commit to BFAnas/tianshou that referenced this pull request May 5, 2024

Refactor PG algorithm and change behavior of compute_episodic_return (thu-ml#319)

44ce78b

- simplify code
- apply value normalization (global) and adv norm (per-batch) in on-policy algorithms
