clarify updating state #224

Merged: 12 commits into thu-ml:master on Sep 22, 2020

Conversation


@rocknamx8 rocknamx8 commented Sep 21, 2020

  • I have marked all applicable categories:
    • exception-raising fix
    • algorithm implementation fix
    • documentation modification
    • new feature
  • If applicable, I have mentioned the relevant/related issue(s)

Less important but also useful:

  • I have visited the source website
  • I have searched through the issue tracker for duplicates
  • I have mentioned version numbers, operating system and environment, where applicable:
    import tianshou, torch, sys
    print(tianshou.__version__, torch.__version__, sys.version, sys.platform)

Adding an indicator of the learning phase (i.e. `self.learning`) will make it convenient to distinguish the policy's state. Meanwhile, the state of `self.training` remains unambiguous during the training stage; see the sketch below.
Related issue: #211
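
A minimal sketch of the flag pattern this PR proposes, assuming a simplified `BasePolicy`-style class (judging by the final PR title, the flag appears to have landed as `self.updating`; names and signatures here are illustrative, not tianshou's actual code):

```python
import torch.nn as nn

# Hypothetical simplification: track "currently doing a gradient update"
# separately from torch's train/eval mode (self.training).
class BasePolicySketch(nn.Module):
    def __init__(self) -> None:
        super().__init__()
        self.updating = False  # True only while learn() is running

    def update(self, batch) -> dict:
        # Wrap learn() so the flag is always reset, even on error.
        self.updating = True
        try:
            result = self.learn(batch)
        finally:
            self.updating = False
        return result

    def learn(self, batch) -> dict:
        raise NotImplementedError
```

With such a flag, exploration code can check `policy.updating` to decide whether to add exploration noise, independently of whatever `policy.training` says.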

Others:

  • fix a bug in DDQN: the action used for `target_q` should be selected greedily, not sampled via the `np.random.rand` exploration branch (see the first sketch after this list)
  • fix a bug in the DQN Atari net: it should add a ReLU before the last layer (see the second sketch after this list)
  • fix a bug in collector timing
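
For the DDQN fix, the key point is that the target action must be selected greedily by the online network and evaluated by the target network; epsilon-greedy exploration (the `np.random.rand` branch) must never leak into the target. A hedged sketch of Double-DQN target computation (function name and signature are illustrative, not tianshou's internals):

```python
import torch

# Sketch only: compute the Double-DQN target
#   y = r + gamma * (1 - done) * Q_target(s', argmax_a Q_online(s', a))
# obs_next, rew, done are float tensors of matching batch size.
def double_dqn_target(obs_next, online_net, target_net, rew, done, gamma=0.99):
    with torch.no_grad():
        # Greedy action from the *online* network -- no exploration noise here.
        a_star = online_net(obs_next).argmax(dim=1, keepdim=True)
        # Evaluate that action with the *target* network.
        q_next = target_net(obs_next).gather(1, a_star).squeeze(1)
    return rew + gamma * (1.0 - done) * q_next
```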
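For the Atari net fix, the missing piece was an activation between the hidden layer and the Q-value head. A sketch using the standard Nature-DQN layout for 84×84×4 inputs (layer sizes assumed; tianshou's actual module may differ):

```python
import torch
from torch import nn

class AtariDQN(nn.Module):
    """Nature-DQN-style convnet; illustrative, not tianshou's exact code."""

    def __init__(self, action_dim: int) -> None:
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(4, 32, kernel_size=8, stride=4), nn.ReLU(inplace=True),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(inplace=True),
            nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(inplace=True),
            nn.Flatten(),
            nn.Linear(64 * 7 * 7, 512),
            nn.ReLU(inplace=True),  # the ReLU this PR adds before the last layer
            nn.Linear(512, action_dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)
```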

@Trinkle23897 Trinkle23897 linked an issue on Sep 21, 2020 that may be closed by this pull request (8 tasks)
@Trinkle23897 Trinkle23897 changed the title from "Add learning state" to "clarify learning state" on Sep 21, 2020

@rocknamx8 rocknamx8 left a comment

The action selection in `_target_q` of DQN may still be controversial.

tianshou/policy/base.py (review thread, outdated, resolved)
@Trinkle23897 Trinkle23897 changed the title from "clarify learning state" to "clarify updating state" on Sep 22, 2020
@Trinkle23897 Trinkle23897 merged commit bf39b9e into thu-ml:master Sep 22, 2020
BFAnas pushed a commit to BFAnas/tianshou that referenced this pull request May 5, 2024
Co-authored-by: n+e <463003665@qq.com>