Add C51 algorithm #266

shengxiang19 · 2020-12-25T08:04:20Z

Distributional RL algorithms are very powerful in Atari environments. I am going to implement a series of typical algorithms (C51, QR-DQN, IQN, FQF) on top of the reinforcement learning platform Tianshou.

This is my first PR, for the C51 algorithm: https://arxiv.org/abs/1707.06887

1. add C51 policy in tianshou/policy/modelfree/c51.py.
2. add C51 net in tianshou/utils/net/discrete.py.
3. add C51 atari example in examples/atari/atari_c51.py.
4. add C51 statement in tianshou/policy/__init__.py.
5. add C51 test in test/discrete/test_c51.py.
6. add C51 atari results in examples/atari/results/c51/.

Running "python3 atari_c51.py --task "PongNoFrameskip-v4" --batch-size 64" gives best_result: 20.50 ± 0.50 in epoch 9.

Running "python3 atari_c51.py --task "BreakoutNoFrameskip-v4" --n-step 1 --epoch 40" gives best_reward: 407.400000 ± 31.155096 in epoch 39.
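
For context, the step that distinguishes C51 from DQN in the linked paper is the projection of the Bellman-updated return distribution back onto a fixed set of atoms. The sketch below illustrates that projection for a single transition in plain Python. It is not the code in tianshou/policy/modelfree/c51.py; the function names are hypothetical, and the usage note picks a small 5-atom support purely for readability (the paper uses 51 atoms on [-10, 10]).

```python
import math


def make_support(v_min, v_max, num_atoms):
    """Evenly spaced atom values z_0..z_{N-1} on [v_min, v_max]."""
    delta_z = (v_max - v_min) / (num_atoms - 1)
    return [v_min + i * delta_z for i in range(num_atoms)]


def project_distribution(next_probs, reward, done, gamma, v_min, v_max):
    """Project the Bellman-updated distribution back onto the fixed support.

    next_probs[i] is the probability the target network assigns to atom i
    for the greedy next action (illustrative input, one transition only).
    """
    num_atoms = len(next_probs)
    delta_z = (v_max - v_min) / (num_atoms - 1)
    support = make_support(v_min, v_max, num_atoms)
    target = [0.0] * num_atoms
    for p, z in zip(next_probs, support):
        # Bellman update of one atom, clipped to the support range.
        tz = reward + (0.0 if done else gamma * z)
        tz = min(max(tz, v_min), v_max)
        # Fractional position of tz on the atom grid (clamped against
        # floating-point overshoot at the upper edge).
        b = min(max((tz - v_min) / delta_z, 0.0), num_atoms - 1)
        lower = int(math.floor(b))
        upper = int(math.ceil(b))
        if lower == upper:
            # tz falls exactly on an atom: all mass goes there.
            target[lower] += p
        else:
            # Split the mass between the two neighbouring atoms.
            target[lower] += p * (upper - b)
            target[upper] += p * (b - lower)
    return target
```

With a uniform next-state distribution over 5 atoms, `project_distribution([0.2] * 5, reward=1.0, done=False, gamma=0.5, v_min=-10.0, v_max=10.0)` returns a distribution that still sums to 1; the training loss is then the cross-entropy between this target and the predicted distribution.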

codecov-io · 2020-12-25T08:16:30Z

Codecov Report

Merging #266 (d315052) into master (5d13d8a) will decrease coverage by 0.56%.
The diff coverage is 75.53%.

@@            Coverage Diff             @@
##           master     #266      +/-   ##
==========================================
- Coverage   94.54%   93.98%   -0.57%     
==========================================
  Files          41       42       +1     
  Lines        2677     2760      +83     
==========================================
+ Hits         2531     2594      +63     
- Misses        146      166      +20

Flag	Coverage Δ
unittests	`93.98% <75.53%> (-0.57%)`	⬇️


Impacted Files	Coverage Δ
tianshou/policy/base.py	`73.26% <22.22%> (-3.03%)`	⬇️
tianshou/utils/net/discrete.py	`87.50% <30.00%> (-12.50%)`	⬇️
tianshou/utils/net/common.py	`97.33% <80.00%> (-2.67%)`	⬇️
tianshou/policy/modelfree/c51.py	`89.06% <89.06%> (ø)`
tianshou/policy/__init__.py	`100.00% <100.00%> (ø)`
tianshou/env/worker/subproc.py	`91.15% <0.00%> (-0.06%)`	⬇️
tianshou/data/collector.py	`95.97% <0.00%> (-0.03%)`	⬇️
tianshou/data/buffer.py	`99.03% <0.00%> (-0.01%)`	⬇️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 5d13d8a...d315052.

Trinkle23897 · 2020-12-25T08:17:56Z

Nice job! Could you please also modify the README.md and docs/index.rst (add C51 description)?

Trinkle23897 · 2020-12-25T08:42:52Z

That's ok I can help you fix the PEP8.

shengxiang19 · 2020-12-25T08:46:18Z

Thank you very much. I will modify the README.md and docs/index.rst in my next PR.

Trinkle23897 · 2020-12-25T08:48:00Z

> Thank you very much. I will modify the README.md and docs/index.rst in my next PR.

Just in this PR is okay.

shengxiang19 · 2020-12-25T08:54:55Z

> > Thank you very much. I will modify the README.md and docs/index.rst in my next PR.

> Just in this PR is okay.

I'm not very good at GitHub, and I could not find how to do it in this PR.

Trinkle23897 · 2020-12-25T08:55:46Z

Okay, that's fine :) I'll take a look later on.

shengxiang19 · 2020-12-25T11:50:59Z

All checks have passed :) It was a tough journey.

Trinkle23897

Also, could you please add a test script for CartPole-v0 under test/discrete/?

tianshou/policy/modelfree/c51.py

Trinkle23897 · 2020-12-25T13:25:12Z

> Will implement a test/discrete/test_c51.py in the next PR.

I think you can directly add this file. No need to make a separate PR.

shengxiang19 · 2020-12-25T13:28:19Z

> > Will implement a test/discrete/test_c51.py in the next PR.

> I think you can directly add this file. No need to make a separate PR.

I hope to combine the results of C51 in a variety of Atari games with it in the future PR.

Trinkle23897 · 2020-12-25T13:32:30Z

> I hope to combine the results of C51 in a variety of Atari games with it in the future PR.

Cool, and I think you can add these results here so that this PR can be a complete version of C51 implementation :)

shengxiang19 · 2020-12-25T13:36:57Z

> > I hope to combine the results of C51 in a variety of Atari games with it in the future PR.

> Cool, and I think you can add these results here so that this PR can be a complete version of C51 implementation :)

I'm not sure how to add new files under the current PR. So adding a new PR is more convenient for me. In addition, I'm not quite sure when I can finish this work.

Trinkle23897 · 2020-12-25T13:40:36Z

> I'm not sure how to add new files under the current PR. So adding a new PR is more convenient for me.

Just add the file in shengxiang19/C51 branch instead of here. See https://stackoverflow.com/questions/10147445/github-adding-commits-to-existing-pull-request

> In addition, I'm not quite sure when I can finish this work.

I can wait for you.

shengxiang19 · 2020-12-25T13:43:22Z

> > I'm not sure how to add new files under the current PR. So adding a new PR is more convenient for me.

> Just add the file in shengxiang19/C51 branch instead of here. See https://stackoverflow.com/questions/10147445/github-adding-commits-to-existing-pull-request

> > In addition, I'm not quite sure when I can finish this work.

> I can wait for you.

Thank you. I can try it later.

fix bugs in pc51

shengxiang19 · 2020-12-26T07:27:54Z

> > Will implement a test/discrete/test_c51.py in the next PR.

> I think you can directly add this file. No need to make a separate PR.

I have added a test_c51 for CartPole-v0 under test/discrete/. I hope you can help me check it.

shengxiang19 · 2020-12-27T12:54:31Z

I have added the results of C51 in three typical Atari environments. My current plan for C51 is done.

Trinkle23897

I'll optimize the n_step code this week (in this PR). Thanks for your great work!

tianshou/utils/net/common.py

Trinkle23897

I'm running QbertNoFrameskip-v4 for evaluation. Now in epoch 31 it reaches 14047.
Please double-check my implementation.

tianshou/policy/modelfree/c51.py

duburcqa · 2021-01-06T09:51:05Z

tianshou/policy/modelfree/c51.py

+        """
+        model = getattr(self, model)
+        obs = batch[input]
+        obs_ = obs.obs if hasattr(obs, "obs") else obs


Why could hasattr(obs, "obs") be false?

These three are the same as existing DQNPolicy. I guess we can make a separate PR to enhance these things :)

Yes I noticed that :)
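
One plausible reading of the `hasattr(obs, "obs")` check in the excerpt above is that the observation may arrive either as a plain array or as a container that carries the raw observation together with extras such as an action mask. The sketch below only illustrates that pattern; `unwrap_obs` is a hypothetical helper, and `SimpleNamespace` is a stand-in for whatever container type the library actually uses.

```python
from types import SimpleNamespace


def unwrap_obs(obs):
    """Return the raw observation whether or not it is wrapped in a container."""
    return obs.obs if hasattr(obs, "obs") else obs


# Case 1: a plain observation. It has no `.obs` attribute, so the check is
# false and the value is used as-is.
plain = [0.1, 0.2, 0.3]

# Case 2: a structured observation that bundles the raw observation with an
# action mask (hypothetical stand-in for a nested batch object).
wrapped = SimpleNamespace(obs=[0.1, 0.2, 0.3], mask=[True, False])

raw_from_plain = unwrap_obs(plain)
raw_from_wrapped = unwrap_obs(wrapped)
```

Both calls yield the same raw observation, which is why the check lets one code path serve both kinds of environment.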

duburcqa · 2021-01-06T09:54:14Z

tianshou/policy/modelfree/c51.py

+        dist, h = model(obs_, state=state, info=batch.info)
+        q = (dist * self.support).sum(2)
+        act: np.ndarray = to_numpy(q.max(dim=1)[1])
+        if hasattr(obs, "mask"):


I don't much like this approach, but right now I have no idea how to avoid it. Maybe add a masked_array method to the Batch class to offer something similar to NumPy's masked arrays. Internally it would use the same mechanism, but it would be hidden in Batch, which is much better in my opinion.
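
To make the excerpt above concrete: `(dist * self.support).sum(2)` turns each action's categorical distribution into an expected Q-value, and the mask then restricts which actions may be chosen. The following is a minimal single-state sketch in plain Python, assuming `dist[a][i]` is the probability of atom `i` for action `a`; the function names are hypothetical and this is not the batched PyTorch code under review.

```python
def expected_q(dist, support):
    """Expected return per action: sum over atoms of probability * atom value."""
    return [sum(p * z for p, z in zip(probs, support)) for probs in dist]


def masked_greedy_action(q_values, mask):
    """Pick the highest-valued action among those the mask allows."""
    allowed = [a for a, ok in enumerate(mask) if ok]
    if not allowed:
        raise ValueError("mask excludes every action")
    return max(allowed, key=lambda a: q_values[a])


support = [-1.0, 0.0, 1.0]
dist = [
    [0.1, 0.1, 0.8],  # action 0: mass concentrated on the highest atom
    [0.8, 0.1, 0.1],  # action 1: mass concentrated on the lowest atom
    [0.2, 0.6, 0.2],  # action 2: symmetric around zero
]
q = expected_q(dist, support)
unmasked_choice = masked_greedy_action(q, [True, True, True])
masked_choice = masked_greedy_action(q, [False, True, True])
```

Hiding the mask handling inside a helper like this is in the spirit of the suggestion above, though where such a helper should live is a design decision for the library.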

duburcqa · 2021-01-06T09:57:46Z

tianshou/policy/modelfree/c51.py

+        batch.weight = cross_entropy.detach()  # prio-buffer
+        loss.backward()
+        self.optim.step()
+        self._cnt += 1


I recommend a more explicit variable name than _cnt.
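
As an illustration of the naming point, here is a small sketch of a counter with a descriptive name. It assumes the counter tracks gradient steps in order to decide when a target network should be refreshed; the excerpt does not show how `_cnt` is used, so both that purpose and the class and attribute names below are assumptions.

```python
class TargetSyncCounter:
    """Counts gradient steps and reports when a target-network sync is due."""

    def __init__(self, target_update_freq):
        if target_update_freq <= 0:
            raise ValueError("target_update_freq must be positive")
        self.target_update_freq = target_update_freq
        # Descriptive replacement for a bare `_cnt` (hypothetical name).
        self.num_gradient_steps = 0

    def sync_due(self):
        """True when the step count is a multiple of the sync frequency."""
        return self.num_gradient_steps % self.target_update_freq == 0

    def record_step(self):
        """Call once after each optimizer step."""
        self.num_gradient_steps += 1


counter = TargetSyncCounter(target_update_freq=3)
sync_steps = []
for step in range(7):
    if counter.sync_due():
        sync_steps.append(step)
    counter.record_step()
```

A name such as `num_gradient_steps` states what is being counted, so a reader does not have to trace every use of the attribute to find out.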

This is the PR for the C51 algorithm: https://arxiv.org/abs/1707.06887

1. add C51 policy in tianshou/policy/modelfree/c51.py.
2. add C51 net in tianshou/utils/net/discrete.py.
3. add C51 atari example in examples/atari/atari_c51.py.
4. add C51 statement in tianshou/policy/__init__.py.
5. add C51 test in test/discrete/test_c51.py.
6. add C51 atari results in examples/atari/results/c51/.

Running "python3 atari_c51.py --task "PongNoFrameskip-v4" --batch-size 64" gives best_result: 20.50 ± 0.50 in epoch 9.

Running "python3 atari_c51.py --task "BreakoutNoFrameskip-v4" --n-step 1 --epoch 40" gives best_reward: 407.400000 ± 31.155096 in epoch 39.

shengxiang19 and others added 2 commits December 25, 2020 10:57

Add C51 algorithm

66c1f95

fix PEP8 fail

64f42ce

shengxiang19 added 2 commits December 25, 2020 16:30

Update c51.py

8e90593

Update c51.py

da9e369

shengxiang19 added 6 commits December 25, 2020 17:13

Update c51.py

8e82d7a

Update c51.py

3abf810

Update c51.py

631c461

Update c51.py

2b43654

Update c51.py

b0e7317

Update c51.py

3d96e3b

Trinkle23897 reviewed Dec 25, 2020

tianshou/policy/modelfree/c51.py (outdated, resolved)

simplify C51 network; modify readme

6688aef

shengxiang19 and others added 4 commits December 26, 2020 13:39

Update c51.py

a4e8750

add test_c51

e02dbc3

Update common.py

489ecaa

Update test_c51.py

0505b16

fix bugs in pc51

ca13004

add c51 atari results

3c6c4a7

Update c51.py

6116eb5

shengxiang19 requested a review from Trinkle23897 December 27, 2020 13:20

Trinkle23897 reviewed Dec 28, 2020

tianshou/utils/net/common.py (outdated, resolved)

shengxiang19 and others added 4 commits December 28, 2020 21:15

Add c51 Qbert result

c06fba1

Add c51 MsPacman result

92b2d4c

Add c51 Seaquest and SpaceInvaders results

44cf066

nstep multidim support

3695f12

Trinkle23897 reviewed Jan 4, 2021

tianshou/policy/modelfree/c51.py (resolved)

fix nstep rew reshape

22fa78a

Trinkle23897 requested a review from duburcqa January 4, 2021 04:46

Trinkle23897 added 3 commits January 4, 2021 16:21

merge categoricalNet to Net

3efac01

change self.support to nn.Parameter

d315052

improve coverage

dbbfb7d

Trinkle23897 approved these changes Jan 6, 2021

Trinkle23897 merged commit c6f2648 into thu-ml:master Jan 6, 2021

duburcqa reviewed Jan 6, 2021

shengxiang19 deleted the c51 branch January 10, 2021 23:56

Trinkle23897 mentioned this pull request Jun 10, 2021

How to support multi-agent reinforcement learning #121

Open

8 tasks
