Compare Atari results with dopamine and OpenAI Baselines #616

Merged 4 commits on Apr 27, 2022
3 changes: 3 additions & 0 deletions docs/spelling_wordlist.txt
@@ -154,3 +154,6 @@ IPendulum
Reacher
Runtime
Nvidia
Enduro
Qbert
Seaquest
38 changes: 37 additions & 1 deletion docs/tutorials/benchmark.rst
@@ -94,7 +94,9 @@ TRPO 16 7min 62.9 26.5 10.1 0.6
Atari Benchmark
---------------

Please refer to https://github.com/thu-ml/tianshou/tree/master/examples/atari
Tianshou also provides a reliable and reproducible Atari 10M benchmark.

Every experiment is conducted under 10 random seeds for 10M steps. Please refer to https://github.com/thu-ml/tianshou/tree/master/examples/atari for the source code, and to https://wandb.ai/tianshou/atari.benchmark/reports/Atari-Benchmark--VmlldzoxOTA1NzA5 for detailed results hosted on Weights & Biases.
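As a sketch of how such a 10-seed sweep might be launched, the snippet below builds one training command per seed. The script name ``atari_dqn.py`` and the ``--task``/``--seed`` flags follow the tianshou examples directory but should be treated as assumptions; the commands are only constructed here, not executed.

```python
def build_commands(task="PongNoFrameskip-v4", n_seeds=10, script="atari_dqn.py"):
    """Build one training command per random seed (dry run, nothing executed).

    Script name and flags are assumptions modeled on the tianshou examples.
    """
    return [
        f"python {script} --task {task} --seed {seed}"
        for seed in range(n_seeds)
    ]

cmds = build_commands()
print(len(cmds))   # 10
print(cmds[0])     # python atari_dqn.py --task PongNoFrameskip-v4 --seed 0
```

Each command could then be dispatched to a job scheduler or a ``subprocess`` call, one run per seed.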

.. raw:: html

@@ -105,3 +107,37 @@ Please refer to https://github.com/thu-ml/tianshou/tree/master/examples/atari
<br>
</center>


The table below compares the performance of Tianshou against published results on Atari games. We use the max average return in 10M timesteps as the reward metric **(to be consistent with Mujoco)**. ``/`` means results are not provided. The best-performing baseline on each task is highlighted in boldface. Referenced baselines include `Google Dopamine <https://github.com/google/dopamine/tree/master/baselines/atari>`_ and `OpenAI Baselines <https://github.com/openai/baselines>`_.

+-------+----------------+--------------+----------------+------------------+--------------------+--------------+-------------------+------------------+
|Task   |                |Pong          |Breakout        |Enduro            |Qbert               |MsPacman      |Seaquest           |SpaceInvaders     |
+=======+================+==============+================+==================+====================+==============+===================+==================+
|DQN |Tianshou |**20.2 ± 2.3**|**133.5 ± 44.6**|997.9 ± 180.6 |**11620.2 ± 786.1** |2324.8 ± 359.8|**3213.9 ± 381.6** |947.9 ± 155.3 |
+ +----------------+--------------+----------------+------------------+--------------------+--------------+-------------------+------------------+
| |Dopamine |9.8 |92.2 |**2126.9** |6836.7 |**2451.3** |1406.6 |**1559.1** |
+ +----------------+--------------+----------------+------------------+--------------------+--------------+-------------------+------------------+
| |OpenAI Baselines|16.5 |131.5 |479.8 |3254.8 |/ |1164.1 |1129.5 ± 145.3 |
+-------+----------------+--------------+----------------+------------------+--------------------+--------------+-------------------+------------------+
|C51 |Tianshou |**20.6 ± 2.4**|**412.9 ± 35.8**|**940.8 ± 133.9** |**12513.2 ± 1274.6**|2254.9 ± 201.2|**3305.4 ± 1524.3**|557.3 |
+ +----------------+--------------+----------------+------------------+--------------------+--------------+-------------------+------------------+
| |Dopamine |17.4 |222.4 |665.3 |9924.5 |**2860.4** |1706.6 |**604.6 ± 157.5** |
+-------+----------------+--------------+----------------+------------------+--------------------+--------------+-------------------+------------------+
|Rainbow|Tianshou |**20.2 ± 3.0**|**440.4 ± 50.1**|1496.1 ± 112.3 |14224.8 ± 1230.1 |2524.2 ± 338.8|1934.6 ± 376.4 |**1178.4** |
+ +----------------+--------------+----------------+------------------+--------------------+--------------+-------------------+------------------+
| |Dopamine |19.1 |47.9 |**2185.1** |**15682.2** |**3161.7** |**3328.9** |459.9 |
+-------+----------------+--------------+----------------+------------------+--------------------+--------------+-------------------+------------------+
|IQN |Tianshou |**20.7 ± 2.9**|**355.9 ± 22.7**|**1252.7 ± 118.1**|**14409.2 ± 808.6** |2228.6 ± 253.1|5341.2 ± 670.2 |667.8 ± 81.5 |
+ +----------------+--------------+----------------+------------------+--------------------+--------------+-------------------+------------------+
| |Dopamine |19.6 |96.3 |1227.6 |12496.7 |**4422.7** |**16418** |**1358.2 ± 267.6**|
+-------+----------------+--------------+----------------+------------------+--------------------+--------------+-------------------+------------------+
|PPO |Tianshou |**20.3 ± 1.2**|**283.0 ± 74.3**|**1098.9 ± 110.5**|**12341.8 ± 1760.7**|1699.4 ± 248.0|1035.2 ± 353.6 |1641.3 |
+ +----------------+--------------+----------------+------------------+--------------------+--------------+-------------------+------------------+
| |OpenAI Baselines|13.7 |114.3 |350.2 |7012.1 |/ |**1218.9** |**1787.5 ± 340.8**|
+-------+----------------+--------------+----------------+------------------+--------------------+--------------+-------------------+------------------+
|QR-DQN |Tianshou |20.7 ± 2.0 |228.3 ± 27.3 |951.7 ± 333.5 |14761.5 ± 862.9 |2259.3 ± 269.2|4187.6 ± 725.7 |1114.7 ± 116.9 |
+-------+----------------+--------------+----------------+------------------+--------------------+--------------+-------------------+------------------+
|FQF |Tianshou |20.4 ± 2.5 |382.6 ± 29.5 |1816.8 ± 314.3 |15301.2 ± 684.1 |2506.6 ± 402.5|8051.5 ± 3155.6 |2558.3 |
+-------+----------------+--------------+----------------+------------------+--------------------+--------------+-------------------+------------------+
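The "max average return" metric used in the table above can be made concrete with a small sketch. It assumes evaluation results are stored as a per-seed, per-checkpoint array of test returns; the function name and data layout are hypothetical, chosen only for illustration.

```python
import numpy as np

def max_average_return(returns):
    """Max over evaluation checkpoints of the mean return across seeds.

    `returns` has shape (n_seeds, n_checkpoints): returns[i][j] is the
    test return of seed i at checkpoint j within the 10M-step budget.
    (Hypothetical layout, for illustration only.)
    """
    returns = np.asarray(returns, dtype=float)
    per_checkpoint_mean = returns.mean(axis=0)  # average over the 10 seeds
    return per_checkpoint_mean.max()            # best checkpoint wins

# toy example: 3 seeds, 4 evaluation checkpoints
toy = [[10.0, 15.0, 12.0, 14.0],
       [ 8.0, 16.0, 11.0, 13.0],
       [12.0, 14.0, 13.0, 15.0]]
print(max_average_return(toy))  # 15.0 (checkpoint 1: (15+16+14)/3)
```

Averaging across seeds first, then taking the max, rewards consistently strong checkpoints rather than a single lucky seed.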

Please note that the comparison tables for these two benchmarks canNOT be used to prove which implementation is "better". The hyperparameters of the algorithms vary across implementations. Also, the reward metric is not strictly the same (e.g. Tianshou uses the max average return within 10M steps, while OpenAI Baselines only report the average return at 10M steps, so the comparison is not entirely fair). Lastly, Tianshou always uses 10 random seeds, while others may use fewer. The comparison is shown only to demonstrate Tianshou's reliability.
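The metric caveat can be illustrated directly: taking the max over a training curve is never lower than reading off its final point, so the two metrics are not interchangeable. The curve below is made up purely for illustration.

```python
# hypothetical mean-return curve, one value per evaluation checkpoint up to 10M steps
curve = [100.0, 180.0, 250.0, 230.0, 210.0]

max_average  = max(curve)   # Tianshou-style metric: best checkpoint -> 250.0
final_return = curve[-1]    # Baselines-style metric: value at 10M  -> 210.0
assert max_average >= final_return  # holds for any curve
```

For any algorithm whose performance peaks before 10M steps and then degrades, the max-based metric will look strictly better than the final-step metric.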