deploy: 546f153
hesic73 committed Mar 2, 2024
1 parent bb20fe8 commit f667481
Showing 66 changed files with 547 additions and 406 deletions.
Binary file modified .doctrees/environment.pickle
Binary file not shown.
Binary file modified .doctrees/gomoku_rl.collector.doctree
Binary file not shown.
Binary file modified .doctrees/gomoku_rl.policy.base.doctree
Binary file not shown.
Binary file modified .doctrees/gomoku_rl.policy.common.doctree
Binary file not shown.
Binary file modified .doctrees/gomoku_rl.policy.doctree
Binary file not shown.
Binary file modified .doctrees/gomoku_rl.policy.dqn.doctree
Binary file not shown.
Binary file modified .doctrees/gomoku_rl.policy.ppo.doctree
Binary file not shown.
Binary file modified .doctrees/index.doctree
Binary file not shown.
32 changes: 27 additions & 5 deletions _modules/gomoku_rl/collector.html
@@ -43,7 +43,7 @@
</div><div class="wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="Navigation menu">
<p class="caption" role="heading"><span class="caption-text">Contents:</span></p>
<ul>
<li class="toctree-l1"><a class="reference internal" href="../../modules.html">gomoku_rl</a></li>
<li class="toctree-l1"><a class="reference internal" href="../../gomoku_rl.html">gomoku_rl package</a></li>
</ul>

</div>
@@ -360,6 +360,11 @@ <h1>Source code for gomoku_rl.collector</h1><div class="highlight"><pre>
<span class="bp">self</span><span class="o">.</span><span class="n">_t</span><span class="p">,</span>
<span class="p">)</span> <span class="o">=</span> <span class="n">self_play_step</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">_env</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">_policy</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">_t_minus_1</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">_t</span><span class="p">)</span>

<span class="c1"># truncate the last transition</span>
<span class="k">if</span> <span class="n">i</span> <span class="o">==</span> <span class="n">steps</span><span class="o">-</span><span class="mi">2</span><span class="p">:</span>
<span class="n">transition</span><span class="p">[</span><span class="s2">&quot;next&quot;</span><span class="p">,</span> <span class="s2">&quot;done&quot;</span><span class="p">]</span> <span class="o">=</span> <span class="n">torch</span><span class="o">.</span><span class="n">ones</span><span class="p">(</span>
<span class="n">transition</span><span class="p">[</span><span class="s2">&quot;next&quot;</span><span class="p">,</span> <span class="s2">&quot;done&quot;</span><span class="p">]</span><span class="o">.</span><span class="n">shape</span><span class="p">,</span> <span class="n">dtype</span><span class="o">=</span><span class="n">torch</span><span class="o">.</span><span class="n">bool</span><span class="p">,</span> <span class="n">device</span><span class="o">=</span><span class="n">transition</span><span class="o">.</span><span class="n">device</span><span class="p">)</span>

<span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">_augment</span><span class="p">:</span>
<span class="n">transition</span> <span class="o">=</span> <span class="n">augment_transition</span><span class="p">(</span><span class="n">transition</span><span class="p">)</span>

@@ -423,10 +428,10 @@ <h1>Source code for gomoku_rl.collector</h1><div class="highlight"><pre>
<span class="sd"> steps (int): The number of steps to execute in the environment for this rollout. It is adjusted to be an even number to ensure an equal number of actions for both players.</span>

<span class="sd"> Returns:</span>
<span class="sd"> tuple: A tuple containing three elements:</span>
<span class="sd"> - A TensorDict of transitions collected for the black player, with each transition representing a game state before the black player&#39;s action, the action taken, and the resulting state.</span>
<span class="sd"> - A TensorDict of transitions collected for the white player, structured similarly to the black player&#39;s transitions. Note that for the first step, the white player does not take an action, so their collection starts from the second step.</span>
<span class="sd"> - A dictionary containing additional information about the rollout.</span>
<span class="sd"> tuple: A tuple containing three elements:</span>
<span class="sd"> - A TensorDict of transitions collected for the black player, with each transition representing a game state before the black player&#39;s action, the action taken, and the resulting state.</span>
<span class="sd"> - A TensorDict of transitions collected for the white player, structured similarly to the black player&#39;s transitions. Note that for the first step, the white player does not take an action, so their collection starts from the second step.</span>
<span class="sd"> - A dictionary containing additional information about the rollout.</span>

<span class="sd"> &quot;&quot;&quot;</span>

@@ -476,6 +481,13 @@ <h1>Source code for gomoku_rl.collector</h1><div class="highlight"><pre>
<span class="bp">self</span><span class="o">.</span><span class="n">_t</span><span class="p">,</span>
<span class="p">)</span> <span class="o">=</span> <span class="nb">round</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">_env</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">_policy_black</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">_policy_white</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">_t_minus_1</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">_t</span><span class="p">)</span>

<span class="c1"># truncate the last transition</span>
<span class="k">if</span> <span class="n">i</span> <span class="o">==</span> <span class="n">steps</span><span class="o">//</span><span class="mi">2</span><span class="o">-</span><span class="mi">1</span><span class="p">:</span>
<span class="n">transition_black</span><span class="p">[</span><span class="s2">&quot;next&quot;</span><span class="p">,</span> <span class="s2">&quot;done&quot;</span><span class="p">]</span> <span class="o">=</span> <span class="n">torch</span><span class="o">.</span><span class="n">ones</span><span class="p">(</span>
<span class="n">transition_black</span><span class="p">[</span><span class="s2">&quot;next&quot;</span><span class="p">,</span> <span class="s2">&quot;done&quot;</span><span class="p">]</span><span class="o">.</span><span class="n">shape</span><span class="p">,</span> <span class="n">dtype</span><span class="o">=</span><span class="n">torch</span><span class="o">.</span><span class="n">bool</span><span class="p">,</span> <span class="n">device</span><span class="o">=</span><span class="n">transition_black</span><span class="o">.</span><span class="n">device</span><span class="p">)</span>
<span class="n">transition_white</span><span class="p">[</span><span class="s2">&quot;next&quot;</span><span class="p">,</span> <span class="s2">&quot;done&quot;</span><span class="p">]</span> <span class="o">=</span> <span class="n">torch</span><span class="o">.</span><span class="n">ones</span><span class="p">(</span>
<span class="n">transition_white</span><span class="p">[</span><span class="s2">&quot;next&quot;</span><span class="p">,</span> <span class="s2">&quot;done&quot;</span><span class="p">]</span><span class="o">.</span><span class="n">shape</span><span class="p">,</span> <span class="n">dtype</span><span class="o">=</span><span class="n">torch</span><span class="o">.</span><span class="n">bool</span><span class="p">,</span> <span class="n">device</span><span class="o">=</span><span class="n">transition_white</span><span class="o">.</span><span class="n">device</span><span class="p">)</span>

<span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">_augment</span><span class="p">:</span>
<span class="n">transition_black</span> <span class="o">=</span> <span class="n">augment_transition</span><span class="p">(</span><span class="n">transition_black</span><span class="p">)</span>
<span class="k">if</span> <span class="n">i</span> <span class="o">!=</span> <span class="mi">0</span><span class="p">:</span>
@@ -593,6 +605,11 @@ <h1>Source code for gomoku_rl.collector</h1><div class="highlight"><pre>
<span class="bp">self</span><span class="o">.</span><span class="n">_t</span><span class="p">,</span>
<span class="p">)</span> <span class="o">=</span> <span class="nb">round</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">_env</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">_policy_black</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">_policy_white</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">_t_minus_1</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">_t</span><span class="p">,</span> <span class="n">return_black_transitions</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span> <span class="n">return_white_transitions</span><span class="o">=</span><span class="kc">False</span><span class="p">)</span>

<span class="c1"># truncate the last transition</span>
<span class="k">if</span> <span class="n">i</span> <span class="o">==</span> <span class="n">steps</span><span class="o">//</span><span class="mi">2</span><span class="o">-</span><span class="mi">1</span><span class="p">:</span>
<span class="n">transition_black</span><span class="p">[</span><span class="s2">&quot;next&quot;</span><span class="p">,</span> <span class="s2">&quot;done&quot;</span><span class="p">]</span> <span class="o">=</span> <span class="n">torch</span><span class="o">.</span><span class="n">ones</span><span class="p">(</span>
<span class="n">transition_black</span><span class="p">[</span><span class="s2">&quot;next&quot;</span><span class="p">,</span> <span class="s2">&quot;done&quot;</span><span class="p">]</span><span class="o">.</span><span class="n">shape</span><span class="p">,</span> <span class="n">dtype</span><span class="o">=</span><span class="n">torch</span><span class="o">.</span><span class="n">bool</span><span class="p">,</span> <span class="n">device</span><span class="o">=</span><span class="n">transition_black</span><span class="o">.</span><span class="n">device</span><span class="p">)</span>

<span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">_augment</span><span class="p">:</span>
<span class="n">transition_black</span> <span class="o">=</span> <span class="n">augment_transition</span><span class="p">(</span><span class="n">transition_black</span><span class="p">)</span>

@@ -706,6 +723,11 @@ <h1>Source code for gomoku_rl.collector</h1><div class="highlight"><pre>
<span class="bp">self</span><span class="o">.</span><span class="n">_t</span><span class="p">,</span>
<span class="p">)</span> <span class="o">=</span> <span class="nb">round</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">_env</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">_policy_black</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">_policy_white</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">_t_minus_1</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">_t</span><span class="p">,</span> <span class="n">return_black_transitions</span><span class="o">=</span><span class="kc">False</span><span class="p">,</span> <span class="n">return_white_transitions</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>

<span class="c1"># truncate the last transition</span>
<span class="k">if</span> <span class="n">i</span> <span class="o">==</span> <span class="n">steps</span><span class="o">//</span><span class="mi">2</span><span class="o">-</span><span class="mi">1</span><span class="p">:</span>
<span class="n">transition_white</span><span class="p">[</span><span class="s2">&quot;next&quot;</span><span class="p">,</span> <span class="s2">&quot;done&quot;</span><span class="p">]</span> <span class="o">=</span> <span class="n">torch</span><span class="o">.</span><span class="n">ones</span><span class="p">(</span>
<span class="n">transition_white</span><span class="p">[</span><span class="s2">&quot;next&quot;</span><span class="p">,</span> <span class="s2">&quot;done&quot;</span><span class="p">]</span><span class="o">.</span><span class="n">shape</span><span class="p">,</span> <span class="n">dtype</span><span class="o">=</span><span class="n">torch</span><span class="o">.</span><span class="n">bool</span><span class="p">,</span> <span class="n">device</span><span class="o">=</span><span class="n">transition_white</span><span class="o">.</span><span class="n">device</span><span class="p">)</span>

<span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">_augment</span><span class="p">:</span>
<span class="k">if</span> <span class="n">i</span> <span class="o">!=</span> <span class="mi">0</span> <span class="ow">and</span> <span class="nb">len</span><span class="p">(</span><span class="n">transition_white</span><span class="p">)</span> <span class="o">&gt;</span> <span class="mi">0</span><span class="p">:</span>
<span class="n">transition_white</span> <span class="o">=</span> <span class="n">augment_transition</span><span class="p">(</span><span class="n">transition_white</span><span class="p">)</span>
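The collector hunks above all add the same step: on the final loop iteration, the transition's `("next", "done")` entry is overwritten with an all-True tensor of the same shape, presumably so that downstream code treats the rollout as ending there. The sketch below illustrates that idea without any dependencies. Plain dicts and lists of bools stand in for TensorDict and tensors, and `truncate_last` is a hypothetical helper, not part of gomoku_rl.

```python
def truncate_last(transitions):
    """Return a copy of `transitions` whose final entry is marked done everywhere.

    Each transition is a dict whose ("next", "done") key holds one bool per
    parallel environment (a stand-in for a boolean tensor).
    """
    if not transitions:
        return []
    # Shallow-copy each dict so the caller's list is left untouched.
    out = [dict(t) for t in transitions]
    last = out[-1]
    # Same shape as before, but every flag set to True (the analogue of
    # torch.ones(shape, dtype=torch.bool) in the diff above).
    last[("next", "done")] = [True] * len(last[("next", "done")])
    return out


# Hypothetical rollout: 3 steps, 2 parallel environments.
rollout = [
    {("next", "done"): [False, False]},
    {("next", "done"): [False, True]},
    {("next", "done"): [False, False]},
]
truncated = truncate_last(rollout)
```

Only the last transition changes; earlier `done` flags, such as a game that genuinely finished mid-rollout, are left as they were.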
2 changes: 1 addition & 1 deletion _modules/gomoku_rl/core.html
@@ -43,7 +43,7 @@
</div><div class="wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="Navigation menu">
<p class="caption" role="heading"><span class="caption-text">Contents:</span></p>
<ul>
<li class="toctree-l1"><a class="reference internal" href="../../modules.html">gomoku_rl</a></li>
<li class="toctree-l1"><a class="reference internal" href="../../gomoku_rl.html">gomoku_rl package</a></li>
</ul>

</div>
2 changes: 1 addition & 1 deletion _modules/gomoku_rl/env.html
@@ -43,7 +43,7 @@
</div><div class="wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="Navigation menu">
<p class="caption" role="heading"><span class="caption-text">Contents:</span></p>
<ul>
<li class="toctree-l1"><a class="reference internal" href="../../modules.html">gomoku_rl</a></li>
<li class="toctree-l1"><a class="reference internal" href="../../gomoku_rl.html">gomoku_rl package</a></li>
</ul>

</div>