Commit
Fix typo (pytorch#891)
Co-authored-by: holly1238 <77758406+holly1238@users.noreply.github.com>
tom-doerr and holly1238 authored Apr 12, 2021
1 parent 7c68af8 commit 4bd1164
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion intermediate_source/reinforcement_q_learning.py
@@ -388,7 +388,7 @@ def plot_durations():
 # single step of the optimization. It first samples a batch, concatenates
 # all the tensors into a single one, computes :math:`Q(s_t, a_t)` and
 # :math:`V(s_{t+1}) = \max_a Q(s_{t+1}, a)`, and combines them into our
-# loss. By defition we set :math:`V(s) = 0` if :math:`s` is a terminal
+# loss. By definition we set :math:`V(s) = 0` if :math:`s` is a terminal
 # state. We also use a target network to compute :math:`V(s_{t+1})` for
 # added stability. The target network has its weights kept frozen most of
 # the time, but is updated with the policy network's weights every so often.

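For context, the comment touched by this commit describes the tutorial's optimization step: sample a batch from replay memory, compute :math:`Q(s_t, a_t)` with the policy network and :math:`V(s_{t+1}) = \max_a Q(s_{t+1}, a)` with the frozen target network (zero for terminal states), and combine them into the loss. Below is a minimal sketch of such a step. It assumes names like memory, Transition, policy_net, target_net, optimizer, BATCH_SIZE and GAMMA are defined elsewhere in the tutorial file; it is an illustration, not the exact code at the changed lines.

import torch
import torch.nn.functional as F

def optimize_model():
    # memory, Transition, policy_net, target_net, optimizer, BATCH_SIZE and
    # GAMMA are assumed to be defined earlier in the tutorial.
    if len(memory) < BATCH_SIZE:
        return
    transitions = memory.sample(BATCH_SIZE)
    # Transpose the batch: a list of Transitions -> a Transition of lists.
    batch = Transition(*zip(*transitions))

    # Mask of non-terminal next states; V(s) = 0 for terminal states by definition.
    non_final_mask = torch.tensor(
        tuple(map(lambda s: s is not None, batch.next_state)), dtype=torch.bool)
    non_final_next_states = torch.cat([s for s in batch.next_state if s is not None])

    state_batch = torch.cat(batch.state)
    action_batch = torch.cat(batch.action)
    reward_batch = torch.cat(batch.reward)

    # Q(s_t, a_t): the policy network's values for the actions actually taken.
    state_action_values = policy_net(state_batch).gather(1, action_batch)

    # V(s_{t+1}) = max_a Q(s_{t+1}, a), computed with the frozen target network.
    next_state_values = torch.zeros(BATCH_SIZE)
    next_state_values[non_final_mask] = target_net(non_final_next_states).max(1)[0].detach()

    # Expected Q values, and a Huber loss against the computed Q values.
    expected_state_action_values = (next_state_values * GAMMA) + reward_batch
    loss = F.smooth_l1_loss(state_action_values, expected_state_action_values.unsqueeze(1))

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()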