Hi. I am trying to understand the code and I came across what I think is a bug in DQN-tensorflow/dqn/agent.py (line 32 at commit c7b1f10). It is related to the way the agent interacts with the environment: at the beginning of training the environment is reset via self.env.new_random_game(), and afterwards the history is filled with the new random state via self.history.add(screen). This is needed because the agent always chooses its actions taking that history as input, via action = self.predict(self.history.get()).
When a terminal state is reached, a new random game is created, but this time the new random state is not added to the history. As a result, the agent uses the terminal state of the last episode to decide which action to take in the first state of the new episode, which I think is wrong.
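To make the mechanism concrete, here is a minimal sketch of how I understand the history buffer to work. The names history_length, add and get come from the repo; the deque-based class below is just an illustration, not the actual dqn/history.py code.

from collections import deque

import numpy as np


class History:
    """Toy stand-in for the agent's history: keeps the last `history_length` screens."""

    def __init__(self, history_length=4):
        self.history_length = history_length
        self.frames = deque(maxlen=history_length)  # oldest frame is dropped automatically

    def add(self, screen):
        # Push the newest screen; once the deque is full, the oldest one falls out.
        self.frames.append(screen)

    def get(self):
        # Stack the buffered screens into the network input.
        return np.stack(self.frames, axis=0)


history = History(history_length=4)

# Start of training: the environment is reset and the buffer is filled with the
# first random screen, so predict(history.get()) has a well-defined input.
first_screen = np.random.rand(84, 84)  # stand-in for the screen from env.new_random_game()
for _ in range(history.history_length):
    history.add(first_screen)

# ... an episode runs; history now holds the last 4 screens of that episode ...

# Episode ends: the environment is reset again, but without re-filling the buffer
# the next history.get() still stacks the finished episode's frames, so the first
# action of the new episode is chosen from stale input.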
A way to fix it would be to add the following right after the self.env.new_random_game() call that starts the new episode:
for _ in range(self.history_length):
    self.history.add(screen)
I don't know whether fixing this would have any measurable impact on performance, since it only affects the first self.history_length steps of each episode, but I wanted to share it anyway.
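Just to make the "first self.history_length steps" point concrete, assuming one new screen is added to the history after every action (as described above), the stale frames are flushed out one per step:

history_length = 4
for decision in range(1, history_length + 2):
    stale = max(history_length - (decision - 1), 0)
    print(f"decision {decision} of the new episode: {stale} stale frame(s), {history_length - stale} new")
# decision 1 of the new episode: 4 stale frame(s), 0 new
# decision 2 of the new episode: 3 stale frame(s), 1 new
# decision 3 of the new episode: 2 stale frame(s), 2 new
# decision 4 of the new episode: 1 stale frame(s), 3 new
# decision 5 of the new episode: 0 stale frame(s), 4 new

So without the fix, only the first history_length = 4 decisions of each episode see any stale frames.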
Thanks in advance.
With

if terminal:
    screen, reward, action, terminal = self.env.new_random_game()
    for _ in range(self.history_length):
        self.history.add(screen)

we would mimic what happens when the agent is first initialized. We just fill its history with the first observation. I'm not sure if that's theoretically the way to go, though. I'll take a look at the paper and see if they mention anything about the history in the first state.
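For clarity, this is roughly how the training loop would look with that change in place. Only the names that already appear in this issue (env.new_random_game, history, predict, history_length) are taken from the repo; the class skeleton and the env.act() call below are placeholders, not the actual agent.py code.

class AgentSketch:
    """Illustration only: episode handling in the train loop with the proposed fix."""

    def train(self, max_steps):
        # Start of training: reset the environment and fill the history with the
        # first random screen (this part agent.py already does).
        screen, reward, action, terminal = self.env.new_random_game()
        for _ in range(self.history_length):
            self.history.add(screen)

        for step in range(max_steps):
            action = self.predict(self.history.get())
            # env.act() is a placeholder for however the environment is actually stepped.
            screen, reward, terminal = self.env.act(action)
            self.history.add(screen)

            if terminal:
                screen, reward, action, terminal = self.env.new_random_game()
                # Proposed fix: refill the history so the first prediction of the
                # new episode does not see frames from the finished episode.
                for _ in range(self.history_length):
                    self.history.add(screen)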