Hi,
Thank you for your tutorials on Medium and the example code.
I am new to reinforcement learning.
I tried to run the DRL_15_16_17_DQN_Pong code, but I could not get it to converge during training.
I am trying to find the cause, and I wonder whether the problem comes from the reward mechanism of the game environment. While the game is in progress, the reward is zero most of the time; only when a point is scored does the environment return +1 or -1. Therefore, for most transitions the loss is just the MSE between the "current predicted Q value" and the "discounted Q value of the next state", with no reward term involved. I therefore suspect that all predicted Q values will eventually collapse to the same value after many training iterations, which would explain the failure to converge. A sketch of the loss I am describing is below.
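For reference, this is a minimal sketch of the DQN loss as I understand it, not the repo's actual code (names like `q_net`, `target_net`, and `dqn_loss` are hypothetical). Whenever `rewards` is zero, the target reduces to `gamma * max_a' Q_target(s', a')`, which is the case I am describing:

```python
import torch
import torch.nn as nn

def dqn_loss(q_net, target_net, states, actions, rewards,
             next_states, dones, gamma=0.99):
    # Q(s, a) for the actions actually taken in the batch
    q_values = q_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        # max_a' Q_target(s', a'); zeroed out when the episode ended
        next_q = target_net(next_states).max(dim=1).values
        # For most Pong transitions rewards == 0, so the target is
        # purely the discounted next-state value
        target = rewards + gamma * next_q * (1.0 - dones)
    return nn.functional.mse_loss(q_values, target)
```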
Is my reasoning correct?
I would appreciate any help you could offer!