Hi,
Thank you for the book that brought me into the world of reinforcement learning. Although there is a lot of material available on the internet, it is still quite hard for a beginner to grasp the whole idea of reinforcement learning; your book provides a systematic path for beginners like me.
My question is about your cp09 policy gradient baseline code (CartPole), and I would appreciate your further advice.
In the PG code, after the agent interacts with the environment, I noticed that you record only S, A, R, S' and discard the output of the PGN network at every time step. Because of that, at the training stage, the code below is needed for the loss computation:
states_v = torch.FloatTensor(batch_states)   # convert the batch of states collected during interaction
logits_v = net(states_v)                     # forward pass to compute the logits again for the loss
My question is whether I could store net(states_v) during the first time the agent interacts with the environment. The stored logits_v could then be reused for the loss computation instead of being recomputed, which would save one round of forward computation through the network.
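To make my idea concrete, here is a rough sketch of what I mean (this is my own pseudo-version, not code from the book; the function and variable names are just illustrative):

import torch

cached_logits = []   # network outputs collected while the agent acts

def play_step(net, state, device="cpu"):
    # one forward pass used to choose the action; keep its output for later reuse
    state_v = torch.FloatTensor([state]).to(device)
    logits_v = net(state_v)
    cached_logits.append(logits_v)
    probs_v = torch.softmax(logits_v, dim=1)
    action = torch.multinomial(probs_v, num_samples=1).item()
    return action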
The reason I raise this question is that PyTorch spends a lot of time converting CPU tensors to GPU tensors, so I thought that avoiding this step might accelerate the whole computation.
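For reference, this is roughly the step I was hoping to avoid, written as a toy timing check I could run (again just my own experiment idea with fake CartPole-sized states, nothing from the book):

import time
import torch

batch_states = [[0.0, 0.0, 0.0, 0.0] for _ in range(128)]   # fake 4-dimensional states
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

start = time.perf_counter()
states_v = torch.FloatTensor(batch_states).to(device)       # CPU list -> (possibly GPU) tensor
if device.type == "cuda":
    torch.cuda.synchronize()                                 # wait for the copy to finish before timing
print("conversion took %.6f seconds" % (time.perf_counter() - start))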
However, I am only a beginner with two months of study of your book, so I am not confident about this.
Your advice is highly appreciated!
Best Regards,
Charles