Hi,
Thank you for the book that brought me into the world of reinforcement learning. Although there is a lot of material available on the internet, it is still quite hard for a beginner to grasp the whole idea of reinforcement learning; your book provides a systematic path for beginners like me.
My question is about your cp09 policy gradient baseline code (CartPole), and I would appreciate your further advice.
In the PG code, after the agent interacts with the environment, I noticed that you record only S, A, R, S' and discard the output of the PGN network at every time step. Because of that, at the training stage, the code below is needed for the loss computation:
states_v = torch.FloatTensor(batch_states)   # convert the batch of states collected during interaction
logits_v = net(states_v)                     # forward pass to compute the logits again for the loss
My question is whether I could store net(states_v) during the first time the agent interacts with the environment. The stored logits_v could then be reused for the loss computation instead of being recomputed, which would save one round of forward computation through the network.
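To make my idea concrete, here is a rough sketch of what I mean (this is my own pseudo-version, not code from the book; the function and variable names are just illustrative):

import torch

cached_logits = []   # network outputs collected while the agent acts

def play_step(net, state, device="cpu"):
    # one forward pass used to choose the action; keep its output for later reuse
    state_v = torch.FloatTensor([state]).to(device)
    logits_v = net(state_v)
    cached_logits.append(logits_v)
    probs_v = torch.softmax(logits_v, dim=1)
    action = torch.multinomial(probs_v, num_samples=1).item()
    return action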
The reason I raise this question is that PyTorch spends a lot of time converting CPU tensors to GPU tensors, so I thought that avoiding this step might accelerate the whole computation.
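For reference, this is roughly the step I was hoping to avoid, written as a toy timing check I could run (again just my own experiment idea with fake CartPole-sized states, nothing from the book):

import time
import torch

batch_states = [[0.0, 0.0, 0.0, 0.0] for _ in range(128)]   # fake 4-dimensional states
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

start = time.perf_counter()
states_v = torch.FloatTensor(batch_states).to(device)       # CPU list -> (possibly GPU) tensor
if device.type == "cuda":
    torch.cuda.synchronize()                                 # wait for the copy to finish before timing
print("conversion took %.6f seconds" % (time.perf_counter() - start))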
However, I am only a beginner with two months of study of your book, so I am not confident about this.
Your advice is highly appreciated!
Best Regards,
Charles