update() retains discounted_reward from previous episodes #8

BigBadBurrow · 2019-09-20T11:10:57Z

In the update() method, discounted_reward is always computed as the current reward plus gamma times the previous discounted_reward. Nothing resets it between episodes, so reward from one episode is carried across into the next, which I assume cannot be correct.

Suggest adding a terminal_states list to the Memory class, and then setting discounted_reward = 0 whenever a new episode starts.
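To make the leak concrete, here is a minimal sketch rather than the repository's actual code. The `Memory` class below is a hypothetical stand-in that holds only the two relevant lists, and `returns_without_reset` is an illustrative name for the accumulation described above, with no episode boundary.

```python
class Memory:
    """Hypothetical rollout buffer; only the fields relevant to this issue."""

    def __init__(self):
        self.rewards = []
        self.terminal_states = []  # True on the last step of each episode


def returns_without_reset(rewards, gamma):
    """Accumulate discounted reward backwards with no episode boundary."""
    discounted_reward = 0.0
    returns = []
    for reward in reversed(rewards):
        discounted_reward = reward + gamma * discounted_reward
        returns.insert(0, discounted_reward)
    return returns


# Two episodes of two steps each, every step giving a reward of 1.
memory = Memory()
for reward, done in [(1.0, False), (1.0, True), (1.0, False), (1.0, True)]:
    memory.rewards.append(reward)
    memory.terminal_states.append(done)

leaky = returns_without_reset(memory.rewards, gamma=0.5)
print(leaky)  # [1.875, 1.75, 1.5, 1.0]
```

With gamma = 0.5, a two-step episode on its own would give returns of 1.5 and 1.0. The first two values come out as 1.875 and 1.75 because the running total still holds the other episode's reward when the boundary is crossed.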

BigBadBurrow · 2019-09-20T12:31:19Z

Remember, the array is traversed in reverse, so I think you'd need to set discounted_reward = 0 first, before adding the current reward:

            if is_terminal:
                discounted_reward = 0
            discounted_reward = reward + (self.gamma * discounted_reward)
            rewards.insert(0, discounted_reward)
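For anyone who wants to try this in isolation, the snippet can be wrapped in a standalone function. This is a sketch under assumptions: the function name `discounted_returns` and its arguments are illustrative, and the surrounding loop over reversed rewards and terminal flags is my guess at the context, not copied from the repository.

```python
def discounted_returns(rewards, is_terminals, gamma):
    """Discounted returns per step, reset at every episode boundary.

    rewards and is_terminals are parallel lists in time order;
    is_terminals[i] is True when step i is the last step of an episode.
    """
    returns = []
    discounted_reward = 0.0
    # Walk backwards so each step can reuse the return of the step after it.
    for reward, is_terminal in zip(reversed(rewards), reversed(is_terminals)):
        if is_terminal:
            # Last step of an episode: nothing after it belongs to it.
            discounted_reward = 0.0
        discounted_reward = reward + gamma * discounted_reward
        returns.insert(0, discounted_reward)
    return returns


# Two episodes of two steps each, every step giving a reward of 1.
rewards = [1.0, 1.0, 1.0, 1.0]
is_terminals = [False, True, False, True]
fixed = discounted_returns(rewards, is_terminals, gamma=0.5)
print(fixed)  # [1.5, 1.0, 1.5, 1.0]
```

Both episodes now get identical returns. Resetting before the accumulation matters: resetting after it would zero the running total only once the terminal step had already absorbed the other episode's reward.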

nikhilbarhate99 · 2019-09-20T12:51:47Z

Thanks!

nikhilbarhate99 added a commit that referenced this issue Sep 20, 2019

fixed reward leaking bug

fd5381d

nikhilbarhate99 added a commit that referenced this issue Sep 20, 2019

fixed reward leaking bug

d02da8d

nikhilbarhate99 closed this as completed in 6c9a2ef Sep 20, 2019

nikhilbarhate99 added a commit that referenced this issue Sep 20, 2019

reward leak fix

ff296df

nikhilbarhate99 reopened this Sep 20, 2019

nikhilbarhate99 closed this as completed Sep 20, 2019

nikhilbarhate99 mentioned this issue Dec 3, 2019

Learning from scratch without using pre-trained model #15

Closed

This was referenced Jul 8, 2020

loss.mean().backward() crash #31

Closed

in cuda train error expected dtype Double but got dtype Float #33

Closed

nikhilbarhate99 mentioned this issue Apr 20, 2021

Discounted Reward Calulcation (Generalized Advantage Estimation) #38

Open
