Chapter 10 - Advantage computed the wrong way? #74

ghost · 2020-05-14T21:49:08Z

https://ai.stackexchange.com/questions/21172/advantage-computed-the-wrong-way

ipdb> adv_v.shape
torch.Size([128, 128])

ipdb> vals_ref_v.shape
torch.Size([128])

ipdb> values_v.detach().shape
torch.Size([128, 1])
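The shapes above come from broadcasting: subtracting a [N, 1] tensor from a [N] tensor yields [N, N]. A minimal sketch with a hypothetical batch of 4 (the numbers are made up; vals_ref_v stands in for the return estimates and value_v for the value head's [N, 1] output):

```python
import torch

# Hypothetical batch of 4 samples; the issue's batch size is 128.
vals_ref_v = torch.tensor([1.0, 2.0, 3.0, 4.0])       # shape [4]: return (Q) estimates
value_v = torch.tensor([[0.5], [1.5], [2.5], [3.5]])  # shape [4, 1]: value-head output

adv_v = vals_ref_v - value_v.detach()
print(adv_v.shape)  # torch.Size([4, 4])

# Broadcasting pairs every value estimate with every return:
# adv_v[i, j] == vals_ref_v[j] - value_v[i, 0]
print(adv_v[1, 3].item())  # 2.5  (4.0 - 1.5)

# Only the diagonal holds the per-sample advantages Q(s_t, a_t) - V(s_t)
print(torch.diagonal(adv_v))  # tensor([0.5000, 0.5000, 0.5000, 0.5000])
```

So every row mixes one state's value estimate with the returns of all other samples in the batch.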


ghost · 2020-05-15T20:26:29Z

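I changed the line adv_v = vals_ref_v - value_v.detach() to adv_v = vals_ref_v - value_v.squeeze(-1).detach(), and convergence seems much faster. Because the value tensor has shape [128, 1] while vals_ref_v has shape [128], the original subtraction broadcasts to a [128, 128] matrix instead of producing one advantage per sample; squeeze(-1) drops the trailing dimension so both operands are [128]. This matches the A2C algorithm, where the advantage Q(s, a) - V(s) is computed per sample, so Q(s, a) and V(s) must have the same shape.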
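Why does the buggy version still learn at all? A minimal sketch, assuming the advantage is then multiplied by an [N]-shaped tensor of log-probabilities of the taken actions and averaged, which is a common A2C policy-loss form (all tensors here are random stand-ins, not data from the book). With the [N, N] advantage, the mean reduces to (1/N) * sum_j logp_j * (ref_j - mean_i v_i): a policy-gradient loss whose baseline is the batch-average value estimate rather than the per-state V(s_t).

```python
import torch

torch.manual_seed(0)
N = 8
# Hypothetical stand-ins; float64 keeps the equality check free of rounding noise
vals_ref_v = torch.randn(N, dtype=torch.float64)        # returns, shape [N]
value_v = torch.randn(N, 1, dtype=torch.float64)        # value-head output, shape [N, 1]
log_prob_actions = torch.randn(N, dtype=torch.float64)  # log pi(a_t | s_t), shape [N]

# Buggy: advantage broadcasts to [N, N]
adv_wrong = vals_ref_v - value_v.detach()
loss_wrong = -(adv_wrong * log_prob_actions).mean()

# Fixed: squeeze the trailing dimension so both operands are [N]
adv_right = vals_ref_v - value_v.squeeze(-1).detach()
loss_right = -(adv_right * log_prob_actions).mean()

# The buggy loss equals a loss that uses the batch-mean value as its baseline
baseline_mean = value_v.mean()
loss_mean_baseline = -((vals_ref_v - baseline_mean) * log_prob_actions).mean()

print(adv_right.shape)                                 # torch.Size([8])
print(torch.allclose(loss_wrong, loss_mean_baseline))  # True
```

A state-independent baseline still gives a usable gradient, but it removes less variance than a per-state V(s), which could be consistent with the slower convergence reported here; that is an interpretation, not something measured in this thread.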

The call to detach() is important here because we don't want the policy gradient (PG) to propagate into our value approximation head.
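To see what detach() prevents, here is a minimal sketch with a hypothetical two-headed network (TinyActorCritic, its layer sizes, and the random batch are illustrative, not the book's model). It checks whether the policy loss alone sends any gradient into the value head. In full A2C the value head is still trained, through its own value loss (for example an MSE between V(s) and vals_ref_v).

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

class TinyActorCritic(nn.Module):
    """Hypothetical network: shared body, policy head, value head."""
    def __init__(self, obs_size=4, n_actions=2):
        super().__init__()
        self.body = nn.Linear(obs_size, 16)
        self.policy = nn.Linear(16, n_actions)
        self.value = nn.Linear(16, 1)

    def forward(self, x):
        h = torch.relu(self.body(x))
        return self.policy(h), self.value(h)

def value_head_grad_after_policy_loss(detach: bool) -> bool:
    net = TinyActorCritic()
    states = torch.randn(8, 4)
    actions = torch.randint(0, 2, (8,))
    vals_ref = torch.randn(8)

    logits, value = net(states)
    v = value.squeeze(-1)                        # [8, 1] -> [8]
    adv = vals_ref - (v.detach() if detach else v)
    log_prob = torch.log_softmax(logits, dim=1)
    log_prob_actions = log_prob[torch.arange(8), actions]
    loss_policy = -(adv * log_prob_actions).mean()
    loss_policy.backward()
    # True if the policy loss pushed any gradient into the value head
    return net.value.weight.grad is not None

print(value_head_grad_after_policy_loss(detach=True))   # False
print(value_head_grad_after_policy_loss(detach=False))  # True
```

With detach(), the value head is not part of the policy loss's autograd graph, so its weights receive no gradient from that loss.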

ghost · 2020-05-15T21:45:42Z
