Help understanding how to read the code #30

ryanmaxwell96 · 2020-04-01T20:07:56Z

Hello,

Just a quick question. In policy.py, class Policy uses the Keras package to call `get_layer`. Is this the output layer? Also, I sent an email, so feel free to ignore this part if you already answered it: I see from the TRPO paper that the NN is supposed to calculate only the mean, and that it somehow uses a separate set of parameters, a vector with the same size as the number of actions. But the paper is not clear to me on how the standard deviation is actually computed or updated. And in this code, all of it is computed under the hood in Keras.

Any help on this would be greatly appreciated!

Ryan
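A minimal sketch of the mean/log-variance split in plain Python, assuming a diagonal Gaussian policy: the network (here a hypothetical single linear layer, `linear_mean`) maps the observation to one mean per action dimension, while `log_vars` is a separate vector with one entry per action dimension that does not depend on the observation. The standard deviation is `exp(log_var / 2)`. Because `log_vars` appears in the log-likelihood of the sampled actions, it receives gradients from the policy loss and can be updated by the optimizer alongside the network weights; `grad_log_prob_wrt_log_vars` shows that gradient for a single action. All names here are illustrative and not taken from policy.py.

```python
import math
import random


def linear_mean(obs, weights, bias):
    """Hypothetical stand-in for the policy network: one linear layer.

    weights has act_dim rows of obs_dim entries; returns act_dim means.
    """
    return [sum(w * o for w, o in zip(row, obs)) + b
            for row, b in zip(weights, bias)]


def sample_action(means, log_vars, rng):
    """Draw an action from N(mean, exp(log_var)), independently per dimension."""
    return [m + math.exp(lv / 2.0) * rng.gauss(0.0, 1.0)
            for m, lv in zip(means, log_vars)]


def log_prob(action, means, log_vars):
    """Log-density of an action under the diagonal Gaussian."""
    return sum(-0.5 * (lv + math.log(2.0 * math.pi) + (a - m) ** 2 / math.exp(lv))
               for a, m, lv in zip(action, means, log_vars))


def grad_log_prob_wrt_log_vars(action, means, log_vars):
    """d log_prob / d log_var for each action dimension.

    Positive when the action landed more than one std from the mean,
    negative when it landed closer.
    """
    return [-0.5 + 0.5 * (a - m) ** 2 / math.exp(lv)
            for a, m, lv in zip(action, means, log_vars)]


if __name__ == "__main__":
    rng = random.Random(0)
    obs_dim, act_dim = 3, 2
    weights = [[0.1] * obs_dim for _ in range(act_dim)]
    bias = [0.0] * act_dim
    log_vars = [-1.0] * act_dim  # one entry per action, shared by all states
    obs = [1.0, 2.0, 3.0]
    means = linear_mean(obs, weights, bias)
    action = sample_action(means, log_vars, rng)
    print(means, action, log_prob(action, means, log_vars))
```

In a policy-gradient loss the log-probability is weighted by the advantage, so whether the variance grows or shrinks depends on both the sign of this gradient and the sign of the advantage.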

ryanmaxwell96 · 2020-04-04T23:09:42Z

Also, can you help me understand why there are multiple neural network outputs? I tested the HalfCheetah code and found that the observation has shape (1, 27), but for some reason a (1, 6) vector of means is returned, and I'm at a loss as to why six means are returned.

ryanmaxwell96 · 2020-04-04T23:22:44Z

Unless it refers to the 6 half-cheetah joints that can be actuated, while the state has 27 dimensions. In that case, for whatever state the cheetah is in, the policy tells it how to actuate each of these joints via the means and log vars.

pat-coady · 2020-04-05T11:41:21Z

Exactly: the policy network takes a state and returns an action, and the state and action have different dimensionalities. I'd have to check to be sure, but my guess is that the cheetah accepts only 6 actuation inputs, while many more positions and velocities are measured on the cheetah.
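To make the shapes concrete, here is a hedged sketch of a forward pass through a small two-layer network in plain Python, assuming obs_dim = 27 and act_dim = 6 as in the HalfCheetah run described above. The hidden size of 64, the tanh activation, and the random initialization are arbitrary choices for illustration, not what policy.py uses. A batch of observations with shape (batch, 27) comes out as a batch of means with shape (batch, 6): one mean per action input, per state.

```python
import math
import random


def dense(inputs, weights, bias, activation=None):
    """Fully connected layer: inputs (batch, n_in) -> outputs (batch, n_out).

    weights is n_in x n_out; bias has n_out entries.
    """
    n_out = len(bias)
    out = []
    for row in inputs:
        z = [sum(row[i] * weights[i][j] for i in range(len(row))) + bias[j]
             for j in range(n_out)]
        out.append([activation(v) for v in z] if activation else z)
    return out


def random_layer(n_in, n_out, rng):
    """Hypothetical small random init, for shape illustration only."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.gauss(0.0, scale) for _ in range(n_out)] for _ in range(n_in)]
    return weights, [0.0] * n_out


def policy_means(obs_batch, layers):
    """obs (batch, obs_dim) -> hidden (batch, hid) -> means (batch, act_dim)."""
    (w1, b1), (w2, b2) = layers
    hidden = dense(obs_batch, w1, b1, activation=math.tanh)
    return dense(hidden, w2, b2)  # linear output: one mean per action dimension


if __name__ == "__main__":
    rng = random.Random(0)
    obs_dim, hid_dim, act_dim = 27, 64, 6
    layers = [random_layer(obs_dim, hid_dim, rng), random_layer(hid_dim, act_dim, rng)]
    obs = [[rng.uniform(-1.0, 1.0) for _ in range(obs_dim)]]  # shape (1, 27)
    means = policy_means(obs, layers)
    print(len(means), len(means[0]))  # prints: 1 6
```

The output width is fixed by the environment's action space, not by the observation size, which is why 27 inputs map to 6 outputs.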


ryanmaxwell96 · 2020-04-12T00:38:22Z

OK, thank you. Sorry, I have another question: where does the name "policy_nn" come from? I'm guessing it is the last layer, correct?
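A note on where a name like `policy_nn` can come from: Keras' `Model.get_layer(name)` looks a layer up by its `name`, not by position, so the result is not necessarily the last layer. If no `name=` is passed when a layer is constructed, Keras generates one from the layer's class name converted to snake_case, adding a suffix such as `_1` if that name is already taken. Assuming the custom layer in policy.py is a `Layer` subclass called `PolicyNN` (an assumption; check the file), its default name would be `policy_nn`. The functions below are a plain-Python approximation of that conversion for illustration; `default_layer_name` is hypothetical and this is not the Keras code itself.

```python
import re


def to_snake_case(class_name):
    """Approximate Keras' default-name conversion: 'PolicyNN' -> 'policy_nn'."""
    # Split before a capital that starts a lowercase run: 'BatchNorm' -> 'Batch_Norm'.
    intermediate = re.sub(r"(.)([A-Z][a-z]+)", r"\1_\2", class_name)
    # Split a lowercase letter followed by a capital: 'PolicyNN' -> 'Policy_NN'.
    return re.sub(r"([a-z])([A-Z])", r"\1_\2", intermediate).lower()


def default_layer_name(class_name, taken):
    """Hypothetical helper: snake_case name, suffixed _1, _2, ... if already used."""
    base = to_snake_case(class_name)
    name, count = base, 0
    while name in taken:
        count += 1
        name = f"{base}_{count}"
    taken.add(name)
    return name


if __name__ == "__main__":
    used = set()
    for cls in ["PolicyNN", "Dense", "Dense", "BatchNormalization"]:
        print(cls, "->", default_layer_name(cls, used))
```

Passing an explicit `name=` when constructing a layer avoids relying on this default and makes `get_layer` lookups easier to read.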

ryanmaxwell96 · 2020-04-14T04:00:45Z

Can you please explain why value.py has an output Dense layer of size 1? Shouldn't it be the same size as the action dimension?
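The value network estimates V(s), the expected discounted return from state s, which is a single number per state regardless of how many action dimensions there are; that is why a one-unit output layer fits, while the action-dimensional outputs live in the policy. One place this scalar is used is in computing advantages. The sketch below uses generalized advantage estimation (GAE), assuming one value prediction per time step and a value of zero after the final step of an episode; the function name and defaults are illustrative, not taken from the repository.

```python
def gae_advantages(rewards, values, gamma=0.99, lam=0.95):
    """Generalized advantage estimation for a single finished episode.

    rewards[t] and values[t] are scalars for time step t: the value network
    emits exactly one number V(s_t) per state, so len(values) == len(rewards).
    """
    if len(values) != len(rewards):
        raise ValueError("expected one scalar value per time step")
    advantages = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        # Assumes V = 0 after the terminal step.
        next_value = values[t + 1] if t + 1 < len(values) else 0.0
        delta = rewards[t] + gamma * next_value - values[t]  # TD residual
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages
```

For example, `gae_advantages([1.0, 1.0, 1.0], [0.0, 0.0, 0.0], gamma=1.0, lam=1.0)` returns `[3.0, 2.0, 1.0]`, the plain reward-to-go, because a zero baseline subtracts nothing.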

ryanmaxwell96 · 2020-04-15T02:48:13Z

Also, how do you use plotting.py? I don't see it being used anywhere in the code.

ryanmaxwell96 closed this as completed Apr 12, 2020

ryanmaxwell96 reopened this Apr 12, 2020

ryanmaxwell96 closed this as completed Apr 12, 2020

ryanmaxwell96 reopened this Apr 14, 2020
