[Question] Meaning of the "al" variable? #13
Hello,

Looking at the code in that repo, I came across this variable:

mjrl/mjrl/baselines/linear_baseline.py
Lines 16 to 17 in 4a7c219

The last time I saw something similar was in the OpenAI Baselines code (here, the Stable Baselines fork):
https://github.com/hill-a/stable-baselines/blob/333c59379f23e1f5c5c9e8bf93cbfa56ac52d13b/stable_baselines/acktr/value_functions.py#L58-L64

I assume this encodes some information about time (cf. the plot below of the different features), but I'm still wondering what "al" stands for and where it comes from (there is no mention of such a feature in either paper).

Comments
This code is similar to the linear_feature_baseline code in rllab: they released a paper with that code, but they didn't explain their feature selection. In my opinion, they seem to be using a polynomial basis, as explained in Sutton and Barto. However, they added an additional term based on the position of the state in the trajectory (this is al). It makes intuitive sense: in continuous control tasks the same state may appear at the start or the end of a trajectory (think of cart-pole states), and depending on that, its return may vary wildly. I think we can get much better features if we design them for a specific problem (see Sutton and Barto).
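For concreteness, here is a minimal sketch of the kind of baseline being discussed: a linear value-function baseline whose features are the (clipped) observations plus low-order powers of the position of each state in its trajectory. The names (al), the 1/100 scaling, and the ridge coefficient are illustrative assumptions, not the exact rllab/mjrl implementation.

```python
import numpy as np

def trajectory_features(observations):
    """Observation features plus powers of the normalized time index.

    `al` here denotes the position of each state in the trajectory,
    rescaled to a small range (the exact scaling used in rllab/mjrl
    may differ; this is only a sketch).
    """
    obs = np.clip(observations, -10, 10)          # shape (T, obs_dim)
    T = len(obs)
    al = np.arange(T).reshape(-1, 1) / 100.0      # normalized time index
    return np.concatenate(
        [obs, obs ** 2, al, al ** 2, al ** 3, np.ones((T, 1))], axis=1
    )

def fit_linear_baseline(observations, returns, reg=1e-5):
    """Ridge regression of the empirical returns on the features above."""
    feats = trajectory_features(observations)
    A = feats.T @ feats + reg * np.eye(feats.shape[1])
    b = feats.T @ returns
    return np.linalg.solve(A, b)

def predict_baseline(observations, coeffs):
    return trajectory_features(observations) @ coeffs
```

This mirrors the structure described above (a polynomial basis in the observations plus powers of the time index); the feature sets in rllab, mjrl, and Stable Baselines differ in the details.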
Thanks for your reply =).
OK, that makes sense (I was only thinking in terms of the intermediate reward, not the return), especially for fixed-length environments like HalfCheetah. But I don't think this should be limited to continuous action tasks. Also, I still have some trouble understanding how larger powers of "al" (al**2, al**3) can help more.
So, I think we need to ask @dementrock (rllab) and @joschu (openai baselines) to get a final answer on what the "al" variable means. (Pinging @hill-a because I think he is also interested in the answer.)
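A small numerical illustration (my own toy example, not taken from either codebase) of why the higher powers can matter: within a fixed-length episode, the discounted return-to-go is a curved function of the time step, so a single linear time feature underfits it, while al**2 and al**3 let a linear baseline track the curvature. The episode length, discount factor, and 1/1000 scaling below are arbitrary assumptions.

```python
import numpy as np

# Toy setup: constant reward over a fixed-length episode.
T, gamma = 1000, 0.99
rewards = np.ones(T)

# Discounted return-to-go at each time step (curved, not linear in t).
returns = np.zeros(T)
acc = 0.0
for t in reversed(range(T)):
    acc = rewards[t] + gamma * acc
    returns[t] = acc

al = np.arange(T) / 1000.0  # normalized time index

def rms_error(degree):
    # Least-squares fit of the returns on [1, al, ..., al**degree].
    X = np.stack([al ** j for j in range(degree + 1)], axis=1)
    coef, *_ = np.linalg.lstsq(X, returns, rcond=None)
    return np.sqrt(np.mean((X @ coef - returns) ** 2))

print("RMS error, degree 1:", rms_error(1))
print("RMS error, degree 3:", rms_error(3))  # smaller: the cubic captures the curvature
```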
Yeah true, the issue isn't closed yet. We should ask them.
I contacted @dementrock directly and got the final response: