[Question] Meaning of the "al" variable? #13

Closed
araffin opened this issue May 3, 2019 · 4 comments

Comments


araffin commented May 3, 2019

Hello,

looking at the code in that repo, I came across this variable:

al = np.arange(l).reshape(-1, 1) / 1000.0
feat = np.concatenate([o, al, al**2, al**3, np.ones((l, 1))], axis=1)

The last time I saw something similar was in the OpenAI Baselines code (linked here via the Stable Baselines fork):

https://github.com/hill-a/stable-baselines/blob/333c59379f23e1f5c5c9e8bf93cbfa56ac52d13b/stable_baselines/acktr/value_functions.py#L58-L64

I assume this encodes some information about time (cf. the plot below of the different features), but I'm still wondering what al stands for and where it comes from. (Neither paper mentions such a feature.)

[Image "al": plot of the different features]

@sashank-tirumala

This code is similar to the linear_feature_baseline code in rllab:
https://github.com/rll/rllab/blob/master/rllab/baselines/linear_feature_baseline.py

They released a paper with that code, but it doesn't explain their feature selection. In my opinion, they seem to be using a polynomial basis, as explained in Sutton and Barto. However, they added an extra term that depends on the position of the state in the trajectory (this is al). It makes intuitive sense: in continuous control tasks the same state may appear at the start or at the end of a trajectory (think of cart-pole states), and its return can vary wildly depending on which. I think we could get much better features by designing them for a specific problem (see Sutton and Barto).
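To make the intuition concrete, here is a minimal sketch (not the rllab or baselines implementation). It assumes a made-up trajectory in which the observation never changes and the reward is 1 per step, so the return-to-go depends only on the time step. The helper names `features` and `fit_predict` are hypothetical; the feature layout follows the snippet quoted in the question.

```python
import numpy as np

def features(obs, use_time=True):
    """Per-step features for one trajectory: [obs, al, al**2, al**3, 1]."""
    l = len(obs)
    cols = [obs]
    if use_time:
        al = np.arange(l).reshape(-1, 1) / 1000.0  # scaled step index
        cols += [al, al**2, al**3]
    cols.append(np.ones((l, 1)))  # bias term
    return np.concatenate(cols, axis=1)

def fit_predict(feat, returns):
    """Least-squares linear baseline; returns its predictions on feat."""
    w = np.linalg.lstsq(feat, returns, rcond=None)[0]
    return feat @ w

# Hypothetical trajectory: the observation never changes, reward is 1 per step.
T = 200
obs = np.zeros((T, 2))
returns = np.arange(T, 0, -1, dtype=float)  # undiscounted return-to-go: T, ..., 1

err_with_time = np.abs(fit_predict(features(obs), returns) - returns).max()
err_without_time = np.abs(
    fit_predict(features(obs, use_time=False), returns) - returns
).max()
print(err_with_time, err_without_time)
```

Without the time columns, the baseline sees identical features at every step and can only predict the mean return. With them, it can track how the return shrinks as the trajectory approaches its end.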


araffin commented May 17, 2019

Thanks for your reply =).

> they added an additional term which is with respect to the position of the state in the trajectory (this is al). It makes intuitive sense, since in continuous control tasks the same state may appear at the end or the start of the trajectory (think cart pole states), and depending on that their returns may vary wildly.

OK, that makes sense (I was only thinking in terms of intermediate reward, not return), especially for fixed-length environments like HalfCheetah. But I don't think this should be limited to continuous-action tasks. Also, I still have trouble understanding how the higher powers of "al" (al**2, al**3) help further.
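One possible reading of the higher powers, sketched under assumptions rather than taken from either codebase: when returns are discounted, the return-to-go in a fixed-horizon task is a curved function of the time step, so a polynomial in al can follow it more closely than a single linear term. The horizon `T`, discount `gamma`, and the constant reward of 1 per step below are illustrative choices.

```python
import numpy as np

T, gamma = 1000, 0.99
t = np.arange(T)
# Discounted return-to-go with reward 1 per step until the horizon T.
returns = (1.0 - gamma ** (T - t)) / (1.0 - gamma)

al = t.reshape(-1, 1) / 1000.0  # same scaling as in the quoted snippet

def rmse_for_degree(degree):
    """RMSE of a least-squares fit using [1, al, ..., al**degree]."""
    feat = np.concatenate([al**k for k in range(degree + 1)], axis=1)
    w = np.linalg.lstsq(feat, returns, rcond=None)[0]
    return float(np.sqrt(np.mean((feat @ w - returns) ** 2)))

rmse = {d: rmse_for_degree(d) for d in (1, 2, 3)}
print(rmse)
```

Each added power can only lower the fitting error here, because the lower-degree features are a subset of the higher-degree ones. Whether this matters in practice depends on how curved the return profile of the task actually is.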

So I think we need to ask @dementrock (rllab) and @joschu (OpenAI Baselines) to get a final answer on what the "al" variable means.

(pinging @hill-a because I think he is also interested in the answer)


sashank-tirumala commented May 17, 2019 via email


araffin commented May 30, 2019

I contacted @dementrock directly and got the final response:

> Thank you for your interest. Unfortunately I do not recall the original motivation, except that it might simply be lazy naming and picking “a”range of “l” as the variable name.
>
> You are right that it’s encoding information about time, which is important in finite-horizon problems.
