[Question] Meaning of the "al" variable? #13

araffin · 2019-05-03T19:21:09Z

Hello,

looking at the code in that repo, I came across this variable:

Lines 16 to 17 in 4a7c219

    
    al = np.arange(l).reshape(-1, 1) / 1000.0
    feat = np.concatenate([o, al, al**2, al**3, np.ones((l, 1))], axis=1)

The last time I saw something similar was in the OpenAI Baselines code (linked here via the Stable Baselines fork):

https://github.com/hill-a/stable-baselines/blob/333c59379f23e1f5c5c9e8bf93cbfa56ac52d13b/stable_baselines/acktr/value_functions.py#L58-L64

I assume this encodes some information about time (cf. the plot below of the different features), but I'm still wondering what al stands for and where it comes from. (Neither paper mentions such a feature.)
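To make the question concrete, here is a minimal sketch of what the quoted two lines produce for a short trajectory. The wrapper name `time_features` and the toy observation array are assumptions for illustration, not part of the repository.

```python
import numpy as np

def time_features(o):
    """Hypothetical wrapper around the two quoted lines.

    o: observations of one trajectory, shape (l, obs_dim).
    """
    l = o.shape[0]
    # Time step index, scaled down by 1000 as in the quoted code
    al = np.arange(l).reshape(-1, 1) / 1000.0
    # Observations, then time, time^2, time^3, and a constant bias column
    return np.concatenate([o, al, al**2, al**3, np.ones((l, 1))], axis=1)

# Toy trajectory: 4 steps, 2-dimensional observations (all zeros)
o = np.zeros((4, 2))
feat = time_features(o)
print(feat.shape)   # (4, 6)
print(feat[:, 2])   # the "al" column: the scaled step index
```

So whatever the name stands for, the al columns depend only on the step index within the trajectory, not on the observation.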


sashank-tirumala · 2019-05-17T06:18:40Z

This code is similar to the linear_feature_baseline code in rllab:
https://github.com/rll/rllab/blob/master/rllab/baselines/linear_feature_baseline.py

They released a paper with that code, but it doesn't explain their feature selection. In my opinion, they seem to be using a polynomial basis as explained in Sutton and Barto. However, they added extra terms that depend on the position of the state in the trajectory (this is al). That makes intuitive sense: in continuous control tasks the same state may appear at the start or at the end of a trajectory (think of cart-pole states), and its return may vary wildly depending on which. I think we could get much better features by designing them for a specific problem (see Sutton and Barto).
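A rough sketch of the idea, assuming the baseline is a regularized least-squares fit of returns on these features. The function names, the `reg` value, and the toy data are assumptions for illustration; this is not the rllab implementation.

```python
import numpy as np

def features(obs):
    # Same construction as the snippet quoted in the question
    l = obs.shape[0]
    al = np.arange(l).reshape(-1, 1) / 1000.0
    return np.concatenate([obs, al, al**2, al**3, np.ones((l, 1))], axis=1)

def fit_baseline(obs, returns, reg=1e-5):
    # Ridge-regularized normal equations (reg is an assumed value)
    feat = features(obs)
    A = feat.T @ feat + reg * np.eye(feat.shape[1])
    b = feat.T @ returns
    return np.linalg.solve(A, b)

def predict(obs, coeffs):
    return features(obs) @ coeffs

# Toy case: the observation is identical at every step, but the
# undiscounted return-to-go shrinks as the episode runs out.
l = 50
obs = np.zeros((l, 3))
returns = (l - np.arange(l)).astype(float)

coeffs = fit_baseline(obs, returns)
pred = predict(obs, coeffs)
print(pred[0], pred[-1])
```

With identical observations, the observation columns alone cannot separate the first step from the last; the time columns are what let the fitted value differ between them.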

araffin · 2019-05-17T12:56:13Z

Thanks for your reply =).

> they added an additional term which is with respect to the position of the state in the trajectory (this is al). It makes intuitive sense, since in continuous control tasks the same state may appear at the end or the start of the trajectory (think cart pole states), and depending on that their returns may vary wildly.

OK, that makes sense (I was only thinking in terms of intermediate reward, not return), especially for fixed-length environments like HalfCheetah. But I don't think this should be limited to continuous-action tasks. Also, I still have some trouble understanding how higher powers of "al" (al**2, al**3) help.

So, I think we need to ask @dementrock (rllab) and @joschu (openai baselines) to get a final answer on what the "al" variable means.

(pinging @hill-a because I think he is also interested in the answer)
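On the open question of how al**2 and al**3 could help, one possible illustration (an assumption, not something confirmed by the authors): with discounting, the return-to-go is a curved function of the time step, so a cubic in al can track it more closely than a straight line. The values of `gamma` and `l` and the constant reward below are made up for the sketch.

```python
import numpy as np

gamma = 0.99   # assumed discount factor
l = 200        # assumed fixed episode length

t = np.arange(l)
# Discounted return-to-go when every step gives a reward of 1:
# G_t = 1 + gamma + ... + gamma^(l - t - 1)
returns = (1.0 - gamma ** (l - t)) / (1.0 - gamma)

al = t / 1000.0  # same scaling as in the quoted code

def rms_error(degree):
    # Least-squares polynomial fit of the returns as a function of al
    coeffs = np.polyfit(al, returns, degree)
    residual = np.polyval(coeffs, al) - returns
    return np.sqrt(np.mean(residual ** 2))

err_linear = rms_error(1)  # features: al, 1
err_cubic = rms_error(3)   # features: al, al**2, al**3, 1
print(err_linear, err_cubic)
```

A degree-3 fit contains the degree-1 fit as a special case, so its least-squares error can only be lower or equal; here the target is curved, so it is strictly lower.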

sashank-tirumala · 2019-05-17T13:52:55Z

Yeah, true. The issue isn't closed yet. We should ask them.


araffin · 2019-05-30T20:41:19Z

I contacted @dementrock directly and got the final response:

> Thank you for your interest. Unfortunately I do not recall the original motivation, except that it might simply be lazy naming and picking “a”range of “l” as the variable name.
>
> You are right that it’s encoding information about time, which is important in finite-horizon problems.

araffin changed the title ~~Meaning of the "al" variable~~ [Question] Meaning of the "al" variable May 3, 2019

araffin changed the title ~~[Question] Meaning of the "al" variable~~ [Question] Meaning of the "al" variable? May 3, 2019

araffin closed this as completed May 30, 2019

araffin mentioned this issue May 18, 2020

Filtering out artificial teminal states hill-a/stable-baselines#863

Closed

araffin mentioned this issue Jan 7, 2021

[Bug] Infinite horizon tasks are handled like episodic tasks DLR-RM/stable-baselines3#284

Closed

araffin mentioned this issue May 3, 2022

New Step API with terminated, truncated bools instead of done openai/gym#2752

Merged
