Doubt on advantage calculation to update the policy on AWAC. #160

Open
Roberto09 opened this issue Jan 13, 2022 · 0 comments
Roberto09 commented Jan 13, 2022

Hey, this code and the AWAC paper are awesome! Thanks for sharing this library; I've been reading parts of it lately, trying to understand and apply the AWAC paper. :)

However, I had a question about how the Q(s,a) term of the advantage function is implemented in the library:

q_adv = torch.min(q1_pred, q2_pred)

Where q1_pred and q2_pred are both directly calculated using the learned Q1 and Q2 functions.

I was wondering about this since, as I understand it, the code uses Q(s,a) directly instead of the 1-step return r(s,a) + γ·Q(s',a') to compute the Q(s,a) term of the advantage function. It seems to me that the 1-step return estimate is already computed under the variable q_target:

q_target = self.reward_scale * rewards + (1. - terminals) * self.discount * target_q_values

Is there any reason for using Q(s,a) obtained directly as min(Q1(s,a), Q2(s,a)) rather than the 1-step return r(s,a) + γ·min(Q1(s',a'), Q2(s',a'))? Couldn't the former introduce more bias into the estimate of the returns?
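To make the comparison concrete, here is a minimal sketch of the two advantage estimates being contrasted. This is not the library's actual code; the function names and the value baseline v_pred are illustrative assumptions, and the second variant simply reuses the q_target recipe quoted above as the Q(s,a) term.

```python
import torch

def advantage_direct(q1_pred, q2_pred, v_pred):
    # Variant in the library: Q(s,a) taken directly from the learned critics,
    # with the clipped double-Q min, then the value baseline subtracted.
    q_adv = torch.min(q1_pred, q2_pred)
    return q_adv - v_pred

def advantage_one_step(rewards, terminals, discount, target_q_next, v_pred):
    # Alternative discussed above: 1-step return r(s,a) + gamma * Q(s',a'),
    # i.e. the same quantity as q_target (reward_scale = 1 assumed here),
    # used as the Q(s,a) term of the advantage.
    q_adv = rewards + (1.0 - terminals) * discount * target_q_next
    return q_adv - v_pred
```

Both return A(s,a) = Q(s,a) - V(s); they differ only in which estimate of Q(s,a) feeds the subtraction.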

Oh, also, I ran some tests switching to the latter formulation in my own implementation on a different problem, and the results were very similar, so it's unclear to me whether there's actually any benefit to switching from one implementation to the other.

Thanks in advance!
