I'm not so sure this is a problem; it's similar to line 128 in the original code. We are already calculating the Q value for s, so perhaps the authors see it as too expensive to calculate it for both s and s' (all terms in the beta update concern s').
I'm going to keep this issue open because I may test what happens when the code is changed.
option_term_prob gives the termination probability of the current option, and done describes the transition from the current state to the next state. Since both refer to that transition, the advantage should be computed over the next state.
The other way would be to replace the two with prev_option_term_prob and the previous dones, since they are already available to the agent at a given timestep.
The termination probability is calculated over the next state according to the original paper. So it should be using next_obs instead of obs.
option-critic-pytorch/option_critic.py
Line 238 in 0c57da7
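To make the point about the next state concrete, here is a minimal sketch of the termination term for a single transition, written in plain Python rather than PyTorch. It is not the repository's code: the function name `termination_loss`, its arguments, and the `term_reg` regularizer are hypothetical, and V(s') is assumed to be the greedy maximum over option values. Every quantity is evaluated at s', following the termination gradient in the original paper, which weights the gradient of beta by the advantage Q(s', w) - V(s').

```python
def termination_loss(beta_next, q_next, option, done, term_reg=0.01):
    """Termination term for one transition (s, w, s').

    beta_next: termination probability of the current option at s'
    q_next:    list of option values Q(s', .) at the next state
    option:    index of the option that was executing
    done:      True if s' ended the episode
    term_reg:  hypothetical regularizer added to the advantage
    """
    # Assumption: V(s') is approximated by the greedy max over options.
    v_next = max(q_next)
    # Advantage of continuing the current option, evaluated at s'.
    advantage = q_next[option] - v_next + term_reg
    # No termination signal is propagated past the end of an episode.
    return beta_next * advantage * (1.0 - float(done))


# Usage: option 0 is worse than the best option at s', so minimizing
# this (negative) term pushes its termination probability up.
loss = termination_loss(beta_next=0.5, q_next=[1.0, 3.0], option=0,
                        done=False, term_reg=0.0)
```

Computing the same expression from obs would require the termination probability and the Q values at s instead, which is the mismatch this issue describes.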