eupktcha changed the title from "Possible bugs : calculate policy with previous state" to "Possible bugs : Determine action with previous ( not current ) state" on Aug 21, 2018
Hi,
Something looks wrong with the gw.step() call at
(https://github.com/stormmax/irl-imitation/blob/master/maxent_irl_gridworld.py#L95)
and
(https://github.com/stormmax/irl-imitation/blob/master/deep_maxent_irl_gridworld.py#L72).
I think
cur_state, action, next_state, reward, is_done = gw.step(int(policy[gw.pos2idx(cur_state)]))
should be
cur_state, action, next_state, reward, is_done = gw.step(int(policy[gw.pos2idx(next_state)]))
Calling step() advances the current state stored inside the gridworld object. So the local variable
next_state (confusingly, not cur_state) always corresponds to the environment's current state, and
that is the state that should be passed to the policy.
Am I misunderstanding something?
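To make the concern concrete, here is a minimal sketch of the pattern. `ToyChain` and `rollout` are hypothetical stand-ins I wrote for illustration, not the repository's GridWorld; the only thing assumed from the real code is that step() returns (cur_state, action, next_state, reward, is_done) and advances the internal state. Each recorded pair is (state used to index the policy, state the environment was actually in).

```python
class ToyChain:
    """Hypothetical 1-D chain environment (not the repository's GridWorld)."""

    def __init__(self, n):
        self.n = n
        self._pos = 0

    def reset(self, pos):
        self._pos = pos
        return pos

    def step(self, action):
        # Returns the state before the move and the state after it,
        # and advances the internal position as a side effect.
        cur = self._pos
        nxt = min(max(cur + action, 0), self.n - 1)
        self._pos = nxt
        return cur, action, nxt, 0.0, nxt == self.n - 1


def rollout(env, policy, start, n_steps, use_next_state):
    """Record (state used for the policy lookup, state the env was really in)."""
    env.reset(start)
    cur_state = next_state = start
    pairs = []
    for _ in range(n_steps):
        lookup = next_state if use_next_state else cur_state
        cur_state, action, next_state, reward, is_done = env.step(policy[lookup])
        pairs.append((lookup, cur_state))
    return pairs


policy = [1, 1, 1, 0]  # action per state: +1 moves right, 0 stays
stale = rollout(ToyChain(4), policy, start=0, n_steps=3, use_next_state=False)
fresh = rollout(ToyChain(4), policy, start=0, n_steps=3, use_next_state=True)
print(stale)
print(fresh)
```

In this toy setup, indexing the policy with cur_state lags one step behind the environment from the second call onward, while indexing with next_state stays in sync. Whether the same lag occurs in the linked scripts depends on how the real step() is implemented.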