Possible bugs : Determine action with previous ( not current ) state #5

eupktcha · 2018-08-21T03:31:33Z

Hi,

I think something is wrong with the gw.step() calls at
(https://github.com/stormmax/irl-imitation/blob/master/maxent_irl_gridworld.py#L95)
and
(https://github.com/stormmax/irl-imitation/blob/master/deep_maxent_irl_gridworld.py#L72).

I think
cur_state, action, next_state, reward, is_done = gw.step(int(policy[gw.pos2idx(cur_state)]))
should be
cur_state, action, next_state, reward, is_done = gw.step(int(policy[gw.pos2idx(next_state)]))
Calling step() advances the current state stored inside the gridworld object. So after a call,
the local variable next_state (confusingly, not cur_state) holds the environment's current state,
and that is the state that should be passed to the policy.
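
To make the off-by-one concrete, here is a minimal sketch. It does not use the repository's gridworld class: ChainWorld, rollout, and mismatches are hypothetical names for a 1-D stand-in whose step() returns the same five values in the same order. I am assuming the real step() behaves the same way, i.e. the first returned value is the state before the move.

```python
class ChainWorld:
    """Hypothetical 1-D stand-in for the gridworld (not the repository's class)."""

    def __init__(self, n_states=5):
        self.n_states = n_states
        self.state = 0

    def reset(self, start=0):
        self.state = start

    def step(self, action):
        # Same return order as in the issue:
        # (state before the move, action, state after the move, reward, done)
        prev = self.state
        self.state = max(0, min(self.n_states - 1, prev + action))
        done = self.state == self.n_states - 1
        return prev, action, self.state, float(done), done


def rollout(env, policy, use_next_state, n_steps=4):
    """Collect (state, action) pairs; the flag selects which variable feeds the policy."""
    env.reset(0)
    cur_state = 0
    cur_state, action, next_state, _, done = env.step(policy[cur_state])
    visited = [(cur_state, action)]
    for _ in range(n_steps - 1):
        if done:
            break
        # After step(), cur_state is the state *before* the last move,
        # while next_state is where the environment actually is now.
        lookup = next_state if use_next_state else cur_state
        cur_state, action, next_state, _, done = env.step(policy[lookup])
        visited.append((cur_state, action))
    return visited


def mismatches(visited, policy):
    """Pairs where the recorded action is not what the policy prescribes for that state."""
    return [(s, a) for s, a in visited if policy[s] != a]


# policy[s] is the move taken in state s: +1 (right) or -1 (left).
policy = [1, 1, -1, 1, 1]

stale = rollout(ChainWorld(), policy, use_next_state=False)
fixed = rollout(ChainWorld(), policy, use_next_state=True)

print("indexing with cur_state :", stale, "mismatches:", mismatches(stale, policy))
print("indexing with next_state:", fixed, "mismatches:", mismatches(fixed, policy))
```

With cur_state as the index, every action after the first is chosen for the state one step behind, so the recorded (state, action) pairs stop matching the policy. With next_state as the index, each recorded action is the one the policy prescribes for the recorded state.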

Am I misunderstanding something?


dahehe98 · 2023-12-23T09:35:48Z

I totally agree with you.

eupktcha closed this as completed Aug 21, 2018

eupktcha changed the title ~~Possible bugs : calculate policy with previous state~~ Possible bugs : Determine action with previous ( not current ) state Aug 21, 2018

eupktcha reopened this Aug 21, 2018
