Inconsistency between time_since_last_alarm in observations and env._attention_budget._all_successful_alarms used in AlarmReward #226
Comments
Hello, please next time try to remove any irrelevant pieces of code that are not useful for the bug you are trying to emphasize. Typically, here, simply use an agent that raises some alarms and show the problem directly in the observation. No need for lightsim, no need for the runner, no need for EpisodeData, no need for an agent, basically, to reproduce the issue (and to debug it, which I have not done yet). I get that these parts are important for the tests you performed, but not when explaining an issue. For example, is it important that this problem arises at the last observation?
I am closing this issue: when the environment is in a "done" state, gym does not enforce anything on the quality of the observation, which should not be used in that case. Look at the observation corresponding to the last successful action instead, and you might see that the values there are as expected.
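A minimal sketch of that advice, assuming the 4-tuple gym step API used throughout this thread and reusing the test environment path from the script below: keep the last observation seen before done becomes True, and ignore the terminal one.

import grid2op

env = grid2op.make('grid2op/data_test/l2rpn_neurips_2020_track1_with_alert')
obs = env.reset()
done = False
last_valid_obs = obs
while not done:
    action = env.action_space({"raise_alarm": 1})
    obs, reward, done, info = env.step(action)
    if not done:
        # the observation returned together with done=True is not guaranteed
        # to be meaningful, so only keep observations from successful steps
        last_valid_obs = obs
print(last_valid_obs.time_since_last_alarm)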
Unfortunately it seems you were not able to reproduce the current output presented. episode_data.observations[nb_timestep - 2].time_since_last_alarm is indeed what is looked at in the time_since_last_alarm array. In the end, the issue is not necessarily a bad value in the observation. The current issue is an inconsistency of those values with the ones in env._attention_budget._all_successful_alarms (which is used in the reward here: https://github.com/rte-france/Grid2Op/blob/e19cb7590d3fcd697972aaabfb2a3bcff4b3f0e1/grid2op/Reward/AlarmReward.py#L149). I then understood why you could not reproduce: I pointed you to the wrong chronics, my bad. Those two chronics come from Scenario_april_42. I will share it with you.
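To make the claimed inconsistency concrete, here is a minimal sketch of the comparison being discussed. It assumes an observations list collected during an episode, as in the reproduction scripts below; the exact element format of _all_successful_alarms is a grid2op internal, so only lengths are compared here.

# timesteps at which the observations report a successful alarm
# (time_since_last_alarm == 0 right after the alarm was raised)
alarm_steps_from_obs = [
    t + 1 for t, obs in enumerate(observations)
    if obs.time_since_last_alarm[0] == 0
]
# the environment's own record of successful alarms, as used by AlarmReward
alarms_from_env = env._attention_budget._all_successful_alarms
# the bug report: these two counts disagree near the end of the episode
print(len(alarm_steps_from_obs), len(alarms_from_env))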
Sure, it can probably still be improved, I agree. Note that it was already reduced by half and simplified quite a bit before posting, while making sure the issue was still reproducible. Feel free to show the ideal script you would have liked, so I can improve them next time.
Simple script: see, no runner, no episode data, no pandas, no agent, nothing but the bug I have to inspect. And I can even inspect the env if I want :-)

import grid2op
from grid2op.Parameters import Parameters
env_path='grid2op/data_test/l2rpn_neurips_2020_track1_with_alert'
######
param = Parameters()
# param.init_from_dict({'ALARM_BEST_TIME':6,'ALARM_WINDOW_SIZE':6})
from grid2op.Reward import AlarmReward
env = grid2op.make(env_path, param=param, reward_class=AlarmReward)
# raise an alarm at each of the first five steps and watch the
# successful alarms being registered by the attention budget
for _ in range(5):
    res = env.action_space({"raise_alarm": 1})
    obs, reward, done, info = env.step(res)
    print(env._attention_budget._all_successful_alarms)

# raise one more alarm
res = env.action_space({"raise_alarm": 1})
obs, reward, done, info = env.step(res)
print(obs.time_since_last_alarm)  # should be 1 (the last alarm was not successful)
Here is a simplified script that still shows the issue:

import grid2op
from grid2op.Parameters import Parameters
env_path='/YourPath/env_debug_time_last_alarm_inconsistency'
######
param = Parameters()
from grid2op.Reward import AlarmReward
# WARNING: make sure in config.py that attention_budget._max_budget and
# attention_budget._init_budget are set to infinity (e.g. 9999)
env = grid2op.make(env_path, param=param, reward_class=AlarmReward)  # ,attention_max_budget=9999
print(env._attention_budget._init_budget)  # should be very large, e.g. 9999
print(env._attention_budget._max_budget)   # should be very large, e.g. 9999
env.seed(1660836287)  # for id_chronic=0
# raise an alarm at every step until game over
done = False
observations = []
while not done:
    res = env.action_space({"raise_alarm": 1})
    obs, reward, done, info = env.step(res)
    observations.append(obs)
nb_timestep = len(observations)
time_since_last_alarm = []
for obs in observations:
    time_since_last_alarm.append(obs.time_since_last_alarm[0])
# t+1 to be consistent with env timesteps
alarm_timesteps_obs = [t + 1 for t in range(0, nb_timestep - 1) if time_since_last_alarm[t] == 0]
assert len(env._attention_budget._all_successful_alarms) == len(alarm_timesteps_obs)

In this one, we see again that env._attention_budget._all_successful_alarms has 2 more elements than alarm_timesteps_obs. Same issue as before.
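For inspection, printing the two collections next to each other makes the off-by-two visible; a small addition one might append to the script above (it only uses names already defined there):

print(len(env._attention_budget._all_successful_alarms), len(alarm_timesteps_obs))
print(alarm_timesteps_obs)
print(env._attention_budget._all_successful_alarms)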
For me, there is still no bug:

import grid2op
from grid2op.Parameters import Parameters
env_path='./env_debug_time_last_alarm_inconsistency'
######
param = Parameters()
from grid2op.Reward import AlarmReward
# WARNING: attention_budget._max_budget and attention_budget._init_budget must be
# effectively infinite; here they are set explicitly via kwargs_attention_budget
env = grid2op.make(env_path, param=param, reward_class=AlarmReward,
                   kwargs_attention_budget={"init_budget": 999, "max_budget": 9999,
                                            "budget_per_ts": 1. / (12. * 8),
                                            "alarm_cost": 1.})
print(env._attention_budget._init_budget)  # should be very large, here 999
print(env._attention_budget._max_budget)   # should be very large, here 9999
env.seed(1660836287)  # for id_chronic=0
# raise an alarm at every step until game over
done = False
observations = []
ts = 0
while not done:
    ts += 1
    res = env.action_space({"raise_alarm": 1})
    obs, reward, done, info = env.step(res)
    observations.append(obs)
    if not done:
        assert obs.time_since_last_alarm[0] == 0
    else:
        # the last observation will never have the flag 'time_since_last_alarm'
        assert obs.time_since_last_alarm[0] == -1
    # the environment registers all alarms
    assert len(env._attention_budget._all_successful_alarms) == ts

This script runs till the end and, at each step, checks that everything works fine. If something does not work, I suspect it's in the last few lines of your code, which do not rely on grid2op.
If you add this in your while loop:

while not done:
    ts += 1
    res = env.action_space({"raise_alarm": 1})
    obs, reward, done, info = env.step(res)
    observations.append(obs)
    if ts >= 372:
        print('observations[371]: ' + str(observations[371].time_since_last_alarm[0]) + ' at timestep ' + str(ts))

you get an output showing that the value has changed for the last true observation before the game over, which might explain why we are missing the alarm at this timestep in the end.
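One plausible mechanism for a stored observation changing after the fact (an assumption on our side, not something confirmed in this thread) is that the stored objects alias mutable state inside the environment. Storing a defensive copy in the loop above would rule that out:

import copy

# store an independent snapshot instead of a reference that might be
# mutated by later env.step() calls
observations.append(copy.deepcopy(obs))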
Fixed and merged in version 1.6.0, now available on PyPI.
Environment

Grid2op version: 1.6.0.rc1
System: osx
Bug description

time_since_last_alarm in the observations and env._attention_budget._all_successful_alarms used in AlarmReward look inconsistent. Some alarms that appear in _all_successful_alarms for the last time steps before the failure do not show up in the last observation (observations[nb_timesteps - 1]). Hence we are given a reward for an alarm that we cannot find back in the observations.
How to reproduce

See the scripts in the comments above.