
Inconsistency between time_since_last_alarm in observations and env._attention_budget._all_successful_alarms used in AlarmReward #226

Closed
marota opened this issue Jun 11, 2021 · 9 comments
Labels: bug (Something isn't working), cannot_reproduce (This issue cannot be reproduced with the provided code snippet, so it is given a low priority)

Comments

marota (Contributor) commented Jun 11, 2021

Environment

  • Grid2op version: 1.6.0.rc1
  • System: osx

Bug description

time_since_last_alarm in the observations and env._attention_budget._all_successful_alarms used in AlarmReward look inconsistent. Some alarms that appear in _all_successful_alarms for the last time steps before the failure do not appear in the last observation (observations[nb_timesteps-1]); hence a reward is given for an alarm that cannot be found back in the observations.

How to reproduce


Code snippet

import numpy as np

import grid2op
print(grid2op.__version__)

# Backend class to use
try:
    from lightsim2grid.LightSimBackend import LightSimBackend
    BACKEND = LightSimBackend
except ModuleNotFoundError:
    from grid2op.Backend import PandaPowerBackend
    BACKEND = PandaPowerBackend

from grid2op.Runner import Runner
from grid2op.Parameters import Parameters

from grid2op.Agent.BaseAgent import BaseAgent


class DoNothing_Attention_Agent(BaseAgent):
    """
    This is the most basic BaseAgent. It is purely passive, and does absolutely nothing.
    As opposed to most reinforcement learning environments, in grid2op, doing nothing is often
    the best solution.
    """

    def __init__(self, action_space, alarms_lines_area):
        BaseAgent.__init__(self, action_space)
        self.alarms_lines_area = alarms_lines_area
        self.alarms_area_names = env.alarms_area_names  # beware: relies on the global env created below

    def act(self, observation, reward, done=False):
        """
        As better explained in the document of :func:`grid2op.BaseAction.update` or
        :func:`grid2op.BaseAction.ActionSpace.__call__`.
        The preferred way to make an object of type action is to call :func:`grid2op.BaseAction.ActionSpace.__call__`
        with the dictionary representing the action. In this case, the action is "do nothing" and it is represented by
        the empty dictionary.
        Parameters
        ----------
        observation: :class:`grid2op.Observation.Observation`
            The current observation of the :class:`grid2op.Environment.Environment`
        reward: ``float``
            The current reward. This is the reward obtained by the previous action
        done: ``bool``
            Whether the episode has ended or not. Used to maintain gym compatibility
        Returns
        -------
        res: :class:`grid2op.Action.Action`
            The action chosen by the bot / controller / agent.
        """
        res = self.action_space({})
        if (np.max(observation.rho) >= 1):
            zones_alert = self.get_region_alert(observation)
            res = self.action_space({"raise_alarm": zones_alert})
        # print(res)
        return res

    def get_region_alert(self, observation):
        # extract the zones they belong too
        zones_these_lines = set()
        zone_for_each_lines = self.alarms_lines_area

        lines_overloaded = np.where(observation.rho >= 1)[0].tolist()
        print(lines_overloaded)
        for line_id in lines_overloaded:
            line_name = observation.name_line[line_id]
            for zone_name in zone_for_each_lines[line_name]:
                zones_these_lines.add(zone_name)

        zones_these_lines = list(zones_these_lines)
        zones_ids_these_lines = [self.alarms_area_names.index(zone) for zone in zones_these_lines]
        return zones_ids_these_lines



####Input environment path here
env_path='YourPath/env_debug_time_last_alarm_inconsistency'
######


param = Parameters()
#param.init_from_dict({'ALARM_BEST_TIME':6,'ALARM_WINDOW_SIZE':6})

from grid2op.Reward import AlarmReward
env = grid2op.make(env_path, backend=BACKEND(),
                   param=param, reward_class=AlarmReward)



agent=DoNothing_Attention_Agent(env.action_space,env.alarms_lines_area)

runner = Runner(**env.get_params_for_runner(),
                agentClass=None,
                agentInstance=agent
                )


id_chonic=0 #0 or 1
env_seeds=[1660836287,1572415299]
agent_seed=[172658395,1708891582]
res_episode = runner.run_one_episode(detailed_output=True,indx=id_chonic,
                env_seed=env_seeds[id_chonic],path_save='res_alert',
                agent_seed=agent_seed[id_chonic])

name_chron, cum_reward, nb_timestep, episode_data = res_episode

print('time_since_last_alarm is: '+str(episode_data.observations[nb_timestep-1].time_since_last_alarm[0]))
print('reward is: '+str(cum_reward))
#look at what is in the observation
time_since_last_alarm = []
for obs in episode_data.observations[1:nb_timestep]:
    time_since_last_alarm.append(obs.time_since_last_alarm[0])
alarm_timesteps_obs = [t+1 for t in range(0,nb_timestep-1) if time_since_last_alarm[t]==0]#t+1 to be consistent with env timesteps


print('survival time is: '+str(nb_timestep))
print('alarm timesteps in observations are:'+str(alarm_timesteps_obs))

Current output

#### WHAT YOU WILL SEE
# for id_chonic=0
# successful_alarms = env._attention_budget._all_successful_alarms
# [(223, array([ True, False, False])),
#  (224, array([ True, False, False])),
#  (372, array([ True,  True, False])),
#  (373, array([ True,  True, False]))]
# but alarm timesteps in observations are: [223, 224]

# for id_chonic=1
# successful_alarms = env._attention_budget._all_successful_alarms
# [(259, array([False, False,  True])),
#  (260, array([False, False,  True])),
#  (261, array([False, False,  True])),
#  (262, array([False, False,  True]))]
# but alarm timesteps in observations are: [259, 260]

Expected output

We should see consistency between time_since_last_alarm in the observations and env._attention_budget._all_successful_alarms.

So probably (for id_chonic=0) either alarm_timesteps_obs = env._attention_budget._all_successful_alarms = [223, 224],

or alarm_timesteps_obs = env._attention_budget._all_successful_alarms = [223, 224, 372, 373].
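As an aside, the consistency check described above boils down to comparing two lists of timesteps. Here is a minimal sketch in plain Python, using the values from the "Current output" section (the helper name `check_alarm_consistency` is mine, not part of grid2op):

```python
# Sketch of the consistency check, using the values reported for id_chonic=0.
# The helper name is illustrative only, not a grid2op API.

def check_alarm_consistency(successful_alarms, alarm_timesteps_obs):
    """Compare the timesteps registered by the attention budget with the
    timesteps at which the observations report a raised alarm."""
    budget_timesteps = [ts for ts, _zones in successful_alarms]
    return budget_timesteps == alarm_timesteps_obs

# Values taken from the "Current output" section (id_chonic=0):
successful_alarms = [
    (223, [True, False, False]),
    (224, [True, False, False]),
    (372, [True, True, False]),
    (373, [True, True, False]),
]
alarm_timesteps_obs = [223, 224]

print(check_alarm_consistency(successful_alarms, alarm_timesteps_obs))  # False: the bug
```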
marota added the bug label on Jun 11, 2021
BDonnot (Collaborator) commented Jun 14, 2021

Hello,

Please next time try to remove any non-relevant pieces of code that are not useful for the bug you are trying to emphasize.

Typically, here, simply put an agent that raises some alarm, and show in the observation that time_since_last_alarm is not updated when the alarm is illegal or something.

No need for lightsim, no need for runner, no need for episodedata, no need for an agent: none of it is needed to reproduce the issue (or to debug it, which I have not done yet).

I get that these parts are important for the test performed, but not when explaining an issue. For example, is it important that this problem arises at the last observation?
If the problem is with time_since_last_alarm, why use a runner? etc.

BDonnot (Collaborator) commented Jun 14, 2021

I am closing this issue: when the environment is in a "done" state, gym does not enforce anything on the quality of the observation, which should not be used (in this case, as you might see, obs.a_or is all 0, and so is everything else).

Look at the observation corresponding to the last successful action. You will see that episode_data.observations[nb_timestep - 2].time_since_last_alarm is indeed 0 (an alarm has been raised at this step).
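A minimal mock of that check (FakeObs is an illustrative stand-in of my own, not the grid2op Observation class):

```python
# Mocked episode: the last observation corresponds to the "done" state,
# where gym makes no guarantee and grid2op resets the fields.
class FakeObs:
    def __init__(self, tsla):
        self.time_since_last_alarm = [tsla]

nb_timestep = 3
observations = [FakeObs(0), FakeObs(0), FakeObs(-1)]

print(observations[nb_timestep - 1].time_since_last_alarm[0])  # -1: do not use
print(observations[nb_timestep - 2].time_since_last_alarm[0])  # 0: alarm raised at this step
```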

BDonnot closed this as completed on Jun 14, 2021
marota (Contributor, author) commented Jun 15, 2021

Unfortunately it seems you were not able to reproduce the current output presented. episode_data.observations[nb_timestep - 2].time_since_last_alarm is indeed included in the time_since_last_alarm array. In the end, the issue is not necessarily a bad value in the observation: the current issue is an inconsistency between those values and the ones in env._attention_budget._all_successful_alarms (which is used in the reward here: https://github.com/rte-france/Grid2Op/blob/e19cb7590d3fcd697972aaabfb2a3bcff4b3f0e1/grid2op/Reward/AlarmReward.py#L149)

I then understood why you could not reproduce it: I pointed you to the wrong chronics, my bad. Those two chronics come from Scenario_april_42. I will share it with you.

marota (Contributor, author) commented Jun 15, 2021

> Hello,
>
> Please next time try to remove any non-relevant pieces of code that are not useful for the bug you are trying to emphasize.
>
> Typically, here, simply put an agent that raises some alarm, and show in the observation that time_since_last_alarm is not updated when the alarm is illegal or something.
>
> No need for lightsim, no need for runner, no need for episodedata, no need for an agent: none of it is needed to reproduce the issue (or to debug it, which I have not done yet).
>
> I get that these parts are important for the test performed, but not when explaining an issue. For example, is it important that this problem arises at the last observation?
> If the problem is with time_since_last_alarm, why use a runner? etc.

Sure, it can probably still be improved, I agree. Note that it was already reduced by half and simplified quite a bit before posting, while ensuring the issue was still reproducible. Feel free to share the ideal script you would have liked, so I can improve things next time.

BDonnot (Collaborator) commented Jun 15, 2021

Simple script:

See: no runner, no episode data, no pandas, no agent, nothing but the bug I have to inspect.

And I can even inspect the env if I want :-)

import grid2op
from grid2op.Parameters import Parameters
from grid2op.Reward import AlarmReward

env_path = 'grid2op/data_test/l2rpn_neurips_2020_track1_with_alert'
######


param = Parameters()
# param.init_from_dict({'ALARM_BEST_TIME':6,'ALARM_WINDOW_SIZE':6})

env = grid2op.make(env_path, param=param, reward_class=AlarmReward)

# raise the same alarm 5 times in a row and watch the budget register them
for _ in range(5):
    res = env.action_space({"raise_alarm": 1})
    obs, reward, done, info = env.step(res)
    print(env._attention_budget._all_successful_alarms)

# raise one more alarm:
res = env.action_space({"raise_alarm": 1})
obs, reward, done, info = env.step(res)
print(obs.time_since_last_alarm)  # should be 1 (last alarm not successful)

marota (Contributor, author) commented Jun 15, 2021

import grid2op
from grid2op.Parameters import Parameters
env_path='/YourPath/env_debug_time_last_alarm_inconsistency'
######


param = Parameters()

from grid2op.Reward import AlarmReward

# WARNING: make sure in config.py that attention_budget._max_budget and
# attention_budget._init_budget are set to a very large value (e.g. 9999)
env = grid2op.make(env_path, param=param, reward_class=AlarmReward)
print(env._attention_budget._init_budget)  # should be very large, e.g. 9999
print(env._attention_budget._max_budget)   # should be very large, e.g. 9999
env.seed(1660836287)#for id_chronic=0

# raise an alarm at every step until game over:
done=False
observations=[]
while not done:
    res = env.action_space({"raise_alarm": 1})
    obs, reward, done, info = env.step(res)
    observations.append(obs)

nb_timestep=len(observations)
time_since_last_alarm = []
for obs in observations[0:nb_timestep]:
    time_since_last_alarm.append(obs.time_since_last_alarm[0])
alarm_timesteps_obs = [t+1 for t in range(0,nb_timestep-1) if time_since_last_alarm[t]==0]#t+1 to be consistent with env timesteps

assert(len(env._attention_budget._all_successful_alarms)==len(alarm_timesteps_obs))

In this one, we see again that env._attention_budget._all_successful_alarms has two more elements than alarm_timesteps_obs. Same issue as before.
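For reference, the failing assertion reduces to the numbers reported earlier (plain Python, no grid2op needed; the values come from the "Current output" section above):

```python
# The failing assertion, reduced to the reported numbers (id_chonic=0):
all_successful_alarms = [223, 224, 372, 373]  # timesteps from the attention budget
alarm_timesteps_obs = [223, 224]              # timesteps recovered from observations

missing = [t for t in all_successful_alarms if t not in alarm_timesteps_obs]
print(missing)  # [372, 373] -- the last two alarms never show up in the observations
print(len(all_successful_alarms) - len(alarm_timesteps_obs))  # 2
```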

BDonnot reopened this and added the cannot_reproduce label on Jun 15, 2021
BDonnot (Collaborator) commented Jun 15, 2021

For me, there is still no bug:

import grid2op
from grid2op.Parameters import Parameters
env_path='./env_debug_time_last_alarm_inconsistency'
######


param = Parameters()

from grid2op.Reward import AlarmReward

# here the attention budget is made effectively unlimited directly via kwargs_attention_budget
env = grid2op.make(env_path, param=param, reward_class=AlarmReward,
                   kwargs_attention_budget={"init_budget": 999, "max_budget": 9999,
                                            "budget_per_ts": 1. / (12. * 8),
                                            "alarm_cost": 1.}
                   )
print(env._attention_budget._init_budget)  # should be very large (999)
print(env._attention_budget._max_budget)   # should be very large (9999)
env.seed(1660836287)  # for id_chronic=0

# raise an alarm at every step until game over:
done = False
observations = []
ts = 0
while not done:
    ts += 1
    res = env.action_space({"raise_alarm": 1})
    obs, reward, done, info = env.step(res)
    observations.append(obs)
    if not done:
        assert obs.time_since_last_alarm[0] == 0
    else:
        # last observation will never have the flag 'time_since_last_alarm'
        assert obs.time_since_last_alarm[0] == -1
    # the environment register all alarms
    assert len(env._attention_budget._all_successful_alarms) == ts

This script runs till the end and, at each step, it checks that everything works fine.

If something does not work, I suspect it's in the last few lines of your code, which do not rely on grid2op.

marota (Contributor, author) commented Jun 15, 2021

If you add this inside your while loop:

while not done:
    ts += 1
    res = env.action_space({"raise_alarm": 1})
    obs, reward, done, info = env.step(res)
    observations.append(obs)
    if ts >= 372:
        print('observations[371]:' + str(observations[371].time_since_last_alarm[0]) + ' at timestep ' + str(ts))

you get:
observations[371]:0 at timestep 372
observations[371]:-1 at timestep 373

The value has changed for the last true observation before game over, which might explain why we are missing the alarm at this timestep in the end.
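One plausible mechanism for a value changing after the fact, purely as speculation on my side (this is not actual grid2op code): if the environment resets an observation's fields in place at game over, and the list stores a reference to that observation rather than a copy, the earlier entry changes retroactively:

```python
# Hypothetical illustration, not actual grid2op code: a list holding a
# *reference* to an object sees in-place mutations of that object.
class Obs:
    def __init__(self, time_since_last_alarm):
        self.time_since_last_alarm = time_since_last_alarm

observations = []
obs = Obs([0])            # alarm raised at this step
observations.append(obs)  # stores a reference, not a copy
before = observations[-1].time_since_last_alarm[0]  # 0

obs.time_since_last_alarm[0] = -1  # in-place reset at game over
after = observations[-1].time_since_last_alarm[0]   # -1: the stored value "changed"
print(before, after)  # 0 -1
```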

BDonnot (Collaborator) commented Jun 22, 2021

Fixed and merged in version 1.6.0, now available on PyPI.

BDonnot closed this as completed on Jun 22, 2021