
Return shape of the observation space in a custom environment in OpenAI gym #746

Closed
PBerit opened this issue Jan 31, 2022 · 2 comments
Labels: custom gym env (Issue related to Custom Gym Env), question (Further information is requested)

Comments


PBerit commented Jan 31, 2022

Hi all,

I have a custom environment in OpenAI gym with 3 discrete action variables, and an observation space with 3 continuous state variables and 1 discrete state variable. My question is: what exactly does the step function have to return? I have the following code:

#%% import 
from gym import Env
from gym.spaces import Discrete, Box, Tuple, MultiDiscrete
import numpy as np


#%%
class Custom_Env(Env):

    def __init__(self):

        # Define the state variables
        self.state_1 = 0
        self.state_2 = 0
        self.state_3 = 0
        self.state_4_currentTimeSlots = 0

        # Define the gym components
        self.action_space = MultiDiscrete([10, 10, 27])

        self.observation_space = Box(low=np.array([20, -20, 0, 0]), high=np.array([22, 250, 100, 287]), dtype=np.float32)

    def step(self, action):

        # Update state variables
        self.state_1 = self.state_1 + action[0]
        self.state_2 = self.state_2 + action[1]
        self.state_3 = self.state_3 + action[2]

        # Calculate reward
        reward = float(self.state_1 + self.state_2 + self.state_3)

        # Set placeholder for info
        info = {}

        # Check if it's the end of the day
        if self.state_4_currentTimeSlots >= 287:
            done = True
        else:
            done = False

        # Move to the next timeslot
        self.state_4_currentTimeSlots += 1

        state = np.array([self.state_1, self.state_2, self.state_3, self.state_4_currentTimeSlots])

        # Return step information
        return state, reward, done, info
        
    def render(self):
        pass

    def reset(self):
        self.state_1 = 21
        self.state_2 = 0
        self.state_3 = 0
        self.state_4_currentTimeSlots = 0
        state = np.array([self.state_1, self.state_2, self.state_3, self.state_4_currentTimeSlots])

        return state

#%% Set up the environment and check it
from stable_baselines3.common.env_checker import check_env

env = Custom_Env()
# It will check your custom environment and output additional warnings if needed
check_env(env)


from stable_baselines3 import A2C
model = A2C('MlpPolicy', env, verbose=1)

print("Learning started")
model.learn(total_timesteps=10000)
print("Learning ended")

When I check the custom environment with check_env from stable_baselines3.common.env_checker, I get the assertion error: "AssertionError: The observation returned by the step() method does not match the given observation space". I don't understand this, because I am actually returning 4 values with state = np.array([self.state_1, self.state_2, self.state_3, self.state_4_currentTimeSlots]). Can you tell me what the problem might be?

@PBerit PBerit added custom gym env Issue related to Custom Gym Env question Further information is requested labels Jan 31, 2022
Collaborator

Miffyli commented Jan 31, 2022

Make sure you are returning values of the correct type as well (np.float32). I recommend you print out what kind of observations (states) your environment returns, and double-check that everything is within the bounds you set.
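For instance, here is a minimal sketch of the dtype mismatch (not part of the original answer, just an illustration): the Box space in the question is declared with dtype=np.float32, but np.array over plain Python ints defaults to an integer dtype, which is enough to trip the env checker's comparison. Casting the state array fixes it:

```python
import numpy as np

# Built from plain ints, as in the question's reset()/step():
state = np.array([21, 0, 0, 0])
print(state.dtype.kind)  # 'i' -> an integer dtype, not float32

# Cast before returning, matching the Box space's declared dtype:
state = np.array([21, 0, 0, 0], dtype=np.float32)
print(state.dtype)  # float32
```

The same cast would go in both reset() and step(), wherever the state array is built.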

Btw, I recommend you check the tips section of the docs, specifically the part about normalizing inputs / outputs.
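As an illustration of that normalization tip (a sketch, not part of the original answer; the low/high bounds are taken from the Box space in the question), observations can be linearly rescaled to [-1, 1] before being returned:

```python
import numpy as np

# Bounds from the question's Box space, reused here for illustration
low = np.array([20, -20, 0, 0], dtype=np.float32)
high = np.array([22, 250, 100, 287], dtype=np.float32)

def normalize(obs: np.ndarray) -> np.ndarray:
    """Linearly rescale an observation from [low, high] to [-1, 1]."""
    return 2.0 * (obs.astype(np.float32) - low) / (high - low) - 1.0

print(normalize(low))   # -> [-1. -1. -1. -1.]
print(normalize(high))  # -> [ 1.  1.  1.  1.]
```

With this, the observation_space itself would be declared as Box(low=-1, high=1, shape=(4,), dtype=np.float32), so the network always sees inputs on a comparable scale.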

Do note that we do not offer extensive tech support for per-case questions. These issues are mainly for bug reports and enhancement proposals.

Author

PBerit commented Jan 31, 2022

@Miffyli: Thanks a lot, Miffyli, for your answer. I really appreciate it.
