I have a custom environment in OpenAI gym with 3 discrete action variables and 3 continuous state variables and 1 discrete state variable (observation_space). My question now is what exactly does the step function have to return? I have the following code:
#%% import
from gym import Env
from gym.spaces import Discrete, Box, Tuple, MultiDiscrete
import numpy as np

#%%
class Custom_Env(Env):

    def __init__(self):
        # Define the state space

        # State variables
        self.state_1 = 0
        self.state_2 = 0
        self.state_3 = 0
        self.state_4_currentTimeSlots = 0

        # Define the gym components
        self.action_space = MultiDiscrete([10, 10, 27])
        self.observation_space = Box(low=np.array([20, -20, 0, 0]),
                                     high=np.array([22, 250, 100, 287]),
                                     dtype=np.float32)

    def step(self, action):
        # Update state variables
        self.state_1 = self.state_1 + action[0]
        self.state_2 = self.state_2 + action[1]
        self.state_3 = self.state_3 + action[2]

        # Calculate reward
        reward = float(self.state_1 + self.state_2 + self.state_3)

        # Set placeholder for info
        info = {}

        # Check if it's the end of the day
        if self.state_4_currentTimeSlots >= 287:
            done = True
        else:
            done = False

        # Move to the next timeslot
        self.state_4_currentTimeSlots += 1

        state = np.array([self.state_1, self.state_2, self.state_3,
                          self.state_4_currentTimeSlots])

        # Return step information
        return state, reward, done, info

    def render(self):
        pass

    def reset(self):
        self.state_1 = 21
        self.state_2 = 0
        self.state_3 = 0
        self.state_4_currentTimeSlots = 0
        state = np.array([self.state_1, self.state_2, self.state_3,
                          self.state_4_currentTimeSlots])
        return state

#%% Set up the environment and check it
from stable_baselines3.common.env_checker import check_env

env = Custom_Env()
# check_env will check your custom environment and output additional warnings if needed
check_env(env)

from stable_baselines3 import A2C

model = A2C('MlpPolicy', env, verbose=1)
print("Learning started")
model.learn(total_timesteps=10000)
print("Learning ended")
When I check the custom environment with check_env from stable_baselines3.common.env_checker, I get the assertion error: "AssertionError: The observation returned by the step() method does not match the given observation space". I don't understand this, because I am actually returning 4 values with "state = np.array([self.state_1, self.state_2, self.state_3, self.state_4_currentTimeSlots])". Can you tell me what the problem might be?
Make sure you are returning values of the correct type as well (np.float32). I recommend you print out the observations (states) your environment returns, and double-check that everything is within the bounds you set.
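A sketch of what this means for the code above (not a drop-in patch, just an illustration): np.array over plain Python ints produces an integer array, which does not match a Box declared with dtype=np.float32 and trips the env checker. Casting explicitly when building the observation avoids the mismatch:

```python
import numpy as np

def make_obs(state_1, state_2, state_3, timeslot):
    # Cast explicitly so the returned array matches a
    # Box(..., dtype=np.float32) observation space; without the
    # dtype argument, this would be an integer array and fail
    # the observation-space check.
    return np.array([state_1, state_2, state_3, timeslot],
                    dtype=np.float32)

obs = make_obs(21, 0, 0, 0)
print(obs.dtype)  # float32
```

The same cast would go in both step() and reset(), since check_env validates the observations returned by both.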
Btw, I recommend you check the tips section of the docs, specifically the part about normalizing inputs / outputs.
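As a rough sketch of that tip (the exact scaling scheme is up to you): the bounds already declared in the Box space can be reused to map each observation component to roughly [-1, 1] before it reaches the policy network.

```python
import numpy as np

# Bounds taken from the Box space defined in the question.
low = np.array([20, -20, 0, 0], dtype=np.float32)
high = np.array([22, 250, 100, 287], dtype=np.float32)

def normalize_obs(obs):
    # Linearly map each component from [low, high] to [-1, 1];
    # normalized inputs generally make MLP policies train more stably.
    return 2.0 * (obs - low) / (high - low) - 1.0

# The midpoint of every range maps to 0.
obs = np.array([21, 115, 50, 143.5], dtype=np.float32)
norm = normalize_obs(obs)
```

Alternatively, stable-baselines3 provides the VecNormalize wrapper, which learns a running normalization instead of relying on fixed bounds.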
Do note that we do not offer extensive tech support for per-case questions. These issues are mainly for bug reports and enhancement proposals.