Hi, I'm detaching the trainer from RLlib (I need custom functionality that's incompatible with it), and I'm having trouble using the model.forward method:
what should I pass as seq_lens? I couldn't find any documentation about it.
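From what I can tell from ray/rllib/models/modelv2.py, seq_lens is a 1-D int tensor holding the length of each sequence in the batch. So for a single observation I'd expect something like the following to be valid, though I may have misread it:

seq_lens = tf.constant([1], dtype=tf.int32)  # batch of one sequence, of length 1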
Last error:
ValueError: Input 0 of layer "permute_1" is incompatible with the layer: expected ndim=5, found ndim=4.
Full shape received: (Dimension(1), Dimension(2), Dimension(11), Dimension(11))
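My reading of the error, which may be wrong: the received shape looks like one of the map observations with a batch axis, while the Permute layer apparently also wants a time axis:

# received: (batch=1, channels=2, height=11, width=11)  -> ndim=4
# expected (my assumption): (batch, time, channels, height, width)  -> ndim=5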
Code context:
from keras_model import build_model
from env_wrapper import RLlibEnvWrapper
from tf_models import KerasConvLSTM, get_flat_obs_size
import tensorflow as tf
from tensorflow.python.framework.ops import enable_eager_execution
# enable_eager_execution()
# Model config and env config as in /tutorials/rllib/phase1/config.yaml
model_config = {
    'custom_model': "keras_conv_lstm",
    'custom_options': {
        'fc_dim': 128,
        'idx_emb_dim': 4,
        'input_emb_vocab': 100,
        'lstm_cell_size': 128,
        'num_conv': 2,
        'num_fc': 2,
    },
    'max_seq_len': 25,
}
env_config = {'env_config_dict': {
    # ===== SCENARIO CLASS =====
    # Which Scenario class to use: the class's name in the Scenario Registry (foundation.scenarios).
    # The environment object will be an instance of the Scenario class.
    'scenario_name': 'layout_from_file/simple_wood_and_stone',

    # ===== COMPONENTS =====
    # Which components to use (specified as a list of ("component_name", {component_kwargs}) tuples).
    # "component_name" refers to the Component class's name in the Component Registry (foundation.components).
    # {component_kwargs} is a dictionary of kwargs passed to the Component class.
    # The order in which components reset, step, and generate obs follows their listed order below.
    'components': [
        # (1) Building houses
        ('Build', {
            'skill_dist': 'pareto',
            'payment_max_skill_multiplier': 3,
            'build_labor': 10,
            'payment': 10
        }),
        # (2) Trading collectible resources
        ('ContinuousDoubleAuction', {
            'max_bid_ask': 10,
            'order_labor': 0.25,
            'max_num_orders': 5,
            'order_duration': 50
        }),
        # (3) Movement and resource collection
        ('Gather', {
            'move_labor': 1,
            'collect_labor': 1,
            'skill_dist': 'pareto'
        }),
        # (4) Planner
        ('PeriodicBracketTax', {
            'period': 100,
            'bracket_spacing': 'us-federal',
            'usd_scaling': 1000,
            'disable_taxes': False
        })
    ],

    # ===== SCENARIO CLASS ARGUMENTS =====
    # (optional) kwargs that are added by the Scenario class (i.e. not defined in BaseEnvironment)
    'env_layout_file': 'quadrant_25x25_20each_30clump.txt',
    'starting_agent_coin': 10,
    'fixed_four_skill_and_loc': True,

    # ===== STANDARD ARGUMENTS =====
    # kwargs that are used by every Scenario class (i.e. defined in BaseEnvironment)
    'n_agents': 4,           # Number of non-planner agents (must be > 1)
    'world_size': [25, 25],  # [Height, Width] of the env world
    'episode_length': 1000,  # Number of timesteps per episode

    # In multi-action-mode, the policy selects an action for each action subspace (defined in component code).
    # Otherwise, the policy selects only 1 action.
    'multi_action_mode_agents': False,
    'multi_action_mode_planner': True,

    # When flattening observations, concatenate scalar & vector observations before output.
    # Otherwise, return observations with minimal processing.
    'flatten_observations': True,

    # When flattening masks, concatenate each action subspace mask into a single array.
    # Note: flatten_masks = True is required for masking action logits in the code below.
    'flatten_masks': True,

    # How often to save the dense logs
    'dense_log_frequency': 1
}}
env = RLlibEnvWrapper(env_config)
obs = env.reset()
# num_outputs should equal the size of the agent's action space (here 50)
model = KerasConvLSTM(env.observation_space, env.action_space,
                      num_outputs=50, model_config=model_config, name=None)
state = model.get_initial_state()
# probably the issue is here
# rank_1_tensor = tf.constant([(50,),(136,),(1,),(2,11,11),(7,11,11)], shape=(1,5,1))
rank_1_tensor = tf.constant([1])
flat_obs_space = get_flat_obs_size(env.observation_space)
def dict_to_tensor_dict(a_dict: dict):
    """Convert a single agent's obs dict into a dict of tensors."""
    tensor_dict = {}
    for key, value in a_dict.items():
        tensor_dict[key] = tf.convert_to_tensor(value, name=key)
    return tensor_dict
obs_tensor_dict = dict_to_tensor_dict(obs['0'])
# from /python3.7/site-packages/ray/rllib/models/tf/tf_modelv2.py
# `input_dict` (dict): dictionary of input tensors, including `"obs", "obs_flat", "prev_action", "prev_reward", "is_training"`
input_dict = {
    'obs': obs_tensor_dict,
    # 'obs_flat': ,  # normally filled in by RLlib with the flattened obs
    'prev_action': None,
    'prev_reward': None,
    'is_training': True
}
output, new_state = model.forward(input_dict, state, rank_1_tensor)
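In case it clarifies what I've tried: here is a minimal sketch of the preprocessing I suspect is missing, assuming the model wants [batch, time, ...] inputs (obs_with_time and the expand_dims calls are my guess, not something from the repo):

# Hypothetical: give each obs tensor leading batch and time axes of size 1.
obs_with_time = {
    key: tf.expand_dims(tf.expand_dims(t, axis=0), axis=0)  # -> [1, 1, ...]
    for key, t in obs_tensor_dict.items()
}
input_dict['obs'] = obs_with_time
output, new_state = model.forward(input_dict, state, rank_1_tensor)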
Thank you for the help!