Anurag Koul edited this page Aug 19, 2019 · 25 revisions

Basics

Let's begin by importing the basic packages

>>> import gym
>>> import ma_gym 

Importing ma_gym registers all the new multi-agent environments, so they can be created with gym.make

>>> env = gym.make('Switch2-v0')

How many agents does this environment have?

>>> env.n_agents
2

What's the action space of each agent?

>>> env.action_space
[Discrete(5), Discrete(5)]

What do these actions mean?

>>> env.get_action_meanings() # action meaning of each agent
[['DOWN', 'LEFT', 'UP', 'RIGHT', 'NOOP'], ['DOWN', 'LEFT', 'UP', 'RIGHT', 'NOOP']]
>>> env.get_action_meanings(0) # action meaning of agent '0'
['DOWN', 'LEFT', 'UP', 'RIGHT', 'NOOP']
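
The meanings above can be used to decode a joint action (one integer per agent) into readable names. The following is a minimal sketch in plain Python: the list is copied from the output shown above, and `describe_joint_action` is a hypothetical helper, not part of ma_gym.

```python
# Action meanings per agent, copied from env.get_action_meanings() above.
action_meanings = [
    ['DOWN', 'LEFT', 'UP', 'RIGHT', 'NOOP'],
    ['DOWN', 'LEFT', 'UP', 'RIGHT', 'NOOP'],
]


def describe_joint_action(actions, meanings):
    """Map each agent's integer action to its name (hypothetical helper)."""
    if len(actions) != len(meanings):
        raise ValueError('expected one action per agent')
    return [meanings[agent][action] for agent, action in enumerate(actions)]


# Agent 0 takes action 0, agent 1 takes action 2.
names = describe_joint_action([0, 2], action_meanings)
print(names)
```

With a real environment, `meanings` would simply be the value returned by `env.get_action_meanings()`.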

How do we sample an action for each agent? (Much like OpenAI Gym)

>>> env.action_space.sample()
[0, 2]

Resetting the environment returns the initial local observation of each agent

>>> env.reset()
[[0.0, 0.17], [0.0, 0.83]]

Let's step into the environment with a random action

>>> obs_n, reward_n, done_n, info = env.step(env.action_space.sample())

Upon step, we get a list of local observations, one for each agent

>>> obs_n
[[0.0, 0.17], [0.0, 0.83]]

We also get a reward for each agent

>>> reward_n
[-0.1, -0.1]

An episode is considered done when all agents die, i.e. when every entry of done_n is True

>>> done_n
[False, False]
>>> episode_terminate = all(done_n)

The team reward is simply the sum of all local rewards

>>> team_reward = sum(reward_n)
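
Putting the pieces above together, a full episode can be sketched as a loop that steps until all agents are done and accumulates the team reward. So that the sketch runs without ma_gym installed, it uses `StubTwoAgentEnv`, a hypothetical stand-in that only mimics the list-based interface shown above (its observations, step limit and rewards are made up); `run_episode` is likewise an illustrative helper.

```python
import random


class _JointActionSpace:
    """Hypothetical stand-in for env.action_space: one Discrete(5) per agent."""

    def __init__(self, n_agents, n_actions=5):
        self.n_agents = n_agents
        self.n_actions = n_actions

    def sample(self):
        return [random.randrange(self.n_actions) for _ in range(self.n_agents)]


class StubTwoAgentEnv:
    """Hypothetical stub mimicking the per-agent list interface shown above."""

    def __init__(self, max_steps=5, step_cost=-0.1):
        self.n_agents = 2
        self.action_space = _JointActionSpace(self.n_agents)
        self.max_steps = max_steps
        self.step_cost = step_cost
        self._t = 0

    def reset(self):
        self._t = 0
        return [[0.0, 0.17], [0.0, 0.83]]

    def step(self, actions):
        self._t += 1
        obs_n = [[0.0, 0.17], [0.0, 0.83]]
        reward_n = [self.step_cost] * self.n_agents
        done_n = [self._t >= self.max_steps] * self.n_agents
        return obs_n, reward_n, done_n, {}


def run_episode(env, policy):
    """Run one episode; return (summed team reward, number of steps)."""
    obs_n = env.reset()
    done_n = [False] * env.n_agents
    episode_return, steps = 0.0, 0
    while not all(done_n):  # the episode ends when every agent is done
        obs_n, reward_n, done_n, info = env.step(policy(obs_n))
        episode_return += sum(reward_n)  # team reward for this step
        steps += 1
    return episode_return, steps


env = StubTwoAgentEnv(max_steps=5)
total, steps = run_episode(env, lambda obs_n: env.action_space.sample())
print(steps, round(total, 2))
```

To try this against a real environment, the stub could be swapped for `gym.make('Switch2-v0')`; the loop itself relies only on the calls demonstrated earlier on this page.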

Customizing an environment

import gym

gym.envs.register(
    id='MySwitch2-v0',
    entry_point='ma_gym.envs.switch:Switch',
    # It has a step cost of -0.2 now
    kwargs={'n_agents': 2, 'full_observable': False, 'step_cost': -0.2},
)
env = gym.make('MySwitch2-v0')

For more usage details, refer to: https://github.com/koulanurag/ma-gym/blob/master/ma_gym/init.py
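
When several variants are needed, the registration arguments can be built as plain dictionaries first. This is only a sketch: `switch_spec` and the two environment ids are hypothetical, and only the kwargs shown in the example above (`n_agents`, `full_observable`, `step_cost`) are used.

```python
def switch_spec(env_id, n_agents, full_observable, step_cost):
    """Build keyword arguments for gym.envs.register (hypothetical helper)."""
    return {
        'id': env_id,
        'entry_point': 'ma_gym.envs.switch:Switch',
        'kwargs': {
            'n_agents': n_agents,
            'full_observable': full_observable,
            'step_cost': step_cost,
        },
    }


# Illustrative ids and step costs; pick whatever suits the experiment.
step_costs = {'MySwitch2Cheap-v0': -0.05, 'MySwitch2Costly-v0': -0.2}
specs = [switch_spec(env_id, 2, False, cost)
         for env_id, cost in step_costs.items()]

# With gym and ma_gym installed, each spec could then be registered:
#   import gym
#   for spec in specs:
#       gym.envs.register(**spec)
print([spec['id'] for spec in specs])
```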

Monitoring

Please note that the Monitor wrapper used below is imported from ma_gym.wrappers, not from gym

>>> from ma_gym.wrappers import Monitor
>>> env = gym.make('Switch2-v0')
>>> env = Monitor(env, directory='recordings')

This saves video files of the episodes to the recordings folder

Tips:

  • Save a video of every episode:
>>> env = Monitor(env, directory='recordings', video_callable=lambda episode_id: True)
  • Save a video of every 10th episode:
>>> env = Monitor(env, directory='recordings',
...               video_callable=lambda episode_id: episode_id % 10 == 0)
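
As the tips above suggest, video_callable is just a function from an episode id to True or False. A small sketch of a reusable predicate factory follows; `every_nth_episode` is a hypothetical helper, not part of ma_gym, and the loop only simulates which episode ids would be selected.

```python
def every_nth_episode(n):
    """Return a predicate that is True for episode ids 0, n, 2n, ..."""
    if n < 1:
        raise ValueError('n must be a positive integer')
    return lambda episode_id: episode_id % n == 0


record = every_nth_episode(10)

# Simulate 35 episodes and collect the ids that would be recorded.
recorded = [episode_id for episode_id in range(35) if record(episode_id)]
print(recorded)
```

The resulting predicate could then be passed as `video_callable=every_nth_episode(10)` in place of the lambda.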