
Write more documentation about environments #106

Closed · joschu opened this issue May 18, 2016 · 27 comments
@joschu (Contributor) commented May 18, 2016

We should write a more detailed explanation of every environment, in particular, how the reward function is computed.

@joschu self-assigned this May 18, 2016
@JKCooper2 (Contributor)

Here's how I imagined a basic environment documentation page looking (link). Let me know if you have any suggestions and I'll transfer it to Markdown to see how it looks in that format.

@nealmcb commented Jun 13, 2016

@JKCooper2, thanks for the link. That page seems pretty complete.

I'm not sure how you see the Challenge section working. Can you add or point to some examples?

But I wonder where we want to document these things. To avoid duplication, I'd think the code should contain the primary documentation, including what is documented in your sections 1 and 2: Overview and Environment. I suggest that we should just document each aspect in the appropriate part of the code. There could ideally be a standard build tool to pull out the appropriate documentation (via tags of some sort?) and update a main landing page on the site for each environment, presumably https://gym.openai.com/envs/CartPole-v0, which also documents how various algorithms have worked on it.

For the "Research" section, a link to some pages on the site that describe relevant research would also be fine in the code. Or we might just want to just link from the landing page to an associated wiki page on the site, that could discuss research, proposed algorithms, etc.

@JKCooper2 (Contributor)

For the Challenge section I was thinking it would have a list of criteria, so you could tell whether your algorithm could theoretically solve the environment (ignoring hyper-parameters / computational limitations).

Examples would include:

  • 1A Discrete versus 1B continuous observation space
  • 2A Discrete versus 2B continuous action space
  • 3A One versus 3B n dimensions in the action space (e.g. Acrobot has three discrete actions, but they exist along one dimension of the force applied, versus Walker or something where you can control different joints independently)
  • If discrete action: 4A Two versus 4B more than two discrete actions
  • 5 Multiple rewards are available at a single point in time
  • 6 Stochastic action effect (Action can be switched by environment according to percentage)
  • 7 Goal changes over time
  • 8 Observation state incomplete (The observation provided doesn't always contain critical information, e.g. in basketball when the goal end swaps if this were only provided once per quarter)
  • 9 No reward surrounding initial state - any sequence of 20 actions by the agent from the starting state won't result in seeing a reward
  • 10 Task recognition (agent will be trained over multiple environments and need to be able to determine which environment it is currently in)
  • 11 Generalisation (agent will be run over one environment and then needs to apply knowledge to another environment with the same challenge set)

CartPole: 1B, 2A, 3A, 4A
Acrobot: 1B, 2A, 3A, 4B, 9
MountainCar: 1B, 2A, 3A, 4A, 9
Pendulum: 1B, 2B, 3A
Taxi: 1A, 2A, 3B, 4B, 7
Pacman: 1B, 2A, 3B, 4B, 5, 7
Pacman-Ram: 1B, 2A, 3B, 4B, 5, 7, 8 (maybe)

The idea then being that if I create an algorithm that can handle 1B, 2A, 3A, 3B, 4A, 4B and 9, it should be capable of solving (with only hyper-parameter changes) CartPole and Acrobot, but won't be capable of solving any of the others.
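
To make the matching concrete, here is a rough sketch of how it could work if the challenge tags above were stored as per-environment metadata (the CHALLENGES dict is hypothetical, copied from the lists above):

# hypothetical challenge tags per environment, taken from the lists above
CHALLENGES = {
    'CartPole-v0': {'1B', '2A', '3A', '4A'},
    'Acrobot-v0':  {'1B', '2A', '3A', '4B', '9'},
    'Pendulum-v0': {'1B', '2B', '3A'},
    'Taxi-v1':     {'1A', '2A', '3B', '4B', '7'},
}

# the abilities an algorithm claims to handle
agent_abilities = {'1B', '2A', '3A', '3B', '4A', '4B', '9'}

# the algorithm should (in principle) solve any env whose challenge set it covers
solvable = [env_id for env_id, tags in sorted(CHALLENGES.items())
            if tags <= agent_abilities]
print(solvable)  # ['Acrobot-v0', 'CartPole-v0']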

That's obviously an incomplete set, and some of the definitions may not make sense or may need to be altered, but it would have a lot of benefits:

  1. Easy to tell what your algorithm will/won't work on and why
  2. Focus for creating environments that have challenge sets that don't already exist
  3. Good comparability on algorithms with matching 'abilities'
  4. Focus for creating algorithms with specific abilities
  5. Much simpler to take an RL agent and apply it to real-world problems (just define the problem and select an algorithm that meets the challenge criteria)
  6. Test task recognition and generalisation over similar environments (based on matching challenge sets)

@JKCooper2 (Contributor)

For the environment documentation I was imagining it like a project/assignment description. I don't think people should need to look in the code for information about how the environment works, and I would prefer it to be listed independently, even if that means some duplication (although not a lot, because it would only need updating when the environment version changes).

The Research section was partly to make it easier for researchers to identify what relevant papers exist for what they are working on, but also to encourage people to replicate existing research algorithms in order to improve the benchmark quality. I think replicated published research algorithms should be given special treatment, or at least marked differently, so people can easily see "This algorithm is a working copy of this paper". Having wiki-style info regarding the papers could be useful, but I think it would work better to have links from the environment documentation's Research section to a summary page for the paper that has that information.

I was thinking it would sit on something like https://readthedocs.org/, where the documentation would be updated via git and you can have sub-menus on the side to choose which section you're viewing.

Discussion and comments should be separate from documentation, maybe in a forum. The goals as I see them should be to make it simple for people to understand the task (docs), share relevant information (research), come up with new ideas (forum/gitter), and focus effort (challenges/requests for research).

@nealmcb commented Jun 16, 2016

Thanks. I agree that the documentation should be clear about the research it was based on etc., and the forum/wiki would just be to make it easier for folks to comment and add information.

Re: "Challenge", that was the sense I had, and your details help a lot.

The code already defines a lot of this with great precision via the Env.{action_space, observation_space, reward_range} variables. I'm hoping it would be easier to just capture that information via introspection, hopefully as part of the build process, and automatically generate a concise and easy-to-use representation of it for inclusion in the documentation. Otherwise, we run the risk of the documentation lagging behind the code or disagreeing with it.

I haven't yet looked at enough environments here to be sure what you mean by 7, 8, 9, but more generally, a useful categorization scheme for AI environments, based on Russell and Norvig (2009), is at https://en.wikibooks.org/wiki/Artificial_Intelligence/AI_Agents_and_their_Environments:

"[they] can be remembered with the mnemonic "D-SOAKED." They are:

  • Deterministicness (deterministic, stochastic, or non-deterministic): An environment is deterministic if the next state is perfectly predictable given knowledge of the previous state and the agent's action.
  • Staticness (static or dynamic): Static environments do not change while the agent deliberates.
  • Observability (full or partial): A fully observable environment is one in which the agent has access to all information in the environment relevant to its task.
  • Agency (single or multiple): If there is at least one other agent in the environment, it is a multi-agent environment. Other agents might be apathetic, cooperative, or competitive.
  • Knowledge (known or unknown): An environment is considered to be "known" if the agent understands the laws that govern the environment's behavior. For example, in chess, the agent would know that when a piece is "taken" it is removed from the game. On a street, the agent might know that when it rains, the streets get slippery.
  • Episodicness (episodic or sequential): Sequential environments require memory of past actions to determine the next best action. Episodic environments are a series of one-shot actions, and only the current (or recent) percept is relevant. An AI that looks at radiology images to determine if there is a sickness is an example of an episodic environment. One image has nothing to do with the next.
  • Discreteness (discrete or continuous): A discrete environment has fixed locations or time intervals. A continuous environment could be measured quantitatively to any level of precision."

As far as I've seen, the gym might not yet support some of these options. But adding a way to encode and document and perhaps declare (in the code) the rest of those would indeed be helpful. I imagine this has come up before - does anyone know if we can leverage other existing work on a typology of environments?
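
If we did want to declare those properties in code, here is a tiny sketch of what that might look like (the EnvTraits type and its fields are hypothetical, not part of gym; the CartPole values are just an illustration):

from collections import namedtuple

# hypothetical metadata an environment could declare alongside its spaces
EnvTraits = namedtuple('EnvTraits', [
    'deterministic',     # next state fully determined by state + action?
    'static',            # environment unchanged while the agent deliberates?
    'fully_observable',  # observation contains all task-relevant state?
    'multi_agent',       # other agents present?
    'known_dynamics',    # governing laws known to the agent?
    'episodic',          # one-shot percepts vs. sequential decisions?
    'discrete',          # discrete state/action/time vs. continuous?
])

CARTPOLE_TRAITS = EnvTraits(
    deterministic=True, static=True, fully_observable=True,
    multi_agent=False, known_dynamics=True, episodic=False, discrete=False,
)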

@nealmcb commented Jun 17, 2016

To play around with extracting useful documentation automatically from the code, I wrote a little program to query a bunch of things about the environments for display in a markdown table. Here is the code and the output, on the subset for which make() works for me at the moment. A few variables which don't vary in this dataset are commented out.

from gym import envs

class NullE:
    def __init__(self):
        self.observation_space = self.action_space = self.reward_range = "N/A"

envall = envs.registry.all()

table = "|Environment Id|Observation Space|Action Space|Reward Range|tStepL|Trials|rThresh\n" # |Local|nonDet|kwargs|
table += "|---|---|---|---|---|---|---|---|---|---|\n"

for e in envall:
    try:
        env = e.make()
    except:
        env = NullE()
        continue  #  Skip these for now
    table += '| {}|{}|{}|{}|{}|{}|{}|\n'.format(e.id,   #|{}|{}|{}
       env.observation_space, env.action_space, env.reward_range,
       e.timestep_limit, e.trials, e.reward_threshold) # ,
       # getattr(e, 'local_only', -1), e.nondeterministic, getattr(e, 'kwargs', ""))

print(table)
|Environment Id|Observation Space|Action Space|Reward Range|tStepL|Trials|rThresh|
|---|---|---|---|---|---|---|
|CartPole-v0|Box(4,)|Discrete(2)|(-inf, inf)|200|100|195.0|
|NChain-v0|Discrete(5)|Discrete(2)|(-inf, inf)|1000|100|None|
|RepeatCopy-v0|Discrete(6)|Tuple(Discrete(2), Discrete(2), Discrete(5))|(-inf, inf)|200|100|75.0|
|Reverse-v0|Discrete(3)|Tuple(Discrete(2), Discrete(2), Discrete(2))|(-inf, inf)|200|100|25.0|
|ReversedAddition-v0|Discrete(4)|Tuple(Discrete(4), Discrete(2), Discrete(3))|(-inf, inf)|200|100|25.0|
|Acrobot-v0|Box(4,)|Discrete(3)|(-inf, inf)|200|100|-100|
|FrozenLake-v0|Discrete(16)|Discrete(4)|(-inf, inf)|100|100|0.78|
|Taxi-v1|Discrete(500)|Discrete(6)|(-inf, inf)|200|100|9.7|
|Pendulum-v0|Box(3,)|Box(1,)|(-inf, inf)|200|100|None|
|OneRoundNondeterministicReward-v0|Discrete(1)|Discrete(2)|(-inf, inf)|1000|100|None|
|ReversedAddition3-v0|Discrete(4)|Tuple(Discrete(4), Discrete(2), Discrete(3))|(-inf, inf)|200|100|25.0|
|Roulette-v0|Discrete(1)|Discrete(38)|(-inf, inf)|100|100|None|
|MountainCar-v0|Box(2,)|Discrete(3)|(-inf, inf)|200|100|-110.0|
|FrozenLake8x8-v0|Discrete(64)|Discrete(4)|(-inf, inf)|200|100|0.99|
|DuplicatedInput-v0|Discrete(6)|Tuple(Discrete(2), Discrete(2), Discrete(5))|(-inf, inf)|200|100|9.0|
|Blackjack-v0|Tuple(Discrete(32), Discrete(11), Discrete(2))|Discrete(2)|(-inf, inf)|1000|100|None|
|Copy-v0|Discrete(6)|Tuple(Discrete(2), Discrete(2), Discrete(5))|(-inf, inf)|200|100|25.0|
|TwoRoundDeterministicReward-v0|Discrete(3)|Discrete(2)|(-inf, inf)|1000|100|None|
|TwoRoundNondeterministicReward-v0|Discrete(3)|Discrete(2)|(-inf, inf)|1000|100|None|
|OneRoundDeterministicReward-v0|Discrete(1)|Discrete(2)|(-inf, inf)|1000|100|None|

@tlbtlbtlb (Contributor)

That table is extremely useful.

@JKCooper2 (Contributor)

I like the table. It should be possible to export the bounds of the Box spaces as well with some minor adjustments. The environments will only change very rarely, so I wouldn't get too hung up on automating the export. The D-SOAKED listing is good, but I don't think it covers enough of the agent's required abilities; e.g. all of the environments in the classic control section fall under the same D-SOAKED criteria, yet you can't take all of the algorithms that solved one and have them solve the rest.

For '7 Goal changes over time', an example could be Reacher, which has a randomly located target, as opposed to Acrobot where the target is always the same. This can also mean that a straight decaying exploration rate mightn't be effective.
For '8 Observation state incomplete', I mean that the agent could be given information that it needs to use in future states, e.g. Reacher, where the agent is told the position and then has 50 time steps (without being retold the position; it just gets information about its own location) to reach towards the goal, being scored on the 50th step.
'9 Fixed reward surrounding initial state' is to cover exploration scenarios like MountainCar, Acrobot and FrozenLake, where the agent needs to perform a long chain of actions in order to see a different reward.

@nealmcb commented Jun 17, 2016

Yes - adding the bounds is on my list. I actually think that the repr function of a space should conform to the norm and return a full, evalable string with the bounds. Perhaps the str function could return what repr does now, for simplicity and better upward-compatibility. And for convenience, the constructor should work with a list of low or high bounds, not just an array of them.
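
As a stopgap, a small helper along these lines could print a Box space together with its bounds (a sketch; it only relies on the .low/.high arrays that gym's Box already exposes):

import gym
from gym.spaces import Box

def describe_space(space):
    """Render a space for the table, including bounds when it is a Box."""
    if isinstance(space, Box):
        return '{} low={} high={}'.format(space, space.low.tolist(), space.high.tolist())
    return str(space)

print(describe_space(gym.make('MountainCar-v0').observation_space))
# e.g. Box(2,) low=[-1.2, -0.07] high=[0.6, 0.07]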

In the meantime, here is a version of the table sorted by parameterization, with hot links, for your viewing pleasure. One of the values of generating the documentation from the source, or at least from a nice clean machine-readable format, is the ease of sorting, comparing, searching etc.

Note that it seems that some of the environments don't have a page on the gym site yet, and generate "Something went wrong! We've been notified and are fixing it.". E.g. https://gym.openai.com/envs/OneRoundNondeterministicReward-v0

|Environment Id|Observation Space|Action Space|Reward Range|tStepL|Trials|rThresh|
|---|---|---|---|---|---|---|
|MountainCar-v0|Box(2,)|Discrete(3)|(-inf, inf)|200|100|-110.0|
|SemiSupervisedPendulumRandom-v0|Box(3,)|Box(1,)|(-inf, inf)|1000|100|None|
|SemiSupervisedPendulumDecay-v0|Box(3,)|Box(1,)|(-inf, inf)|1000|100|None|
|SemiSupervisedPendulumNoise-v0|Box(3,)|Box(1,)|(-inf, inf)|1000|100|None|
|Pendulum-v0|Box(3,)|Box(1,)|(-inf, inf)|200|100|None|
|CartPole-v0|Box(4,)|Discrete(2)|(-inf, inf)|200|100|195.0|
|Acrobot-v0|Box(4,)|Discrete(3)|(-inf, inf)|200|100|-100|
|InterpretabilityCartpoleObservations-v0|Box(4,)|Tuple(Discrete(2), Box(4,), Box(4,), Box(4,), Box(4,), Box(4,))|(-inf, inf)|1000|100|None|
|InterpretabilityCartpoleActions-v0|Box(4,)|Tuple(Discrete(2), Discrete(2), Discrete(2), Discrete(2), Discrete(2), Discrete(2))|(-inf, inf)|1000|100|None|
|OneRoundNondeterministicReward-v0|Discrete(1)|Discrete(2)|(-inf, inf)|1000|100|None|
|OneRoundDeterministicReward-v0|Discrete(1)|Discrete(2)|(-inf, inf)|1000|100|None|
|Roulette-v0|Discrete(1)|Discrete(38)|(-inf, inf)|100|100|None|
|FrozenLake-v0|Discrete(16)|Discrete(4)|(-inf, inf)|100|100|0.78|
|TwoRoundDeterministicReward-v0|Discrete(3)|Discrete(2)|(-inf, inf)|1000|100|None|
|TwoRoundNondeterministicReward-v0|Discrete(3)|Discrete(2)|(-inf, inf)|1000|100|None|
|Reverse-v0|Discrete(3)|Tuple(Discrete(2), Discrete(2), Discrete(2))|(-inf, inf)|200|100|25.0|
|ReversedAddition-v0|Discrete(4)|Tuple(Discrete(4), Discrete(2), Discrete(3))|(-inf, inf)|200|100|25.0|
|ReversedAddition3-v0|Discrete(4)|Tuple(Discrete(4), Discrete(2), Discrete(3))|(-inf, inf)|200|100|25.0|
|NChain-v0|Discrete(5)|Discrete(2)|(-inf, inf)|1000|100|None|
|Taxi-v1|Discrete(500)|Discrete(6)|(-inf, inf)|200|100|9.7|
|Copy-v0|Discrete(6)|Tuple(Discrete(2), Discrete(2), Discrete(5))|(-inf, inf)|200|100|25.0|
|RepeatCopy-v0|Discrete(6)|Tuple(Discrete(2), Discrete(2), Discrete(5))|(-inf, inf)|200|100|75.0|
|DuplicatedInput-v0|Discrete(6)|Tuple(Discrete(2), Discrete(2), Discrete(5))|(-inf, inf)|200|100|9.0|
|FrozenLake8x8-v0|Discrete(64)|Discrete(4)|(-inf, inf)|200|100|0.99|
|OffSwitchCartpole-v0|Tuple(Discrete(2), Box(4,))|Discrete(2)|(-inf, inf)|1000|100|None|
|Blackjack-v0|Tuple(Discrete(32), Discrete(11), Discrete(2))|Discrete(2)|(-inf, inf)|1000|100|None|

@Timopheym

@gdb why not open a wiki here, so we can move this awesome table there and have community-driven documentation?

@nealmcb commented Jun 23, 2016

I like that idea, @Timopheym. I don't know if we want to use the wiki feature here, but I decided to "Be Bold" as we say on Wikipedia, and went ahead and put up an example of using it for this. I also expanded the table to 158 environments: all the ones I could "make" with a standard pip install gym[all]:
https://github.com/openai/gym/wiki/Table-of-environments

@gdb (Collaborator) commented Jun 23, 2016

(Enabled the wiki! Please make edits!)


@kovacspeter commented Jun 23, 2016

It would be great if there were also bounds for Box actions, e.g. [-1, 1]; also, the MuJoCo environments are currently missing.

@nkming2 commented Apr 30, 2017

The table is currently not shown correctly on the wiki page. This patch should fix that. Cheers
https://gist.github.com/nkming2/f04d7a350d1e497014b23258ea9f4304

@abhigenie92

Is there a way to define an environment where I can change the action space at each step?

@aurelien-clu

@abhigenie92
How is the action space changing at each step?

If it changes across several expected configurations, you could have the following:

  • a model with all the possible actions as outputs
  • a custom final layer choosing the action among the available ones (your changing action space)
  • possibly the information about the expected action space as input (e.g. situation A, B or C) to help your model, or a negative reward if the model does not recognize the situation (by having activated 'bad' outputs)

Otherwise I think you need to define your own Space class by extending gym.Space:
https://github.com/openai/gym/blob/master/gym/core.py
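
For example, here is a minimal sketch of a Discrete-like space whose set of legal actions can change every step (hypothetical, not part of gym; the environment itself would have to call set_mask from its step/reset methods):

import numpy as np
import gym

class MaskableDiscrete(gym.Space):
    """Discrete-like space whose set of legal actions can change every step (sketch)."""

    def __init__(self, n):
        super(MaskableDiscrete, self).__init__()
        self.n = n
        self.mask = np.ones(n, dtype=bool)  # start with every action legal

    def set_mask(self, mask):
        # the environment calls this whenever the legal action set changes
        self.mask = np.asarray(mask, dtype=bool)

    def sample(self):
        return int(np.random.choice(np.flatnonzero(self.mask)))

    def contains(self, x):
        return 0 <= int(x) < self.n and bool(self.mask[int(x)])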

@madvn (Contributor) commented Nov 2, 2017

Does something like this exist for the MuJoCo environments? I am especially interested in finding the values of simulation-specific params in MuJoCo, such as 'dt', and also the termination conditions.

@rkaplan commented Mar 16, 2018

Also wondering if there are more details about the MuJoCo environments. It would be nice to have more information about them on the website. Specifically I'm trying to check which MuJoCo environments are deterministic / stochastic.
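
In the absence of documentation, one rough way to check is empirical: seed the env, replay a fixed action sequence twice, and compare observations. A sketch using the old-style env.seed()/4-tuple step API (flat array observations assumed; dict observations like the Fetch envs would need adapting):

import numpy as np
import gym

def looks_deterministic(env_id, n_steps=100, seed=0):
    """Heuristic: do two seeded rollouts with the same action sequence match exactly?"""
    probe = gym.make(env_id)
    actions = [probe.action_space.sample() for _ in range(n_steps)]
    probe.close()

    def rollout():
        env = gym.make(env_id)
        env.seed(seed)
        obs = [env.reset()]
        for a in actions:
            o, _, done, _ = env.step(a)
            obs.append(o)
            if done:
                break
        env.close()
        return obs

    first, second = rollout(), rollout()
    return len(first) == len(second) and all(
        np.allclose(x, y) for x, y in zip(first, second))

print(looks_deterministic('HalfCheetah-v2'))  # assumes mujoco-py is installed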

@ling-pan

I am wondering what each byte in the RAM means. Could anyone explain each field in the RAM, please?

@nikonikolov

Hey, I fully agree there should be more documentation about environments. In my personal experience the most commonly needed information is:

  1. Observation: space type, shape, limits, components interpretation (if any - e.g. position, speed, etc.)
  2. Action space type, shape and limits, components interpretation
  3. Deterministic or stochastic. If stochastic, in what way exactly

It is not that this information cannot be found, but it usually takes much more time than it would if it were properly summarized. For example, there is currently no documentation about stochasticity in the MuJoCo environments, and only a couple have information about the interpretation of components in the observation/action space. For the Atari environments, there is no clear documentation about all the different versions, and one has to dig through the code.

I have already collected some info which is currently not in the wiki (mainly about the Atari environments, but it is very likely that I will also have to do the same for the MuJoCo ones). I really want to share this info on the wiki. Is there a required/recommended way to do this, or can I just follow the current examples such as https://github.com/openai/gym/wiki/BipedalWalker-v2?

@bionicles (Contributor) commented Dec 31, 2018

import random
import string

import gym

class String(gym.Space):
    def __init__(self, length=None, min_length=1, max_length=280):
        self.length = length
        self.min_length = min_length
        self.max_length = max_length
        self.letters = string.ascii_letters + " .,!-"

    def sample(self):
        # draw a random length, then build the string one random letter at a time
        length = random.randint(self.min_length, self.max_length)
        text = ""
        for _ in range(length):
            text += random.choice(self.letters)
        return text

    def contains(self, x):
        return isinstance(x, str) and len(x) > 0

@ghost commented Feb 24, 2019

@nikonikolov could you please share your info on the Atari environments? I'm finding it very hard to figure them out

@nikonikolov

Below is the info I have from my logs. This is from a few months ago; I have not checked whether there have been any changes since then.

AtariEnvNoFrameskip-v4

  • max_ep_steps = 400000 (300000 for SpaceInvaders)
  • observe every single frame
  • deterministic actions

AtariEnvNoFrameskip-v0

  • max_ep_steps = 400000 (300000 for SpaceInvaders)
  • observe every single frame
  • stochastic actions - repeat previous action with probability 0.25

AtariEnvDeterministic-v4

  • max_ep_steps = 100000
  • repeat action 4 times and observe only the last frame (3 times for SpaceInvaders)
  • deterministic actions

AtariEnvDeterministic-v0

  • max_ep_steps = 100000
  • repeat action 4 times and observe only the last frame (3 times for SpaceInvaders)
  • stochastic actions - repeat previous action with probability 0.25

AtariEnv-v4

  • max_ep_steps = 100000
  • repeat action a random number from [2,3,4] and observe only the last frame (random frameskip)
  • deterministic actions

AtariEnv-v0

  • max_ep_steps = 100000
  • repeat action a random number from [2,3,4] and observe only the last frame (random frameskip)
  • stochastic actions - repeat previous action with probability 0.25

Additional points to bear in mind:

  • The initial state is deterministic
  • Most of the Atari papers (including DQN and derivatives by DeepMind) set the max number of steps that the simulator executes to 108000.

Please someone correct me if I got anything wrong.
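
One way to sanity-check these settings on your own install is to read them straight out of the registry. A sketch (the attribute holding the constructor kwargs has been spelled kwargs or _kwargs depending on the gym version, hence the getattr calls):

from gym import envs

for spec in envs.registry.all():
    if spec.id.startswith('Pong'):
        kwargs = getattr(spec, 'kwargs', None) or getattr(spec, '_kwargs', {})
        print(spec.id,
              getattr(spec, 'max_episode_steps', None),   # episode step limit
              kwargs.get('frameskip'),                     # fixed int or (low, high) range
              kwargs.get('repeat_action_probability'))     # sticky-action probability, if set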

@nealmcb commented Apr 18, 2019

I dare say this should be part of the Gym codebase, and integrated into updates to the algorithms. But for now here is the latest version of the code, with more sensible natural sorting integrated into it, and used just now to update the table in the wiki with the wealth of new environments in Gym.

import re
from operator import attrgetter
from gym import envs

class NullE:
    def __init__(self):
        self.observation_space = self.action_space = self.reward_range = "N/A"

def natural_keys(text):
    '''
    alist.sort(key=natural_keys) sorts in human order
    http://nedbatchelder.com/blog/200712/human_sorting.html
    (See Toothy's implementation in the comments)

    >>> alist = [
    ...          'Orange County--1-3-288-117',
    ...          'Orange County--48256-242',
    ...          'Orange County--1-3-388-203',
    ...          'Orange County--1-19-19-150',
    ...          'Orange County--1-1-64-290',
    ...          'Orange County--1-1-55-256']
    >>> alist.sort(key=natural_keys)
    >>> from pprint import pprint
    >>> pprint(alist)
    [u'Orange County--1-1-55-256',
     u'Orange County--1-1-64-290',
     u'Orange County--1-3-288-117',
     u'Orange County--1-3-388-203',
     u'Orange County--1-19-19-150',
     u'Orange County--48256-242']
    '''

    return [ atoi(c) for c in re.split(r'(\d+)', text.split('|')[2]) ]

def atoi(text):
    "Convert text to integer, or return it unmodified if it isn't numeric"

    return int(text) if text.isdigit() else text

# TODO: Make first column a link, e.g. to [WizardOfWor-ram-v0](https://gym.openai.com/envs/WizardOfWor-ram-v0)
envall = envs.registry.all()

URL_PREFIX = 'https://gym.openai.com/envs'

table = []
for e in envall:
    try:
        env = e.make()
    except:
        env = NullE()
        continue  #  Skip these for now
    table.append('| {}|{}|{}|{}|{}|{}|{}|'.format(
       '[%s](%s/%s)' % (e.id, URL_PREFIX, e.id),
       env.observation_space, env.action_space, env.reward_range,
       e.timestep_limit, e.trials, e.reward_threshold)) # ,
       # getattr(e, 'local_only', -1), e.nondeterministic, getattr(e, 'kwargs', ""))

    # if len(table) > 30:  # For quicker testing
    #   break

# Sort by 2nd column: Observation Space name
table = sorted(table, key=natural_keys)

# Add headings
table = ["|Environment Id|Observation Space|Action Space|Reward Range|tStepL|Trials|rThresh", # |Local|nonDet|kwargs|
         "|---|---|---|---|---|---|---|"] + table

print('\n'.join(table))

@KiaraGrouwstra (Contributor)

env tables inspired by @nealmcb but using Pandas in a notebook:

from collections import OrderedDict
from operator import mul
from functools import reduce
import numpy as np
import pandas as pd
from gym import envs

def space_size(spc):
  '''number of bytes needed to store one element of a space'''
  return 1 if not spc.shape else spc.dtype.itemsize * reduce(mul, spc.shape, 1)

def space_cont(spc):
  '''whether a space is continuous'''
  return np.issubdtype(spc.dtype, np.floating)

def env_props(env):
  obs = env.observation_space
  act = env.action_space
  return OrderedDict([
    ('name', env.spec.id),
    ('obs_cont', space_cont(obs)),
    ('obs_size', space_size(obs)),
    ('stochastic', env.spec.nondeterministic),  # - deterministic vs stochastic (~= ^?)
    ('act_cont', space_cont(act)),
    ('act_size', space_size(act)),
#     ('reward_range', env.reward_range),
#     ('timestep_limit', env.timestep_limit),
#     ('trials', env.trials),
#     ('reward_threshold', env.reward_threshold),
  ])

def make_env(env_):
    try:
        env = env_.make()
    except:
        env = None
    return env

envall = envs.registry.all()
# use a new name so we don't shadow the imported `envs` module
env_list = [make_env(spec) for spec in envall]
env_list = [env for env in env_list if env is not None]  # drop envs that failed to make

rows = [env_props(env) for env in env_list]

# our env dataframe, show in a notebook cell
df = pd.DataFrame(rows)
df

# and a pivot!
mean = lambda x: round(np.mean(x), 2)
idx = ['obs_cont', 'act_cont', 'stochastic']
aggs = {
    'name': len,
    'obs_size': mean,
    'act_size': mean,
}
pd.pivot_table(df, index=idx, aggfunc=aggs)

@Amritpal-001

@joschu @JKCooper2 @nealmcb I compiled documentation on the Fetch environments: https://link.medium.com/CV6la7YfV7. I have tried to cover the observation and action variables, the reward function, and a comparison among all four Fetch environments.

I hope it's helpful. Please add more info if you have any.

@jkterry1 (Collaborator)

Closing in favor of #2276
