Write more documentation about environments #106
Here's how I imagined a basic environment documentation page looking: link. Let me know if you have any suggestions, then I'll transfer it to Markdown and see how it looks in that format.
@JKCooper2, thanks for the link. That page seems pretty complete. I'm not sure how you see the Challenge section working. Can you add or point to some examples?

But I wonder where we want to document these things. To avoid duplication, I'd think the code should contain the primary documentation, including what is documented in your sections 1 and 2: Overview and Environment. I suggest that we should just document each aspect in the appropriate part of the code. There could ideally be a standard build tool to pull out the appropriate documentation (via tags of some sort?) and update a main landing page on the site for each environment, presumably https://gym.openai.com/envs/CartPole-v0, which also documents how various algorithms have worked on it.

For the "Research" section, a link to some pages on the site that describe relevant research would also be fine in the code. Or we might just want to link from the landing page to an associated wiki page on the site, which could discuss research, proposed algorithms, etc.
For the Challenge section I was thinking that it would have a list of criteria that lets you tell whether your algorithm could theoretically solve the environment (ignoring hyper-parameters / computational limitations). Examples would include:
CartPole: 1B, 2A, 3A, 4A. The idea then being that if I create an algorithm that can handle 1B, 2A, 3A, 3B, 4A, 4B, and 9, it should be capable of solving (with only hyper-parameter changes) CartPole and Acrobot, but won't be capable of solving any of the others. That's obviously an incomplete set, and some of the definitions may not make sense or may need to be altered, but it would have a lot of benefits:
For the environment documentation I was imagining it like a project/assignment description. I don't think people should need to look in the code for information about how the environment works, and I would prefer it to be listed independently even if it means some duplication (although not a lot, because it would only be updated if the environment version changes).

The Research section was partly to make it easier for researchers to identify what relevant papers exist for what they are working on, but also to encourage people to replicate existing research algorithms in order to improve the benchmark quality. I think replicated published research algorithms should be given special treatment, or at least marked differently, so people can easily see "This algorithm is a working copy of this paper". Having wiki-style info regarding the papers could be useful, but I think it would work better to have links from the environment documentation's research section to a summary page for the paper that has that information.

I was thinking it would sit on something like https://readthedocs.org/, where the documentation would be updated via git and you can have sub-menus on the side to choose which section you're viewing. Discussion and comments should be separate from documentation, maybe a forum. The goals as I see them should be to make it simple for people to understand the task (docs), share relevant information (research), come up with new ideas (forum/gitter), and focus effort (challenges/requests for research).
Thanks. I agree that the documentation should be clear about the research it was based on, etc., and the forum/wiki would just be to make it easier for folks to comment and add information. Re: "Challenge", that was the sense I had, and your details help a lot.

The code already defines a lot of this with great precision via the Env.{action_space, observation_space, reward_range} variables. I'm hoping it would be easier to just capture that information via introspection, hopefully as part of the build process, and automatically generate a concise and easy-to-use representation of it for inclusion in the documentation. Otherwise, we run the risk of the documentation lagging behind the code or disagreeing with it.

I haven't yet looked at enough environments here to be sure what you mean by 7, 8, 9, but more generally, a useful categorization scheme for AI environments, based on Russell and Norvig (2009), is at https://en.wikibooks.org/wiki/Artificial_Intelligence/AI_Agents_and_their_Environments:
As far as I've seen, the gym might not yet support some of these options. But adding a way to encode, document, and perhaps declare (in the code) the rest of those would indeed be helpful. I imagine this has come up before; does anyone know if we can leverage other existing work on a typology of environments?
To play around with extracting useful documentation automatically from the code, I wrote a little program to query a bunch of things about the environments for display in a markdown table. Here is the code and the output, on the subset for which make() works for me at the moment. A few variables which don't vary in this dataset are commented out.
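For reference, a minimal sketch of that kind of introspection (the column choices and formatting here are illustrative, not the exact program behind the linked table) could look like:

from gym import envs

rows = []
for spec in sorted(envs.registry.all(), key=lambda s: s.id):
    try:
        env = spec.make()
    except Exception:
        continue  # skip environments whose dependencies aren't installed
    rows.append((spec.id, repr(env.observation_space), repr(env.action_space),
                 str(env.reward_range), str(spec.nondeterministic)))

# Emit a markdown table, one row per environment.
print('| Environment | Observation space | Action space | Reward range | Nondeterministic |')
print('| --- | --- | --- | --- | --- |')
for row in rows:
    print('| ' + ' | '.join(row) + ' |')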
That table is extremely useful.
I like the table. It should be possible to export the bounds of the box spaces as well with some minor adjustments. The environments will only change very rarely, so I wouldn't get too hung up on having it exported automatically.

The D-SOAKED listing is good, but I don't think it covers enough of the agent's required abilities: e.g. all of the environments in the classic control section fall under the same D-SOAKED criteria, yet you can't take all of the algorithms that solved one and have them solve the rest. For '7 Goal changes over time' an example could be Reacher, which has a randomly located target, as opposed to Acrobot, where the target is always the same. This can also mean that a straight decaying exploration rate might not be effective.
Yes, adding the bounds is on my list. I actually think that the repr function of a space should conform to the norm and return a full, eval-able string with the bounds. Perhaps the str function could return what repr does now, for simplicity and better upward-compatibility. And for convenience, the constructor should work with a list of low or high bounds, not just an array of them.

In the meantime, here is a version of the table sorted by parameterization, with hot links, for your viewing pleasure. One of the values of generating the documentation from the source, or at least from a nice clean machine-readable format, is the ease of sorting, comparing, searching, etc. Note that it seems that some of the environments don't have a page on the gym site yet, and generate "Something went wrong! We've been notified and are fixing it.". E.g. https://gym.openai.com/envs/OneRoundNondeterministicReward-v0
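To make the idea concrete, here is a rough sketch of such an eval-able repr for a Box space (the exact constructor signature, and whether plain lists are accepted, depends on the gym version, so treat this as illustrative rather than the proposed implementation):

import numpy as np
from gym.spaces import Box

def evalable_repr(box):
    # Include the bounds so that eval(repr(space)) could reconstruct the space,
    # assuming the Box constructor accepts array-like low/high as suggested above.
    return 'Box(low={!r}, high={!r})'.format(box.low.tolist(), box.high.tolist())

space = Box(low=np.array([-1.0, -2.0]), high=np.array([1.0, 2.0]))
print(evalable_repr(space))  # Box(low=[-1.0, -2.0], high=[1.0, 2.0])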
@gdb why not open a wiki here, so we can move this awesome table there and have community-driven documentation?
I like that idea, @Timopheym. I don't know if we want to use the wiki feature here, but I decided to "Be Bold" as we say on Wikipedia, and went ahead and put up an example of using it for this. I also expanded the table to 158 environments: all the ones I could "make" with a standard pip install gym[all]:
(Enabled the wiki! Please make edits!)
It would be great if there were also bounds for Box actions, e.g. [-1, 1]; also, the MuJoCo environments are currently missing from the table.
The table is currently not shown correctly on the wiki page. This patch should fix that. Cheers
Is there a way to define an environment where I can change the action space at each step?
@abhigenie92 If it changes across several expected configurations, you could have the following:
Otherwise I think you need to define your own Space class by extending gym.Space:
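A minimal sketch of what such a subclass could look like (the name and details here are illustrative, not from the original reply, and the gym.Space base-class interface varies a bit between versions):

import numpy as np
import gym

class VariableDiscrete(gym.Space):
    '''Hypothetical discrete space whose number of valid actions can change;
    the environment would update self.n before each step as needed.'''
    def __init__(self, n):
        super(VariableDiscrete, self).__init__()
        self.n = n

    def sample(self):
        # Draw a random action that is valid under the current size.
        return np.random.randint(self.n)

    def contains(self, x):
        return 0 <= int(x) < self.n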
Does something like this exist for the MuJoCo environments? I am especially interested in finding the values of simulation-specific parameters such as 'dt', and also the termination conditions.
Also wondering if there are more details about the MuJoCo environments. It would be nice to have more information about them on the website. Specifically, I'm trying to check which MuJoCo environments are deterministic and which are stochastic.
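One thing you can check directly is the flag gym itself records at registration time (a quick sketch; note that spec.nondeterministic only reflects how an environment was registered, not randomness such as the randomized initial state most MuJoCo tasks use on reset, and the name filter below is just illustrative):

from gym import envs

mujoco_names = ('Reacher', 'Pusher', 'Thrower', 'Striker', 'InvertedPendulum',
                'InvertedDoublePendulum', 'HalfCheetah', 'Hopper', 'Swimmer',
                'Walker2d', 'Ant', 'Humanoid')  # rough name filter, adjust as needed
for spec in sorted(envs.registry.all(), key=lambda s: s.id):
    if any(name in spec.id for name in mujoco_names):
        print(spec.id, 'nondeterministic =', spec.nondeterministic)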
I am wondering what each byte in the RAM means. Could anyone explain each field in the RAM, please?
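For what it's worth, gym doesn't document the per-byte meaning; the -ram Atari environments simply expose the Atari 2600's 128 bytes of console RAM as the observation, so the usual approach is to dump and diff the bytes while playing and reverse-engineer the fields for the particular game. A small sketch (requires the Atari dependencies):

import gym

env = gym.make('Breakout-ram-v0')  # observation is the raw 128-byte console RAM
obs = env.reset()
print(obs.shape, obs.dtype)        # (128,) uint8
obs, reward, done, info = env.step(env.action_space.sample())
print(obs[:16])                    # first 16 RAM bytes after one step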
Hey, I fully agree there should be more documentation about environments. In my personal experience the most commonly needed information is:
It is not that this information cannot be found, but it usually takes much more time than it would if it were properly summarized. For example, currently there is no documentation about stochasticity in MuJoCo environments, and only a couple have information about the interpretation of components in the observation/action space. For Atari environments, there is no clear documentation about all the different versions, and one has to dig through the code.

I have already collected some info which is currently not in the wiki (mainly about the Atari environments, but it is very likely that I will also have to do the same for the MuJoCo ones). I really want to share this info on the wiki. Is there a required/recommended way to do this, or can I just follow the current examples such as https://github.com/openai/gym/wiki/BipedalWalker-v2?
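As a starting point for untangling the Atari variants, you can at least list everything the registry contains for one game and see the naming scheme (Deterministic / NoFrameskip / -ram, -v0 vs -v4) before digging into the registration code. A quick sketch:

import gym

# Print every registered variant of a single Atari game.
print('\n'.join(sorted(spec.id for spec in gym.envs.registry.all() if 'Breakout' in spec.id)))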
@nikonikolov could you please share your info on the Atari environments? I'm finding it very hard to figure them out.
Below is the info I have from my logs. This is from a few months ago; I have not checked whether there have been any changes since then.
Additional points to bear in mind:
Please, someone correct me if I got anything wrong.
I dare say this should be part of the Gym codebase, and integrated into updates to the algorithms. But for now, here is the latest version of the code, with more sensible natural sorting integrated into it; I used it just now to update the table in the wiki with the wealth of new environments in Gym.
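For context, "natural sorting" here just means ordering environment ids so that numeric parts compare as numbers rather than character by character. A generic key function for that (a sketch, not necessarily the exact one in the linked code, and the ids below are made up for illustration):

import re

def natural_key(s):
    # Split into alternating text and digit chunks so numeric parts compare numerically.
    return [int(part) if part.isdigit() else part.lower() for part in re.split(r'(\d+)', s)]

print(sorted(['Foo-v10', 'Foo-v2', 'Bar-v1'], key=natural_key))
# ['Bar-v1', 'Foo-v2', 'Foo-v10']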
Env tables inspired by @nealmcb, but using pandas in a notebook:

from collections import OrderedDict
from functools import reduce
from operator import mul

import numpy as np
import pandas as pd
from gym import envs


def space_size(spc):
    '''Number of bytes needed to store one sample from a space.'''
    # Spaces with an empty or missing shape (e.g. Discrete) count as a single element.
    if not spc.shape:
        return 1
    return spc.dtype.itemsize * reduce(mul, spc.shape, 1)


def space_cont(spc):
    '''Whether a space is continuous (floating-point dtype).'''
    return np.issubdtype(spc.dtype, np.floating)


def env_props(env):
    '''Collect the properties of one environment as an ordered row.'''
    obs = env.observation_space
    act = env.action_space
    return OrderedDict([
        ('name', env.spec.id),
        ('obs_cont', space_cont(obs)),
        ('obs_size', space_size(obs)),
        ('stochastic', env.spec.nondeterministic),  # registered as nondeterministic (deterministic vs stochastic)
        ('act_cont', space_cont(act)),
        ('act_size', space_size(act)),
        # ('reward_range', env.reward_range),
        # ('timestep_limit', env.timestep_limit),
        # ('trials', env.trials),
        # ('reward_threshold', env.reward_threshold),
    ])


def make_env(spec):
    '''Instantiate an environment from its spec; return None if it cannot be made
    (e.g. because of missing dependencies).'''
    try:
        return spec.make()
    except Exception:
        return None


env_list = [make_env(spec) for spec in envs.registry.all()]
env_list = [env for env in env_list if env is not None]
rows = [env_props(env) for env in env_list]

# Our env dataframe; show it in a notebook cell.
df = pd.DataFrame(rows)
df

# And a pivot!
mean = lambda x: round(np.mean(x), 2)
idx = ['obs_cont', 'act_cont', 'stochastic']
aggs = {
    'name': len,
    'obs_size': mean,
    'act_size': mean,
}
pd.pivot_table(df, index=idx, aggfunc=aggs)
@joschu @JKCooper2 @nealmcb I compiled documentation on the Fetch environments: https://link.medium.com/CV6la7YfV7. I have tried to cover the observation and action variables, the reward function, and a comparison among all 4 Fetch environments. I hope it's helpful. Please add more info if you have any.
Closing in favor of #2276
We should write a more detailed explanation of every environment, in particular, how the reward function is computed.