Modify envs to be compatible with twrl #8
Looks good! Might as well copy as much of the API as we can. Could rename Otherwise agreed with making the reward structure the same as |
Could you point me to where I'll get to work on the reward schemes for things that cross over to gym! |
Ah sorry that is only mentioned in the docs, but isn't implemented in any of the envs provided - the context can be found in Kaixhin/Atari#56. I've notified the author of the PR that we're moving towards a more Sorry but do you mind doing |
I tried to change rewards for the CartPole environment to match the reward scheme for gym but ran into issues. For gym there is +1 for the time the pole is up, 0 for the time the pole is down. Just a sanity check should this then be Once I resolved why this is the case I want to add an additional function to the init that will list all environments that are possible, if that sounds good! EDIT: I think I found the reason for the inconsistencies: gym has max steps for each of its environments (an episode terminates after that number of steps to prevent it going on forever). Should we implement this in rlenvs?
Yep sanity check looks correct to me - let's go ahead with changing the reward to match. I see you've already implemented the list - looks good. Remember to document in the README as well. Is the max steps setting for all environments (including Atari)? If it's only there for the simple envs then it would be a good thing to add it into FYI I'm going to add the environment from Towards Deep Symbolic Reinforcement Learning into the |
Sounds good, I'll work towards that. So for the max step thing, here is how it's done for a few environments in openai-gym. How about I expose the step method in the

local classic = require 'classic'
local Env = classic.class('Env')

-- Denote interfaces
Env:mustHave('start')
Env:mustHave('act') -- change name here to prevent overload
Env:mustHave('getStateSpace')
Env:mustHave('getActionSpace')
Env:mustHave('getRewardSpace')

function Env:step(action)
  local reward, state, terminal = self:act(action)
  self.currentStep = self.currentStep == nil and 0 or self.currentStep
  if self.maxSteps and self.currentStep % self.maxSteps == 0 then -- env defines maxSteps
    terminal = true
    self.currentStep = 0
  end
  self.currentStep = self.currentStep + 1
end

return Env
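The step-limit idea in the snippet above can be sketched in plain Lua without the classic dependency. This is only an illustration under stated assumptions, not the code that was merged: `Env.new` and the stand-in `act` are hypothetical. Two details differ from the snippet as posted: the counter is incremented before the limit check (with a counter starting at 0, `0 % maxSteps == 0` would end the episode on the very first step), and `step` returns the reward, state and terminal flag to the caller.

```lua
-- Minimal sketch in plain Lua; Env.new and the stand-in act are hypothetical
local Env = {}
Env.__index = Env

-- opts.maxSteps is optional; nil means no step limit
function Env.new(opts)
  local self = setmetatable({}, Env)
  self.maxSteps = opts and opts.maxSteps
  self.currentStep = 0
  return self
end

-- Stand-in for a concrete environment's transition; it never terminates by itself
function Env:act(action)
  return 1, { action }, false -- reward, state, terminal
end

function Env:step(action)
  local reward, state, terminal = self:act(action)
  -- Count the step first, so the first step of an episode cannot hit the limit
  self.currentStep = self.currentStep + 1
  if self.maxSteps and self.currentStep >= self.maxSteps then
    terminal = true
  end
  if terminal then
    self.currentStep = 0 -- reset the counter for the next episode
  end
  return reward, state, terminal
end
```

With `Env.new({ maxSteps = 3 })`, the third call to `step` reports `terminal = true` and the counter starts again from zero.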
Overall the solution is good, but a few suggestions. I'd go with |
Not entirely sure tbh, had a brain fart :P Made the changes and will now go through and add default steps as done in openai-gym (I've set the default to 1000, which is the default for openAI-gym, but if it's an issue lemme know!). |
I think the cleanest way to do this is to have After |
I've made a few changes, added an init function in the Also for rendering, shall we make it possible to render via the |
Env:mustHave('_step')
Env:mustHave('getStateSpace')
Env:mustHave('getActionSpace')
Env:mustHave('getRewardSpace')

function Env:_init(opts)
  if opts.timeStepLimit and opts.maxSteps then
    self.maxSteps = math.min(opts.timeStepLimit, opts.maxSteps)
Is this how gym does things? To me it makes more sense that if the user specifies maxSteps, then this should override any defaults. And then if timeStepLimit is available, that should be used. Then 1000. Makes the logic a little bit simpler too.
I'll double check, but I do think this is correct (in terms of being the same as openAI-gym). However, since I think this is mainly done for leaderboard purposes (as Kory said), I think we could make an exception here. What do you think?
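The two policies discussed in this review thread can be compared side by side. The following is a sketch under assumptions, not the code from the PR: both function names are hypothetical, and the diff only shows the case where both options are set, so the fallback branch of the second function is a guess.

```lua
local DEFAULT_MAX_STEPS = 1000 -- default mentioned earlier in the thread

-- Precedence suggested in the review: an explicit maxSteps wins, then the
-- environment's timeStepLimit, then the global default
local function resolveMaxSteps(opts)
  opts = opts or {}
  return opts.maxSteps or opts.timeStepLimit or DEFAULT_MAX_STEPS
end

-- Behaviour shown in the diff when both options are set: the smaller one wins.
-- The fallback for the other cases is an assumption.
local function resolveMaxStepsMin(opts)
  opts = opts or {}
  if opts.timeStepLimit and opts.maxSteps then
    return math.min(opts.timeStepLimit, opts.maxSteps)
  end
  return opts.maxSteps or opts.timeStepLimit or DEFAULT_MAX_STEPS
end
```

The functions only disagree when the user asks for more steps than the environment's limit: with `maxSteps = 5000` and `timeStepLimit = 200`, the first returns 5000 and the second returns 200.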
@korymath Rendering would be nice - any suggestions for how best to integrate it with |
A boolean in the opts would be a good means by which to pass the option, but is perhaps not the full solution. My inclination is that this would allow for a strictly Lua RL infrastructure, free of connection to gym, and thus the rendering could be completely separate from that rendering. Having not done much graphics work in Lua, I am not sure how best to make this happen. One potential means would be to save the transitions, and then format them in such a way that they can be loaded and replayed in the gym infra (that is, if the environments are exactly the same as those in gym, but that may be limiting, and require offline processing in between learning and display). I would argue that, while rendering is nice, it is not critical to learning (unless there is a learning signal derived from pixels, or a human observer of the visualization, or maybe a third case?). Thus, even a basic render of the learning performance could be generated every N steps, much like how the OpenAI gym allows for a video to be rendered. Nice work on the max steps discussion. That is an interesting quirk of the OpenAI gym code that I think is based on the online leaderboard and the desire for shorter episode lengths for direct comparison.
I added a |
This looks great! Thoughts on how to handle execution on a server? Maybe similar to gym and On 24 October 2016 at 11:14, Sean Naren notifications@github.com wrote:
Kory Mathewson |
Thanks @korymath! That's an interesting idea, maybe similar to how the gym API is currently interfaced with? If this is deemed necessary I could start thinking how to implement this (probably out the scope of this PR) |
Sounds great! Agree it is probably another large feature build. On 25 October 2016 at 12:49, Sean Naren notifications@github.com wrote:
Kory Mathewson |
function Env:render()
  if self.qt and self.getDisplay then
    self.window = self.window == nil and image.display({ image = self:getDisplay(), zoom = 10 }) or self.window
zoom should be part of opts and handled by Env - 10 was hard-coded for Catch, but the default of 1 is appropriate for Atari.
Shall I set the default to 10 for Catch or leave this exposed for the user as well?
No need to set a default zoom for Catch - just have it set as part of the opts in experiment.lua.
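A minimal sketch of what moving zoom into opts could look like, assuming a hypothetical helper `renderOptions` that the base class would call during initialisation. The `render` flag follows the boolean suggested earlier in the thread; neither name is confirmed as the merged API.

```lua
-- Hypothetical helper: collect rendering options from opts, applying defaults
local function renderOptions(opts)
  opts = opts or {}
  return {
    render = opts.render or false, -- boolean switch, as suggested in the thread
    zoom = opts.zoom or 1,         -- 1 suits Atari; a Catch experiment would pass 10
  }
end
```

An experiment script for Catch would then pass `{ render = true, zoom = 10 }`, while Atari could rely on the default zoom.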
Seems like the merge of the branch replayed the commits on top... once I've finished with XOWorld I think I'll rebase the branch and open a new PR, is that alright? Also there is a small issue here, comma missing at the end :) EDIT: XOWorld is an awesome environment!! Should be fully implemented now and should wrap up everything. Just thinking about unit-tests over here to try make sure everything works correctly now! |
Thanks for spotting that - now fixed on Yes a rebase would clean this up, good job for suggesting :) There shouldn't be any more changes due now. |
Thanks, in terms of unit tests any suggestions on functionality to verify and check? |
Good question. I suppose you could test an environment by loading it, starting it, and doing at least one step. Not sure if there's anything that needs to be done if someone tries to I started thinking about testing Atari, but there's licensing issues with ROMs. I thought maybe there might be a homebrew ROM that we could include in the repo, but then it'd need to be coded into ALE... |
Feel free to use torch-twrl as a testing base if you want. It has some basic testing functionality built up here: https://github.com/twitter/torch-twrl/blob/master/test/test-gym.lua It runs a few tests for different environments, the runTest code is from the gym-http-api, you can see the full extent of it here: https://github.com/openai/gym-http-api/blob/master/binding-lua/test_api.lua It is a pretty verbose test that runs through the RL environment functionality. It is quite easy to find the ROMs if you are particularly motivated. |
Thanks for that, I can definitely try to imitate those tests for all the rlenvs environments! EDIT: Also saw malmo added (which is awesomesauce!) I'll pull this in and get this implemented as well.
@SeanNaren I'm trying to get Malmo working myself, so bear with me a while. New plan is to target |
OK that was a bit of a mess but I seem to have knocked down the requirements nicely and condensed the code. Had a quick look, but couldn't get |
Right I think that's minecraft handled (I need to still test that it works, will do once I'm home on my own machine) and now just the tests. Currently I just run through every environment but need to add some assertions here, however not sure what I could assert on. In the torch-twrl tests we assert on a successful run through (return true as long as everything runs correctly). |
Great work! You can do simple assertions on type of variables response, or an assertion ex: runTest could return an error code with a pcall, if the return is nice then On 17 November 2016 at 03:30, Sean Naren notifications@github.com wrote:
Kory Mathewson |
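The testing approach discussed here (load an environment, start it, take at least one step, assert on the types of the responses, and wrap the run in pcall) can be sketched as follows. This is an illustration only: `makeDummyEnv` is a hypothetical stand-in that uses plain tables as states, and `smokeTest` is not taken from torch-twrl or the gym-http-api tests.

```lua
-- Hypothetical stand-in environment exposing start and step
local function makeDummyEnv()
  local env = { t = 0 }
  function env:start()
    self.t = 0
    return { 0 } -- initial state
  end
  function env:step(action)
    self.t = self.t + 1
    return 1, { self.t }, self.t >= 3 -- reward, state, terminal
  end
  return env
end

-- Create, start and step an environment inside pcall.
-- Returns true on success, or false plus the error message.
local function smokeTest(makeEnv)
  return pcall(function()
    local env = makeEnv()
    local state = env:start()
    assert(type(state) == 'table', 'start should return a state')
    local reward, nextState, terminal = env:step(0)
    assert(type(reward) == 'number', 'reward should be a number')
    assert(type(nextState) == 'table', 'step should return a state')
    assert(type(terminal) == 'boolean', 'terminal should be a boolean')
  end)
end
```

A test runner could loop over a list of environment constructors and fail on the first `smokeTest` call that returns false, reporting the error message that pcall captured.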
Added assertions, check them out! I also rebased the branch into one nice commit, check here. I think the changes are done here so it'd be great if you could have a quick look to let me know if anything needs changing! What do we plan to merge into? I can then open the PR on the rebased branch.
Unfortunately I'll be away this weekend so can't look at it till next week, but the plan is to merge this into This way anyone who's still using the old API (e.g. the Atari repo) can still do so by requiring the |
Sounds like a plan and no worries, whenever you get time! |
Looks good! Firstly, can you merge the latest Minecraft stuff (both in the env and the readme)? Secondly, if you could use 2-space indentation for consistency that'd be awesome. I'll have another look after and then we should be able to merge |
Having issues merging master into the rebased branch. Can you see any commits from master not in the rebased branch? |
Ah sorry the latest work should be in |
This is continued in #14 |
Just opening this PR to track progress based on this PR. Need to test each environment with twrl to make sure they behave correctly, if you spot any issues or have any feedback let me know :)
It may also be nice to set the rewards similar to gym as well, for comparison's sake!