Doom - Same Action Space Across Environments #157
Conversation
I can't seem to find the ALTATTACK changes - can you direct me as to where to look?

Re: Doom action spaces: the original reason I changed to smaller action space sizes is that many of our existing RL algorithm implementations (TRPO, CEM, etc.) fail badly when confronted with such a large discrete action space. It's possible to reduce the action space in agent code instead of in the env, but that increases the barrier to getting started, and Doom is such a visually exciting environment that I'd like to bias in favor of making it as easy as possible to submit agents.

I'm with you on the benefits of having a single state-action space for all of Doom. What do you think about adding another set of "full-action" Doom envs, e.g. 'DoomTakeCoverFull-v0'? People would start out on the small action space and move to the full action space once they're happy with their algorithms.
ALT_ATTACK is just the renumbering of the actions after it (e.g. MOVE_RIGHT is now index #10 in all action spaces). We could implement a hybrid solution where the user can submit either a list covering only the available commands, or a list of all 41 commands. For instance, for doom-basic, users could send an action in either form:

action = [0, 1, 0]   # first entry is ATTACK, second MOVE_RIGHT, third MOVE_LEFT

or

actions = [0] * 41
actions[0] = 0   # ATTACK
actions[10] = 1  # MOVE_RIGHT
actions[11] = 0  # MOVE_LEFT
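A minimal sketch of how such a hybrid scheme could be handled on the environment side, assuming a hypothetical list of allowed command indices per level (the function and variable names are illustrative, not the actual PR code):

```python
# Illustrative sketch only -- not the PR's actual implementation.
NUM_COMMANDS = 41  # size of the full keyboard-like action space discussed above

def to_full_action(action, allowed_indices):
    """Accept either a short list (one entry per allowed command) or a
    full 41-entry list, and return the full 41-entry list."""
    if len(action) == NUM_COMMANDS:
        return list(action)
    if len(action) == len(allowed_indices):
        full = [0] * NUM_COMMANDS
        for value, index in zip(action, allowed_indices):
            full[index] = value
        return full
    raise ValueError("expected %d or %d entries, got %d"
                     % (len(allowed_indices), NUM_COMMANDS, len(action)))

# doom-basic example from the comment above: ATTACK, MOVE_RIGHT, MOVE_LEFT
basic_allowed = [0, 10, 11]
print(to_full_action([0, 1, 0], basic_allowed))  # short form -> full 41-entry list
```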
I was under the impression that ALTATTACK wasn't properly supported by VizDoom; what effect does adding it to controls.md have?

Adding the ability to use both small and large action spaces doesn't fully solve the problem, because agents need to do some introspection of the action space in order to know how large an action they should provide to step(). What do you think about registering a small and a large action space environment for each Doom task?
I thought ALTATTACK wasn't properly supported, but someone mentioned that it was a typo in VizDoom's source code, and the misspelled action is "ALATTACK". So everything works fine with the misspelled name (in deathmatch.cfg). I'm not sure exactly what you mean by agent introspection.

My issue with duplicating environments is that
Yes, I can see where you are coming from. I'd like to make the case that Doom with the "small" action space is a different (and much easier) environment compared to Doom with the full action space. Take, for instance, a "small" action space with 3 key actions: it's possible to write an agent that enumerates all 2^3 = 8 possibilities and trains a policy that maintains a probability distribution over those actions. For the full 41-dimensional action space, you can't even store a distribution over the 2^41 possible actions in memory (it would take multiple terabytes), so you'd need to be clever about somehow factorizing your action space. It's therefore not appropriate to compare agents learning on the small vs. the large action space; the large action space is much more difficult.

Implementation-wise, I'm fine with having a single DoomDeathmatch class that supports both action spaces, but I'm pushing for separate 'DoomDeathmatch-v0' and 'DoomDeathmatchFull-v0' registered environments. Let me know if you find this compelling.
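To make the size argument concrete, a rough back-of-the-envelope calculation (assuming a tabular distribution stored as 32-bit floats; the exact numbers are only illustrative):

```python
import itertools

# Small action space: all combinations of 3 binary keys can be enumerated,
# so a distribution over them is trivial to store.
small_actions = list(itertools.product([0, 1], repeat=3))
print(len(small_actions))  # 8

# Full action space: 2**41 combinations. One 4-byte probability per
# combination would already be on the order of terabytes.
full_combinations = 2 ** 41
print(full_combinations * 4 / 1e12, "TB (approx.)")  # ~8.8 TB
```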
The only thing is that Deathmatch doesn't really have a "small" action space. All commands are enabled (except deltas), so it's not really possible to beat the level by just attacking and moving left and right. I'm assuming the only way to beat the level is to train an algorithm on all the other levels with all commands, to get used to the gameplay and enemy detection, and then run it on Deathmatch. For the other levels, the "full" action space doesn't really matter, because they will just be used as training for Deathmatch. I'll modify sample() to return the small list, but I don't think there is a need to split any levels between "simple" and "full".
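A hedged sketch of what "modify sample() to return the small list" might look like: an action space whose sample() only toggles the commands enabled for the current level (the class name and attributes are hypothetical, not the PR's code):

```python
import random

class DoomActionSpace(object):
    """Hypothetical 41-entry binary action space that only samples
    the commands enabled for the current level."""

    def __init__(self, allowed_indices, num_commands=41):
        self.allowed_indices = allowed_indices  # e.g. [0, 10, 11] for doom-basic
        self.num_commands = num_commands

    def sample(self):
        action = [0] * self.num_commands
        for index in self.allowed_indices:
            action[index] = random.randint(0, 1)
        return action

space = DoomActionSpace([0, 10, 11])
print(space.sample())  # only indices 0, 10 and 11 can be non-zero
```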
Yes, Deathmatch would not have a small action space. The use case of training on the simple environments and working your way up to Deathmatch is a very interesting one. We've been talking about similar curriculum learning problems internally, and I'm not sure we have a great story for how to handle it in gym. One idea is to make a meta-doom environment which cycles through the different tasks from easiest to hardest (based either on number of episodes or on reward). I'd be curious if you have concrete thoughts on how to approach this.

Re: small vs. large Doom environments, after thinking about it I really don't want to be comparing agents trained on small and large action spaces as if it were the same environment. So either we have two different environments with two different action spaces, or we stick with one action space. I could be convinced that we should use the full action space instead of the small one - my main concern is that the full action space is too hard to make an interesting benchmark. One idea: if you can get a reinforcement learning algorithm to work on e.g. DoomTakeCover with the full action space and without Doom-specific tweaks, I'd be more inclined to use the full space.
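One possible shape for the meta-doom idea, sketched as a wrapper that advances to the next mission once an episode-reward threshold is reached (the env ids, thresholds, and class are placeholders, not part of this PR):

```python
import gym

class MetaDoom(object):
    """Illustrative curriculum wrapper: cycles through Doom tasks from
    easiest to hardest, advancing when an episode-reward threshold is met."""

    def __init__(self, env_ids, thresholds):
        self.env_ids = env_ids        # e.g. ['DoomBasic-v0', ..., 'DoomDeathmatch-v0']
        self.thresholds = thresholds  # one reward threshold per task
        self.level = 0
        self.episode_reward = 0.0
        self.env = gym.make(self.env_ids[self.level])

    def reset(self):
        self.episode_reward = 0.0
        return self.env.reset()

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        self.episode_reward += reward
        info = dict(info, level=self.level)  # expose the current task to the agent
        if done and self.level + 1 < len(self.env_ids) \
                and self.episode_reward >= self.thresholds[self.level]:
            self.level += 1
            self.env = gym.make(self.env_ids[self.level])
        return obs, reward, done, info
```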
So I'll just add a flag to the init and registration to specify a small or large environment (a rough sketch of what that registration could look like follows this comment).

For the meta-doom, here are a couple of points: (1) VizDoom already has an order for the missions:

1. Basic
2. Corridor
3. DefendCenter
4. DefendLine
5. HealthGathering
6. MyWayHome
7. PredictPosition
8. TakeCover
9. Deathmatch
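A rough sketch of what the small/large registration flag could look like, using gym's register() with kwargs passed through to the env constructor (the ids, entry point, and kwarg name are assumptions for illustration, following the 'DoomTakeCoverFull-v0' naming floated above):

```python
from gym.envs.registration import register

# Hypothetical: one env class, two registered ids differing only in the
# action-space flag passed to __init__.
register(
    id='DoomTakeCover-v0',
    entry_point='gym.envs.doom:DoomTakeCoverEnv',
    kwargs={'full_action_space': False},
)
register(
    id='DoomTakeCoverFull-v0',
    entry_point='gym.envs.doom:DoomTakeCoverEnv',
    kwargs={'full_action_space': True},
)
```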
Updated difficulty for some missions. Here are the stats I have:

1. Doom Basic
2. Corridor
3. DefendCenter
4. DefendLine
5. HealthGathering
6. MyWayHome
7. PredictPosition
8. TakeCover
9. Deathmatch
I'm strongly in favor of using the full action space, fixed across Doom environments. We shouldn't be too concerned with how our current algorithm implementations do when building the environments, but FWIW, I don't think the large action space will affect TRPO and CEM that much -- it will just slow them down by a small constant factor. @jietang, did you find that the number of actions made a big difference?
MOVE_UP and MOVE_DOWN were in deathmatch.cfg, but not in controls.md. So the full action space now has 43 commands, which replicate all commands in VizDoom.
@ppaquette, the suggestions for meta-doom all look reasonable to me. I think we'll want to include something in the state indicating the current task, if it's not already available. @joschu, definitely agreed that the transfer learning task is interesting. Do you have thoughts on ppaquette's proposal?

Re: action space size, my thinking was that it's nice to have diverse environments that can be solved with the same policy class (e.g. a feedforward net with a softmax over discrete actions) to enable direct comparison of the learning algorithm itself, which small action spaces allow. Maybe that's unnecessary across domains - thoughts?

Re: running TRPO on Doom, training on the small action space finished overnight on my (reasonably new) laptop. I haven't spent much time playing with the large action space (it requires some tweaking of the policy parameterization), but I can try it out if you're interested.
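For context on the "tweaking of the policy parameterization": with the small space, a softmax over the enumerated key combinations is enough, while the full 41-key space is more naturally modeled as 41 independent binary outputs. A minimal numpy sketch of the two heads (illustrative only, not the implementation referred to above):

```python
import numpy as np

def softmax_policy(logits):
    """Small action space: softmax over all 2**3 = 8 key combinations."""
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    index = np.random.choice(len(probs), p=probs)
    # decode the sampled index back into 3 key presses
    return [(index >> bit) & 1 for bit in range(3)]

def factorized_policy(logits):
    """Full action space: 41 independent Bernoulli outputs, one per key,
    instead of a softmax over 2**41 combinations."""
    probs = 1.0 / (1.0 + np.exp(-logits))
    return (np.random.uniform(size=logits.shape) < probs).astype(int).tolist()

print(softmax_policy(np.zeros(8)))      # e.g. [1, 0, 1]
print(factorized_policy(np.zeros(41)))  # 41-entry 0/1 list
```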
@ppaquette Had a chance to catch up with joschu IRL. The conclusion was that we should use the large action space for all environments. Could you make the appropriate changes to this PR? (Sorry about the churn from my end.)

Re: the meta-doom env, let's start a separate issue or PR to discuss it. One thing to think about: it's hard to tell whether it's set up correctly without an agent that is able to learn on it. So we might want to start small (e.g. with a small number of environments) and develop agent(s) in tandem.
Remaining issues:
@@ -1,5 +1,9 @@
from gym.envs.registration import registry, register, make, spec

# To be able to create new-style properties
class Env(object):
What's this for? Can we get rid of it?
I added a property for mode, but I removed it afterwards. I'll remove it and resubmit.
Mind rebasing? Looks like there are now conflicts (likely in the scoreboard registration).
…ntrols between environments).
…ull list of 41 commands
…her than empty list (which was triggering an error)
- Added 'normal', 'fast' and 'human' modes
- Set non-deterministic to True
- Set video.frames_per_second to 35
- Properly returning game variables
…s sporadically
Thanks @ppaquette
This PR adds another commit to the previous PR.
VizDoom is a series of missions that build on each other.
This PR creates a standard action space of 41 commands (similar to the keyboard available to a human player) that is the same across environments (e.g. the first command is always ATTACK, the second is always JUMP, etc.).
With this setup, it becomes possible to run an algorithm on all the Doom environments in sequence (which is likely required to beat the Deathmatch level).
The actions to be performed are submitted as a list of 41 integers, with 1 being active and 0 being inactive.
e.g.
actions = [0] * 41
actions[0] = 1 # ATTACK
actions[13] = 1 # MOVE_FORWARD
The first levels only allow certain commands; the disabled commands are ignored.
The full list of commands is in controls.md
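A minimal usage sketch of the interface described above (the env id is an assumption for illustration; the command indices follow the example above and controls.md):

```python
import gym

env = gym.make('DoomBasic-v0')  # assumed id; see the envs registered in this PR
observation = env.reset()

actions = [0] * 41
actions[0] = 1   # ATTACK
actions[13] = 1  # MOVE_FORWARD (ignored on levels where it is disabled)

observation, reward, done, info = env.step(actions)
```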