Standard API for multi-agent environments #934
I think we should differentiate between actual real-time games (where there is actual, continuous time) and games where multiple agents make moves in the same turn. The latter is just a special case of a turn-based multi-agent environment where agents take turns proposing a move and the environment updates jointly after all proposals are available (to the environment). I think it would also make implementation a lot simpler. You could append a "player=1" as an optional input parameter to step, render, action_space (getter), observation_space (getter), [...] and have the environment worry about the logic behind it. From an agent's pov it would look a lot like the usual environments. The only problem I see is
Regarding your proposal for multi-agent environments, I think it should be up to the learning algorithm to decide which agent to give control over the current turn. Take self-play as an example, where there is no need to enforce which player's turn it is, as both are the same. Still, the environment needs to track turns to keep up with potentially different observations or actions. The same line of thought goes for "multi-step" turn-based environments: it should be up to the algorithm to decide whom to give control over the current turn. If the environment features "multi-step" turns, the algorithm has to take care of adhering to that (or not, depending on the setup).
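For what it's worth, a minimal sketch of how the "player=..." idea could look. The class name, the "waiting" info flag, and the per-player action_space getter are all assumptions made for illustration, not existing gym API:

```python
import gym

class TurnBasedEnv(gym.Env):
    """Sketch of a turn-based env that takes an optional player argument."""

    def __init__(self, n_players=2):
        self.n_players = n_players
        self._proposals = {}

    def step(self, action, player=1):
        # record the player's proposal; the joint update only happens
        # once every player has submitted an action for this turn
        self._proposals[player] = action
        if len(self._proposals) < self.n_players:
            return None, 0.0, False, {"waiting": True}
        joint_action = [self._proposals[p] for p in sorted(self._proposals)]
        self._proposals = {}
        # a real environment would apply joint_action to its state here
        return joint_action, 0.0, False, {"waiting": False}

    def action_space(self, player=1):
        # per-player getter, as suggested above (illustrative only)
        return gym.spaces.Discrete(2)
```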
Good point, it would be great! Does gym have any environment with continuous time yet to take inspiration from?
So every agent makes a blocking call to the step method? Because it must wait for every agent before getting an observation.
Should we not try to abstract "environment mechanics" from the particular algorithm chosen? This is obviously my personal idea :) For example, in self-play, the environment should not know anything about the fact that we have the same agent playing itself, while it should know that there are N players that must take turns or can take actions jointly.
I gave this a bit more thought and have to revise my idea. I no longer think it makes sense to have a
Currently, I would agree. The orchestrating code will have to worry about getting observations to each agent. Again, there is no real need to change the existing API since multidimensional spaces are already supported (e.g. a tuple-space) =).
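As a concrete example of that, a joint space for two agents can already be expressed with the existing Tuple space, with no API change. A small self-contained sketch; the per-agent shapes are made up:

```python
import numpy as np
from gym import spaces

# joint action space for two agents, each with 4 discrete actions
joint_action_space = spaces.Tuple((spaces.Discrete(4), spaces.Discrete(4)))

# joint observation space, one Box per agent
joint_observation_space = spaces.Tuple((
    spaces.Box(low=0.0, high=1.0, shape=(3,), dtype=np.float32),
    spaces.Box(low=0.0, high=1.0, shape=(3,), dtype=np.float32),
))

print(joint_action_space.sample())  # e.g. (2, 0), one sub-action per agent
```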
The current API already supports that. The action space (and observation space) can change after every step, so the "trick" of decomposing one step where every agent makes its move into multiple steps where agents get to "propose" moves can accomplish unique observation and action spaces per agent.
This forces people who want to use the API to write multi-threaded code. I think the costs will outweigh the benefit (but that is my opinion, feel free to disagree). Instead, agents can split their "pick_action" and "update" parts (which many do anyway) and hand control back to the orchestrating code in between. That way each agent's "pick_action" can be executed and its result passed to the environment by calling step().
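A minimal sketch of that single-threaded orchestration pattern. The pick_action/update method names and the agents list are assumptions for illustration, not part of gym:

```python
def run_episode(env, agents):
    """Single-threaded orchestration: agents only propose and update,
    while the orchestrator owns the env.step calls, so no threads are needed."""
    obs = env.reset()
    done = False
    while not done:
        for agent in agents:
            action = agent.pick_action(obs)              # agent proposes its move
            obs, reward, done, info = env.step(action)   # orchestrator passes it to the env
            agent.update(obs, reward, done)              # agent consumes the outcome
            if done:
                break
    return obs
```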
I don't know actually. RL generally has a problem with continuous time, as it is based on MDPs, which assume you get to make a sequence of discrete choices.
Yeah, I was kinda reflecting on this; in fact, any digital device will "discretize" time, so no worries. About your "tricks", you're obviously right, there is almost nothing that cannot be done with the current standard environment (in fact, I already did it in my env :) ). Since I've seen different repos of multi-agent environments that use different and specific approaches, I was more interested in finding common "guidelines" for the creation of new multi-agent environments, in order to make them "consistent" with each other (I think the simple and standard interface of gym is its main strength, in fact).
If I understand correctly, it looks like the normal environment can be used for the multi-agent case but breaks the requirements of the Env class for the reward. This could be solved by simply relaxing the requirement on the reward in the Env class from strictly a float to a generic object. I'm not sure how this change would percolate to code that works off the assumption that the reward is a float; those dependencies would have to be modified. Alternatively, maybe a new Multiagent env class could be introduced. Also, it would be nice to have a few multi-agent environments in this repository to demonstrate a standard API for multi-agent environments.
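A minimal sketch of what such a relaxed class could look like. The class name and the list-based return are assumptions, not an existing gym class:

```python
import gym

class MultiagentEnv(gym.Env):
    """Hypothetical base class whose step() works with per-agent lists."""

    def __init__(self, n_agents=2):
        self.n_agents = n_agents

    def step(self, actions):
        # 'actions' holds one action per agent
        observations = [None] * self.n_agents   # one observation per agent
        rewards = [0.0] * self.n_agents          # per-agent rewards, not a single float
        done = False                             # episode-level termination flag
        info = {}
        return observations, rewards, done, info
```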
@nicomon24, @FirefoxMetzger: what is the current status of this issue? Does gym support any multi-agent environment yet?
AFAIK the situation is still the same: there is no common strategy on multi-agent environments inside gym, while there are more multi-agent envs outside gym which can be used as inspiration (this tool is very cool if you want to look at some, just filter the multi-agent envs).
Other than multi-agent environments, I think the general Env class could also be expanded by considering under which conditions decisions are taken: simultaneously, turn-based, or turn- and phase-based.
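Something as simple as a declared decision mode could capture that distinction; a sketch with illustrative names:

```python
from enum import Enum

class DecisionMode(Enum):
    SIMULTANEOUS = "simultaneous"        # all agents decide in the same step
    TURN_BASED = "turn_based"            # agents decide one after another
    TURN_AND_PHASE = "turn_and_phase"    # each turn is further split into phases
```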
+1
+1 I'm not an expert on this, but in a standard, single-agent environment we have something like this:
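For reference, a runnable version of the usual single-agent loop, using a standard gym environment and random actions for brevity:

```python
import gym

env = gym.make("CartPole-v1")
observation = env.reset()
done = False
while not done:
    action = env.action_space.sample()                   # the agent picks an action
    observation, reward, done, info = env.step(action)   # the env executes it and returns feedback
env.close()
```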
In multi-agent environments, the order of these operations may be different, and sometimes it will be hard to encapsulate everything in the same method. In a chess-like environment, the state after action 1 will not be the state from which action 2 will be executed (because another agent is playing in the middle).
Additionally, as in my specific case, the reward after a turn can only be computed after both agents have acted.
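A toy sketch of that interleaving. TwoPlayerEnv is a made-up stand-in, not a real gym environment; rewards stay None until both players have moved:

```python
import random

class TwoPlayerEnv:
    """Toy chess-like env: rewards for a turn exist only once both players moved."""

    def reset(self):
        self.moves = []
        return 0                                     # dummy observation

    def step(self, action):
        self.moves.append(action)
        turn_complete = len(self.moves) % 2 == 0
        rewards = (0.0, 0.0) if turn_complete else None
        done = len(self.moves) >= 10                 # arbitrary episode length
        return len(self.moves), rewards, done, {}

env = TwoPlayerEnv()
obs = env.reset()
done = False
while not done:
    obs, rewards, done, _ = env.step(random.choice([0, 1]))   # player 1 moves, rewards is None
    if done:
        break
    obs, rewards, done, _ = env.step(random.choice([0, 1]))   # player 2 moves, rewards now available
```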
This is a good discussion. I'm also interested in multi-agent environments. @alsora has identified one important issue:
E.g. in Tic-Tac-Toe, player 1 may get a terminal reward after player 2's move. I stumbled into this problem a few years ago when I implemented a tic-tac-toe game environment (as a learning exercise). I can't see how that could be achieved with the current Gym method:
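Presumably the method in question is gym's standard step signature, which returns a single scalar reward to whoever made the call:

```python
# the classic gym contract: one observation, one scalar reward, per call
observation, reward, done, info = env.step(action)
```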
If it's any help, the way I solved it in the end was to have three object types:
Here is how it worked:
With this structure in place, the GameController basically manages the communication of rewards so they can be passed to each agent either immediately after it takes an action or later, after all agents have made their moves. Hope that helps. If you do implement multi-agent environments, it would be good to allow for agents that can communicate with each other or share the same value function, for example.
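The description above might translate into something like the following sketch. The class body and the act/receive_reward agent methods are assumptions made to illustrate the reward-buffering role, not the original implementation:

```python
class GameController:
    """Coordinates agents and the environment, buffering rewards so each agent
    can receive its reward right after its move or once all agents have moved."""

    def __init__(self, env, agents):
        self.env = env
        self.agents = agents
        self.pending = {i: 0.0 for i in range(len(agents))}

    def play_turn(self, observation):
        done = False
        for agent in self.agents:
            action = agent.act(observation)
            observation, rewards, done, _ = self.env.step(action)
            # rewards is assumed to hold one value per agent; accumulate them
            for j, r in enumerate(rewards):
                self.pending[j] += r
            if done:
                break
        # deliver the buffered rewards only after the whole turn (or on game end)
        for i, agent in enumerate(self.agents):
            agent.receive_reward(self.pending[i], done)
            self.pending[i] = 0.0
        return observation, done
```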
I stumbled upon this discussion thread and wanted to share similar work: ma-gym. Please refer to the Usage Wiki for details on its compliance with the OpenAI gym API. I hope this might be useful to others.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
I think this issue should remain open. It's a major gap in Gym's current API that will only become more acute over time with the renewed emphasis on multi-agent systems (OpenAI Five, AlphaStar, ...) in modern deep RL.
I have a kinda philosophical question on this matter. Wandering across projects that use gym for MARL, I have noticed that people usually use Python lists (e.g. action_n), whereas rllib, which I use a lot, typically uses dictionaries (e.g. action_dict) for everything. What do you think the proper way should be? For example, in the multi-agent gym environments I found, the return values of step() (obs_n, rew_n, done_n, info_n) would be Python lists, whereas in rllib they would be dictionaries with one entry per agent name. This easily allows for a variable number of agents over time, heterogeneous episode lengths, heterogeneous observation/action spaces, etc. Actually, I asked this question to some OpenAI folks at NeurIPS but they didn't really know.
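To make the two conventions concrete, here is a toy dict-keyed environment in the rllib style. The class and agent names are made up; only the "__all__" done flag follows rllib's actual convention:

```python
class DictStepEnv:
    """Toy env whose step() is keyed by agent name, as in rllib's MultiAgentEnv."""

    def step(self, action_dict):
        # one entry per currently-active agent; agents can enter or leave an
        # episode simply by appearing in / disappearing from these dicts
        obs = {agent: 0 for agent in action_dict}
        rewards = {agent: 0.0 for agent in action_dict}
        dones = {agent: False for agent in action_dict}
        dones["__all__"] = False            # rllib-style episode-level flag
        infos = {agent: {} for agent in action_dict}
        return obs, rewards, dones, infos

env = DictStepEnv()
obs, rew, done, info = env.step({"player_1": 1, "player_2": 0})
# versus the list-based convention: obs_n, rew_n, done_n, info_n = env.step(action_n)
```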
I like that, @yannbouteiller. Using an action dict like rllib's approach is sound [1] for arbitrary agents entering and exiting. I'd love to hear from others who have more experience with rllib's multi-agent environment to see if there is any use-case missing from rllib's API. [1] https://ray.readthedocs.io/en/latest/rllib-env.html#multi-agent-and-hierarchical
I've been thinking about this for a while as I've had multiple projects where having a standard multi-agent API would be great.
All the RL algorithms I know assume a time-step, even in the real-time setting, and many MARL algorithms use global information during training with a global time-step (e.g. centralized critics). I have been thinking about this as well, but given these facts I don't really see the point of a client-server asynchronous architecture over the current synchronous way of using step().
You're leaving it up to the person writing the agents to call step at periodic intervals. Ideally you would want this to be handled by the environment; otherwise, sharing gyms would also require the user to write new code each time. Perhaps a client-server architecture might be overkill, but it naively seems quite intuitive/user-friendly.
Allowing between-env-step inter-agent communication (if, indeed, this is a valid MARL approach) is a possible use-case for asynchronous interactions and multi-node, distributed arrangements. I've tried this with SPADE but have had to figure out some scale and memory issues. I'd tried some of the gym environment approaches mentioned above (thanks, contributors), but with a single coordinator game-master or tournament-master agent (the context here is games). This is how I've managed to allow for inter-agent interactions between steps. Has anyone found a better solution for allowing distributed agents to interact with a central multi-agent gym environment, while allowing for such async interactions between steps?
Check out https://github.com/PettingZoo-Team/PettingZoo. It's linked from the readme now too.
Hi everyone,
I'm developing a multi-agent env (multi-snake, from the latest Request for Research) and I thought that having a common API interface for multi-agent environments would be great. Here are some of my thoughts:
I'm currently working on this, so I wanted to discuss it here in case any ideas come up :)