Update v2-staging from main (March 15) #5123
Merged
* Make buffer type-agnostic * Edit types of Apped method * Change comment * Collaborative walljump * Make collab env harder * Add group ID * Add collab obs to trajectory * Fix bug; add critic_obs to buffer * Set group ids for some envs * Pretty broken * Less broken PPO * Update SAC, fix PPO batching * Fix SAC interrupted condition and typing * Fix SAC interrupted again * Remove erroneous file * Fix multiple obs * Update curiosity reward provider * Update GAIL and BC * Multi-input network * Some minor tweaks but still broken * Get next critic observations into value estimate * Temporarily disable exporting * Use Vince's ONNX export code * Cleanup * Add walljump collab YAML * Lower max height * Update prefab * Update prefab * Collaborative Hallway * Set num teammates to 2 * Add config and group ids to HallwayCollab * Fix bug with hallway collab * Edits to HallwayCollab * Update onnx file meta * Make the env easier * Remove prints * Make Collab env harder * Fix group ID * Add cc to ghost trainer * Add comment to ghost trainer * Revert "Add comment to ghost trainer" This reverts commit 292b6ce. 
* Actually add comment to ghosttrainer * Scale size of CC network * Scale value network based on num agents * Add 3rd symbol to hallway collab * Make comms one-hot * Fix S tag * Additional changes * Some more fixes * Self-attention Centralized Critic * separate entity encoder and RSA * clean up args in mha * more cleanups * fixed tests * entity embeddings work with no max Integrate into CC * remove group id * very rough sketch for TeamManager interface * One layer for entity embed * Use 4 heads * add defaults to linear encoder, initialize ent encoders * add team manager id to proto * team manager for hallway * add manager to hallway * send and process team manager id * remove print * small cleanup * default behavior for baseTeamManager * add back statsrecorder * update * Team manager prototype (#4850) * remove group id * very rough sketch for TeamManager interface * add team manager id to proto * team manager for hallway * add manager to hallway * send and process team manager id * remove print * small cleanup Co-authored-by: Chris Elion <chris.elion@unity3d.com> * Remove statsrecorder * Fix AgentProcessor for TeamManager Should work for variable decision frequencies (untested) * team manager * New buffer layout, TeamObsUtil, pad dead agents * Use NaNs to get masks for attention * Add team reward to buffer * Try subtract marginalized value * Add Q function with attention * Some more progress - still broken * use singular entity embedding (#4873) * I think it's running * Actions added but untested * Fix issue with team_actions * Add next action and next team obs * separate forward into q_net and baseline * might be right * forcing this to work * buffer error * COMAA runs * add lambda return and target network * no target net * remove normalize advantages * add target network back * value estimator * update coma config * add target net * no target, increase lambda * remove prints * cloud config * use v return * use target net * adding zombie to coma2 brnch * add 
callbacks * cloud run with coma2 of held out zombie test env * target of baseline is returns_v * remove target update * Add team dones * ntegrate teammate dones * add value clipping * try again on cloud * clipping values and updated zombie * update configs * remove value head clipping * update zombie config * Add trust region to COMA updates * Remove Q-net for perf * Weight decay, regularizaton loss * Use same network * add base team manager * Remove reg loss, still stable * Black format * add team reward field to agent and proto * set team reward * add maxstep to teammanager and hook to academy * check agent by agent.enabled * remove manager from academy when dispose * move manager * put team reward in decision steps * use 0 as default manager id * fix setTeamReward Co-authored-by: Vincent-Pierre BERGES <vincentpierre@unity3d.com> * change method name to GetRegisteredAgents * address comments * Revert C# env changes * Remove a bunch of stuff from envs * Remove a bunch of extra files * Remove changes from base-teammanager * Remove remaining files * Remove some unneeded changes * Make buffer typing neater * AgentProcessor fixes * Back out trainer changes * use delegate to avoid agent-manager cyclic reference * put team reward in decision steps * fix unregister agents * add teamreward to decision step * typo * unregister on disabled * remove OnTeamEpisodeBegin * change name TeamManager to MultiAgentGroup * more team -> group * fix tests * fix tests * Use attention tests from master * Revert "Use attention tests from master" This reverts commit 78e052b. * Use attention from master * Renaming fest * Use NamedTuples instead of attrs classes * Bug fixes * remove GroupMaxStep * add some doc * Fix mock brain * np float32 fixes * more renaming * Test for team obs in agentprocessor * Test for group and add team reward * doc improve Co-authored-by: Ervin T. 
<ervin@unity3d.com> * store registered agents in set * remove unused step counts * Global group ids * Fix Trajectory test * Remove duplicated files * Add team methods to AgentAction * Buffer fixes (cherry picked from commit 2c03d2b) * Add test for GroupObs * Change AgentAction back to 0 pad and add tests * Addressed some comments * Address some comments * Add more comments * Rename internal function * Move padding method to AgentBufferField * Fix slicing typing and string printing in AgentBufferField * Fix to-flat and add tests * Rename GroupmateStatus to AgentStatus * Update comments * Added GroupId, GlobalGroupId, GlobalAgentId types * Update comment * Make some agent processor properties internal * Rename add_group_status * Rename store_group_status, fix test * Rename clear_group_obs Co-authored-by: Andrew Cohen <andrew.cohen@unity3d.com> Co-authored-by: Ruo-Ping Dong <ruoping.dong@unity3d.com> Co-authored-by: Chris Elion <chris.elion@unity3d.com> Co-authored-by: andrewcoh <54679309+andrewcoh@users.noreply.github.com> Co-authored-by: Vincent-Pierre BERGES <vincentpierre@unity3d.com>
* Removing some scenes, All the Static and all the non variable speed environments. Also removed Bouncer, PushBlock, WallJump and reacher. Removed a bunch of visual environments as well. Removed 3DBallHard and FoodCollector (kept Visual and Grid FoodCollector) * readding 3DBallHard * readding pushblock and walljump * Removing tennis * removing mentions of removed environments * removing unused images * Renaming Crawler demos * renaming some demo files * removing and modifying some config files * new examples image? * removing Bouncer from build list * replacing the Bouncer environment with Match3 for llapi tests * Typo in yamato test
* Fix typo * Add test
* Fix padding for List entries in buffer * Revert to converting to np.array * Fix dtype in PPO trainer
* Detach memory before storing * Add test * Evaluate with no_grad
…-main Release 14 branch to main
* Fix save/restore critic, add test * Rename module for PPO * Use correct policy in test
…json files in our examples. (#5077)
…back (#5091) Co-authored-by: Chris Elion <chris.elion@unity3d.com>
Co-authored-by: Ervin Teng <ervin@unity3d.com> Co-authored-by: Ruo-Ping Dong <ruoping.dong@unity3d.com> Co-authored-by: Chris Elion <chris.elion@unity3d.com> Co-authored-by: Vincent-Pierre BERGES <vincentpierre@unity3d.com>
Co-authored-by: Ruo-Ping Dong <ruoping.dong@unity3d.com>
* Add pushblock collab * Make SimpleMultiAgentGroup public * Remove GoalDetectTrigger * Remove GDT meta file * Remove some comments * Add training configuration * Rename behavior * Add to docs * Change the reward structure in docs * Add back GoalDetectTrigger Co-authored-by: HH <brandonh@unity3d.com>
* Add multiAgentGroup capabilities flag * Add proto * Fix compiler error * Add warning for multiagent group * Add comment * Fix spelling mistake
* use get step to determine curriculum * add to CHANGELOG * Make step in trainer private (#5099) Co-authored-by: Ervin T <ervin@unity3d.com>
Increment versions after release 15 branch split
Co-authored-by: Chris Elion <chris.elion@unity3d.com>
…5113) * Fix end episode for POCA, add warning for group reward if not POCA * Add missing imports
vincentpierre approved these changes Mar 16, 2021
surfnerd pushed a commit that referenced this pull request Mar 18, 2021
surfnerd pushed a commit that referenced this pull request Mar 18, 2021
Proposed change(s)
Merge main into v2-staging. This still needs a conflict fix from @vincentpierre.