
Soccer refactor (release) #3331

Merged

andrewcoh merged 32 commits into master from develop-soccer-refactor on Feb 5, 2020

Conversation

@andrewcoh (Contributor) commented Jan 31, 2020

This PR refactors the soccer environment to be trained with the ghost trainer.
Major changes:

  • Removes the striker/goalie roles; both are now generic Soccer agents
  • Branched action space (see the sketch after this list)
  • Increased drag on the ball
  • Freezes the ball's y coordinate to prevent it from bouncing out of the agents' observation space
  • The agents' observation space is a set of raycasts at vertical start/end offset 0.5, with 5 rays per side covering 120 degrees forward, plus a set of raycasts with 1 ray per side covering 90 degrees backward
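For readers skimming the diff, here is a minimal sketch of how a branched discrete action space like this is typically decoded in an agent's move method. The branch ordering, branch sizes, and the `agentRb`/`moveSpeed` fields are illustrative assumptions, not the exact code in this PR:

```csharp
using UnityEngine;

// Hedged sketch: three discrete branches (forward/back, rotate, strafe)
// decoded into movement. The class and field names are hypothetical.
public class SoccerAgentSketch : MonoBehaviour
{
    public Rigidbody agentRb;     // assumed reference to the agent's rigidbody
    public float moveSpeed = 2f;  // assumed value

    public void MoveAgent(float[] act)
    {
        var dirToGo = Vector3.zero;
        var rotateDir = Vector3.zero;

        var forwardAxis = (int)act[0]; // 0 = noop, 1 = forward, 2 = backward
        var rotateAxis  = (int)act[1]; // 0 = noop, 1 = turn left, 2 = turn right
        var rightAxis   = (int)act[2]; // 0 = noop, 1 = strafe right, 2 = strafe left

        if (forwardAxis == 1) dirToGo += transform.forward;
        if (forwardAxis == 2) dirToGo -= transform.forward;
        if (rightAxis == 1)   dirToGo += transform.right;
        if (rightAxis == 2)   dirToGo -= transform.right;
        if (rotateAxis == 1)  rotateDir = -transform.up;
        if (rotateAxis == 2)  rotateDir = transform.up;

        transform.Rotate(rotateDir, Time.deltaTime * 100f);              // turn in place
        agentRb.AddForce(dirToGo * moveSpeed, ForceMode.VelocityChange); // move
    }
}
```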

@ervteng (Contributor) commented Feb 3, 2020

Soccer .nn as-is isn't working - seems like there is a different number of vector obs in the NN vs. in the behavior parameters

@andrewcoh (Contributor, Author) replied:

> Soccer .nn as-is isn't working - seems like there is a different number of vector obs in the NN vs. in the behavior parameters

@ervteng sorry, I should have told you. There were duplicate raycast observations so I removed them and am retraining right now.

if (team == Team.Purple)
base.InitializeAgent();
m_BP = gameObject.GetComponent<BehaviorParameters>();
if (m_BP.m_TeamID == 0)

nit: could we explicitly assign these values [0, 1] to the Team enum and compare against that?
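A minimal sketch of what that suggestion could look like (Blue as the 0-valued member is an assumption for illustration; only Purple appears in the snippet above):

```csharp
// Hedged sketch: give the Team enum explicit values so the comparison
// against the team ID in BehaviorParameters reads directly.
public enum Team
{
    Blue = 0,
    Purple = 1
}
```

The check in the hunk above could then read `if (m_BP.m_TeamID == (int)Team.Blue)` instead of comparing against a bare 0.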

@andrewcoh merged commit d8ebf3b into master on Feb 5, 2020
The delete-merged-branch bot deleted the develop-soccer-refactor branch on February 5, 2020 at 20:48
@andrewcoh changed the title from "Soccer refactor" to "Soccer refactor (release)" on Feb 5, 2020
@AcelisWeaven (Contributor) commented:

Hi @andrewcoh,

Sorry if this is not the right place to ask, but have you tried to train the Soccer env with self-play with SAC instead of PPO?

I've got a similar custom environment, and with self-play+PPO I get some okay-ish results, but with self-play+SAC the agents move kind of randomly.

Thanks,

@andrewcoh (Contributor, Author) replied:

Hi @AcelisWeaven,

SAC tends to do pretty poorly when rewards are sparse. I've trained both Tennis and Soccer with SAC+self-play and have only been able to get good policies when I shape the reward function.

Can you elaborate on what you mean by okay-ish results?
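For context, "shaping the reward function" here means layering a small dense reward on top of the sparse win/loss signal. A minimal sketch of that kind of term follows; the `ball`/`opponentGoal` references and the 0.001f scale are illustrative assumptions, not the actual Tennis/Soccer code:

```csharp
using UnityEngine;
using MLAgents;  // Agent base class namespace in this ML-Agents era (assumed)

// Hedged sketch of a dense shaping term layered on the sparse +1/-1 reward.
// All names and magnitudes here are assumptions for illustration.
public class ShapedSoccerAgent : Agent
{
    public Transform ball;          // assumed scene references
    public Transform opponentGoal;
    float m_PrevBallToGoal;

    // Called once per decision step, e.g. from the action handler.
    void AddShapingReward()
    {
        // Reward any progress the ball makes toward the opponent's goal.
        float ballToGoal = Vector3.Distance(ball.position, opponentGoal.position);
        AddReward(0.001f * (m_PrevBallToGoal - ballToGoal));
        m_PrevBallToGoal = ballToGoal;
    }
}
```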

@AcelisWeaven (Contributor) commented Feb 7, 2020

Hi @andrewcoh,

That makes sense, thanks! Maybe that could be mentioned in the docs somewhere?

Sorry for the lack of context about the okay-ish results. This is in comparison to a good agent that I trained using @LeSphax's fork (based on ml-agents v0.8).
I had actually forgotten to carry over a PPO hyper-parameter from the trainer_config I used with the fork, so I'll let it train for a few days and let you know if I make any progress.

If you want to know a bit more about my environment, it looks like this: https://www.youtube.com/watch?v=MZdG-qUug1Q
Basically, it's a Soccer-like env with platformer controls and items. Each agent can see its nearest teammates and opponents (position, velocity, held/used items), the goal positions, and the ball position (if any), and some raycasting is used for the walls/platforms.
A good agent should learn to score, defend its goal, chase the ball (and anticipate its respawn), use items, and pass the ball.

Edit: Quick question, do you think the existential malus is still needed on Soccer with self-play?

@andrewcoh (Contributor, Author) replied:

Ah yes, I remember this. If you can't get it to work as well, please let me know so that we can figure out why. I think there are differences between my implementation and @LeSphax's. Is your observation space similar to Soccer's? Out of curiosity, which hyperparameter were you missing?

Re: the existential malus, I don't think it is absolutely necessary, but it does make for slightly more aggressive offense.
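For readers unfamiliar with the term, the "existential malus" is the small per-step penalty used in the example environments to push agents toward scoring quickly. A minimal sketch of the usual pattern; `maxEnvironmentSteps` is an assumed field:

```csharp
// Hedged sketch: the existential malus is a tiny negative reward applied
// every decision step, which pressures agents to score sooner rather than later.
// Dividing by the episode length keeps the accumulated penalty near -1.
void ApplyExistentialMalus()
{
    AddReward(-1f / maxEnvironmentSteps);
}
```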

@AcelisWeaven (Contributor) commented Feb 8, 2020

Soccer only uses raycasts, right? In my case I've got raycasts (ball/walls/platforms) + player positions/velocities (normalized) + item information (held by a player, used by a player, ...).
If there aren't enough players, the observations are padded (see the sketch below). Edit: everything is relative to the agent's position.
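For illustration, padding here would mean reserving a fixed-size slot per player and zero-filling unused slots so the vector observation length never changes. A minimal sketch, assuming the AddVectorObs-style API from this ML-Agents era; `PlayerInfo`, `maxPlayers`, and `visiblePlayers` are hypothetical:

```csharp
using System.Collections.Generic;
using UnityEngine;
using MLAgents;  // Agent base class namespace in this ML-Agents era (assumed)

// Hedged sketch: fixed-size per-player observation slots, zero-padded when
// fewer players are visible, so the vector observation length stays constant.
public class PaddedObsAgent : Agent
{
    public struct PlayerInfo
    {
        public Vector3 position;
        public Vector3 velocity;
        public bool isHoldingItem;
    }

    public int maxPlayers = 4;
    public List<PlayerInfo> visiblePlayers = new List<PlayerInfo>();

    public override void CollectObservations()
    {
        const int floatsPerPlayer = 7; // 3 position + 3 velocity + 1 item flag
        for (int i = 0; i < maxPlayers; i++)
        {
            if (i < visiblePlayers.Count)
            {
                var p = visiblePlayers[i];
                // Everything relative to this agent, as described above.
                AddVectorObs(transform.InverseTransformPoint(p.position));   // 3 floats
                AddVectorObs(transform.InverseTransformVector(p.velocity));  // 3 floats
                AddVectorObs(p.isHoldingItem ? 1f : 0f);                     // 1 float
            }
            else
            {
                // Pad the unused slot with zeros.
                for (int j = 0; j < floatsPerPlayer; j++)
                    AddVectorObs(0f);
            }
        }
    }
}
```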

I forgot that I had changed beta and num_epoch; if I remember correctly, it made the agent learn quicker. Another difference is that I now use --num-envs 16, where I only used one before. (I trained my agents on a single-core VPS :) )

@AcelisWeaven (Contributor) commented Feb 9, 2020

Hey @andrewcoh,

I think I found the issue. I'm a bit embarrassed I didn't try that before.
Replacing --num-envs 16 with --num-envs 1 solved the problem... (and training is actually much faster this way)

Here are some TensorBoard screenshots; the change was applied at about 5M steps.
(don't mind the Cumulative Reward, I changed the existential malus at some point)
[Three TensorBoard screenshots omitted]

Now, the agents play well!

Is this a bug or the intended behavior?

@andrewcoh (Contributor, Author) replied:

It's actually quite strange that switching from 16 envs to 1 solved the problem. I will look into this. Thanks for bringing it to my attention, and for using the new self-play trainer!

@AcelisWeaven (Contributor) replied:

Thanks @andrewcoh! Don't hesitate to reach out if you have any questions; there's still a possibility that it comes from my custom environment somehow :)

The github-actions bot locked this conversation as resolved and limited it to collaborators on May 16, 2021