
Soccer refactor (release) #3331

Merged

andrewcoh merged 32 commits into master from develop-soccer-refactor on Feb 5, 2020

Conversation

@andrewcoh (Contributor) commented Jan 31, 2020

This PR refactors the soccer environment to be trained with the ghost trainer.
Major changes:

  • Removes the striker/goalie roles; both are now generic Soccer agents
  • Branched action space (see the sketch after this list)
  • Increased drag on the ball
  • Freezes the ball's y coordinate to prevent it from bouncing out of the agents' observation space
  • The agents' observation space is a set of raycasts at vertical start/end offset 0.5, with 5 rays per side covering 120 degrees forward, plus a set of raycasts with 1 ray per side covering 90 degrees backward
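For readers skimming the diff, here is a minimal sketch of how a branched discrete action space like this is typically decoded in an agent's move method. The branch ordering, branch sizes, and the `agentRb`/`moveSpeed` fields are illustrative assumptions, not the exact code in this PR:

```csharp
using UnityEngine;

// Hedged sketch: three discrete branches (forward/back, rotate, strafe)
// decoded into movement. The class and field names are hypothetical.
public class SoccerAgentSketch : MonoBehaviour
{
    public Rigidbody agentRb;     // assumed reference to the agent's rigidbody
    public float moveSpeed = 2f;  // assumed value

    public void MoveAgent(float[] act)
    {
        var dirToGo = Vector3.zero;
        var rotateDir = Vector3.zero;

        var forwardAxis = (int)act[0]; // 0 = noop, 1 = forward, 2 = backward
        var rotateAxis  = (int)act[1]; // 0 = noop, 1 = turn left, 2 = turn right
        var rightAxis   = (int)act[2]; // 0 = noop, 1 = strafe right, 2 = strafe left

        if (forwardAxis == 1) dirToGo += transform.forward;
        if (forwardAxis == 2) dirToGo -= transform.forward;
        if (rightAxis == 1)   dirToGo += transform.right;
        if (rightAxis == 2)   dirToGo -= transform.right;
        if (rotateAxis == 1)  rotateDir = -transform.up;
        if (rotateAxis == 2)  rotateDir = transform.up;

        transform.Rotate(rotateDir, Time.deltaTime * 100f);              // turn in place
        agentRb.AddForce(dirToGo * moveSpeed, ForceMode.VelocityChange); // move
    }
}
```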

@ervteng (Contributor) commented Feb 3, 2020

Soccer .nn as-is isn't working - seems like there is a different number of vector obs in the NN vs. in the behavior parameters

@andrewcoh (Contributor, Author) replied:

> Soccer .nn as-is isn't working - seems like there is a different number of vector obs in the NN vs. in the behavior parameters

@ervteng sorry, I should have told you. There were duplicate raycast observations so I removed them and am retraining right now.

if (team == Team.Purple)
base.InitializeAgent();
m_BP = gameObject.GetComponent<BehaviorParameters>();
if (m_BP.m_TeamID == 0)

nit: could we explicitly assign these values [0, 1] to the Team enum and compare against that?
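A minimal sketch of what that suggestion could look like (Blue as the 0-valued member is an assumption for illustration; only Purple appears in the snippet above):

```csharp
// Hedged sketch: give the Team enum explicit values so the comparison
// against the team ID in BehaviorParameters reads directly.
public enum Team
{
    Blue = 0,
    Purple = 1
}
```

The check in the hunk above could then read `if (m_BP.m_TeamID == (int)Team.Blue)` instead of comparing against a bare 0.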

@andrewcoh merged commit d8ebf3b into master on Feb 5, 2020
The delete-merged-branch bot deleted the develop-soccer-refactor branch on February 5, 2020 at 20:48
@andrewcoh changed the title from "Soccer refactor" to "Soccer refactor (release)" on Feb 5, 2020
@AcelisWeaven (Contributor) commented:

Hi @andrewcoh,

Sorry if this is not the right place to ask, but have you tried to train the Soccer env with self-play with SAC instead of PPO?

I've got a similar custom environment, and with self-play+PPO I get some okay-ish results, but with self-play+SAC the agents move kind of randomly.

Thanks,

@andrewcoh (Contributor, Author) replied:

Hi @AcelisWeaven,

SAC tends to do pretty poorly when rewards are sparse. I've trained both Tennis and Soccer with SAC+self-play and have only been able to get good policies when I shape the reward function.

Can you elaborate on what you mean by okay-ish results?
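For context, "shaping the reward function" here means layering a small dense reward on top of the sparse win/loss signal. A minimal sketch of that kind of term follows; the `ball`/`opponentGoal` references and the 0.001f scale are illustrative assumptions, not the actual Tennis/Soccer code:

```csharp
using UnityEngine;
using MLAgents;  // Agent base class namespace in this ML-Agents era (assumed)

// Hedged sketch of a dense shaping term layered on the sparse +1/-1 reward.
// All names and magnitudes here are assumptions for illustration.
public class ShapedSoccerAgent : Agent
{
    public Transform ball;          // assumed scene references
    public Transform opponentGoal;
    float m_PrevBallToGoal;

    // Called once per decision step, e.g. from the action handler.
    void AddShapingReward()
    {
        // Reward any progress the ball makes toward the opponent's goal.
        float ballToGoal = Vector3.Distance(ball.position, opponentGoal.position);
        AddReward(0.001f * (m_PrevBallToGoal - ballToGoal));
        m_PrevBallToGoal = ballToGoal;
    }
}
```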

@AcelisWeaven (Contributor) commented Feb 7, 2020

Hi @andrewcoh,

That makes sense, thanks! Maybe that could be mentioned in the docs somewhere?

Sorry for the lack of context about the okay-ish results. This is in comparison to a good agent that I trained using @LeSphax's fork (based on ml-agents v0.8).
I had actually forgotten to carry over a PPO hyper-parameter from the trainer_config I used with the fork, so I'll let it train for a few days and let you know if I make any progress.

If you want to know a bit more about my environment, it looks like this: https://www.youtube.com/watch?v=MZdG-qUug1Q
Basically, it's a Soccer-like env with platformer controls and items. Each agent can see its nearest teammates and opponents (position, velocity, held/used items), the goal positions, and the ball position (if any), and some raycasting is used for the walls/platforms.
A good agent should learn to score, defend its goal, chase the ball (and anticipate its respawn), use items, and pass the ball.

Edit: Quick question, do you think the existential malus is still needed on Soccer with self-play?

@andrewcoh (Contributor, Author) replied:

Ah yes, I remember this. If you can't get it to work as well, please let me know so that we can figure out why. I think there are differences between my implementation and @LeSphax's. Is your observation space similar to Soccer's? Out of curiosity, which hyperparameter were you missing?

Re: the existential malus, I don't think it is absolutely necessary, but it does make for slightly more aggressive offense.
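For readers unfamiliar with the term, the "existential malus" is the small per-step penalty used in the example environments to push agents toward scoring quickly. A minimal sketch of the usual pattern; `maxEnvironmentSteps` is an assumed field:

```csharp
// Hedged sketch: the existential malus is a tiny negative reward applied
// every decision step, which pressures agents to score sooner rather than later.
// Dividing by the episode length keeps the accumulated penalty near -1.
void ApplyExistentialMalus()
{
    AddReward(-1f / maxEnvironmentSteps);
}
```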

@AcelisWeaven (Contributor) commented Feb 8, 2020

Soccer only uses raycasts, right? In my case I've got raycasts (ball/walls/platforms) + player positions/velocities (normalized) + item information (held by a player, used by a player, ...).
If there aren't enough players, the observations are padded (see the sketch below). Edit: everything is relative to the agent's position.
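For illustration, padding here would mean reserving a fixed-size slot per player and zero-filling unused slots so the vector observation length never changes. A minimal sketch, assuming the AddVectorObs-style API from this ML-Agents era; `PlayerInfo`, `maxPlayers`, and `visiblePlayers` are hypothetical:

```csharp
using System.Collections.Generic;
using UnityEngine;
using MLAgents;  // Agent base class namespace in this ML-Agents era (assumed)

// Hedged sketch: fixed-size per-player observation slots, zero-padded when
// fewer players are visible, so the vector observation length stays constant.
public class PaddedObsAgent : Agent
{
    public struct PlayerInfo
    {
        public Vector3 position;
        public Vector3 velocity;
        public bool isHoldingItem;
    }

    public int maxPlayers = 4;
    public List<PlayerInfo> visiblePlayers = new List<PlayerInfo>();

    public override void CollectObservations()
    {
        const int floatsPerPlayer = 7; // 3 position + 3 velocity + 1 item flag
        for (int i = 0; i < maxPlayers; i++)
        {
            if (i < visiblePlayers.Count)
            {
                var p = visiblePlayers[i];
                // Everything relative to this agent, as described above.
                AddVectorObs(transform.InverseTransformPoint(p.position));   // 3 floats
                AddVectorObs(transform.InverseTransformVector(p.velocity));  // 3 floats
                AddVectorObs(p.isHoldingItem ? 1f : 0f);                     // 1 float
            }
            else
            {
                // Pad the unused slot with zeros.
                for (int j = 0; j < floatsPerPlayer; j++)
                    AddVectorObs(0f);
            }
        }
    }
}
```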

I forgot that I had changed beta and num_epoch; if I remember correctly, it made the agent learn quicker. Another difference is that I now use --num-envs 16, where I only used one before. (I trained my agents on a single-core VPS :) )

@AcelisWeaven (Contributor) commented Feb 9, 2020

Hey @andrewcoh,

I think I found the issue. I'm a bit embarrassed I didn't try that before.
Replacing --num-envs 16 with --num-envs 1 solved the problem... (and training is actually much faster this way)

Here are some TensorBoard screenshots; the change was applied at about 5M steps.
(don't mind the Cumulative Reward, I changed the existential malus at some point)
[Three TensorBoard screenshots omitted]

Now, the agents play well!

Is this a bug or the intended behavior?

@andrewcoh (Contributor, Author) replied:

It's actually quite strange that switching from 16 envs to 1 solved the problem. I will look into this. Thanks for bringing it to my attention, and for using the new self-play trainer!

@AcelisWeaven (Contributor) replied:

Thanks @andrewcoh! Don't hesitate to reach out if you have any questions; there's still a possibility that it comes from my custom environment somehow :)

The github-actions bot locked this conversation as resolved and limited it to collaborators on May 16, 2021