Soccer refactor (release) #3331
Conversation
The Soccer .nn as-is isn't working - it looks like the number of vector observations in the NN differs from the number in the behavior parameters.
@ervteng sorry, I should have told you. There were duplicate raycast observations, so I removed them and am retraining right now.
if (team == Team.Purple)
base.InitializeAgent();
m_BP = gameObject.GetComponent<BehaviorParameters>();
if (m_BP.m_TeamID == 0)
nit: could we explicitly assign these values [0, 1] to the Team enum and compare against that?
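A minimal sketch of what this suggestion could look like, based on the diff context above; the Blue team name, the exact field names, and the MLAgents namespace are assumptions and may differ from the actual project code:

```csharp
using UnityEngine;
using MLAgents;  // namespace differs across ML-Agents versions (Unity.MLAgents in later releases)

// Give the Team enum explicit values that match the team IDs configured in
// BehaviorParameters, then compare against the enum instead of a bare literal.
public enum Team
{
    Blue = 0,
    Purple = 1
}

public class AgentSoccer : Agent
{
    BehaviorParameters m_BP;
    Team team;

    public override void InitializeAgent()
    {
        base.InitializeAgent();
        m_BP = gameObject.GetComponent<BehaviorParameters>();

        // Compare the configured team ID against the enum rather than 0/1.
        team = (Team)m_BP.m_TeamID;
        if (team == Team.Purple)
        {
            // Purple-team-specific setup goes here.
        }
    }
}
```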
Hi @andrewcoh , Sorry if this is not the right place to ask, but have you tried to train the Soccer env with self-play with SAC instead of PPO? I've got a similar custom environment, and with self-play+PPO I get some okay-ish results, but with self-play+SAC the agents move kinda randomly. Thanks,
Hi @AcelisWeaven, SAC tends to do pretty poorly when rewards are sparse. I've trained both Tennis and Soccer with SAC+self-play and have only been able to get good policies when I shape the reward function. Can you elaborate on what you mean by okay-ish results?
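For context, reward shaping here means adding a small dense reward on top of the sparse goal reward so SAC gets a more frequent learning signal. A hypothetical sketch using the ML-Agents AddReward API; the specific shaping signal (ball velocity toward the opponent goal) and all field names are illustrative, not the shaping actually used for Tennis or Soccer:

```csharp
using UnityEngine;
using MLAgents;  // Unity.MLAgents in later releases

public class ShapedSoccerAgent : Agent
{
    public Rigidbody ballRb;          // assumed to be wired up in the Inspector
    public Transform opponentGoal;

    // Call this from the agent's per-step/action callback.
    void AddShapingReward()
    {
        // Small dense reward when the ball moves toward the opponent goal,
        // layered on top of the sparse +1/-1 goal reward.
        Vector3 toGoal = (opponentGoal.position - ballRb.position).normalized;
        float progress = Vector3.Dot(ballRb.velocity, toGoal);
        AddReward(0.001f * Mathf.Clamp(progress, -1f, 1f));
    }
}
```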
Hi @andrewcoh , That makes sense, thanks! Maybe that could be mentioned in the docs somewhere? Sorry for the lack of context about the okay-ish results. This is in comparison to a good agent that I trained using @LeSphax 's fork (based on ml-agents v0.8). If you want to know a bit more about my environment, it looks like this: https://www.youtube.com/watch?v=MZdG-qUug1Q Edit: Quick question, do you think the existential malus is still needed on Soccer with self-play?
Ah yes, I remember this. If you can't get it to work as well, please let me know so that we can figure out why. I think there are differences between my implementation and @LeSphax's. Is your observation space similar to Soccer's? Out of curiosity, which hyperparameter were you missing? I don't think it is absolutely necessary, but it does make for slightly more aggressive offense.
Soccer only uses raycasts, right? In my case I've got raycasts (ball/walls/platforms) + player positions/velocities (normalized) + item information (held by a player, used by a player, ...). I forgot that I changed beta and num_epoch; if I remember correctly it made the agent learn quicker. Another difference I've got is that I now use
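For comparison, a hypothetical sketch of how such extra observations might sit alongside raycasts in ML-Agents; the VectorSensor-based CollectObservations signature is from later ML-Agents releases (the API at the time of this PR used AddVectorObs instead), and every field name and normalization constant here is an assumption, not the poster's actual code:

```csharp
using UnityEngine;
using Unity.MLAgents;
using Unity.MLAgents.Sensors;

// Raycasts would come from a RayPerceptionSensor component on the GameObject;
// player positions/velocities and item state are added as vector observations.
public class CustomAgent : Agent
{
    public Rigidbody[] players;
    public bool itemHeld;
    public bool itemInUse;
    public float arenaHalfSize = 20f;   // used to normalize positions
    public float maxSpeed = 10f;        // used to normalize velocities

    public override void CollectObservations(VectorSensor sensor)
    {
        foreach (var p in players)
        {
            sensor.AddObservation(p.position / arenaHalfSize);  // 3 floats, roughly in [-1, 1]
            sensor.AddObservation(p.velocity / maxSpeed);       // 3 floats
        }
        // Item state flags, as described above.
        sensor.AddObservation(itemHeld);
        sensor.AddObservation(itemInUse);
    }
}
```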
Hey @andrewcoh , I think I found the issue. I'm a bit embarrassed I didn't try that before. Here are some Tensorboard screenshots; the change was applied at about 5M steps. Now the agents play well! Is this a bug or the intended behavior?
That's actually quite strange that switching from 16 envs to 1 solved the problem. I will look into this. Thanks for bringing it to my attention/using the new self-play trainer!
Thanks @andrewcoh! Don't hesitate if you have any questions; there's still a possibility that it may come from my custom environment somehow :)
This PR refactors the soccer environment to be trained with the ghost trainer.
Major changes: