added support for multiple dimension continuous action spaces #177

Open
wants to merge 1 commit into
base: continuous
devin-m-NRL

The changes fall into four themes:

  • The prediction_policy_network output now has size 2 × the action-space dimension: one mean and one standard deviation for each joint. The log_prob is calculated per joint and then summed
  • The dynamics_encoded_state_network function now takes an action array into account
  • Code that now needs to work with arrays: np.random.choice, item, and dictionary usage
  • Changes to the TensorBoard logging so that video renders are saved
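
The first theme can be illustrated with a small sketch. This is not the code from this PR; it is a dependency-free illustration under stated assumptions: the policy output is laid out as all means followed by all raw standard-deviation values, and a softplus keeps the standard deviations positive. The function names `split_policy_output` and `summed_log_prob` are hypothetical.

```python
import math


def split_policy_output(policy_output, action_dim):
    """Split a flat vector of length 2 * action_dim into (means, stds).

    Assumption: the first half holds the means and the second half holds
    raw values that are mapped to positive standard deviations.
    """
    if len(policy_output) != 2 * action_dim:
        raise ValueError("expected 2 * action_dim policy outputs")
    means = list(policy_output[:action_dim])
    raw_stds = policy_output[action_dim:]
    # Numerically stable softplus keeps every standard deviation positive.
    stds = [max(x, 0.0) + math.log1p(math.exp(-abs(x))) + 1e-6 for x in raw_stds]
    return means, stds


def summed_log_prob(action, means, stds):
    """Sum the per-joint Gaussian log-densities of a multi-dimensional action.

    Treating the joints as independent Gaussians, the joint log-probability
    is the sum of the individual log-probabilities.
    """
    total = 0.0
    for a, mu, sigma in zip(action, means, stds):
        z = (a - mu) / sigma
        total += -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)
    return total


# Usage: a 2-joint action space gives a policy output of length 4.
means, stds = split_policy_output([0.1, -0.2, 0.5, 0.5], action_dim=2)
log_prob = summed_log_prob([0.0, 0.0], means, stds)
```

Summing the per-joint terms corresponds to assuming a diagonal covariance; a policy with correlated joints would need a different parameterization.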

@devin-m-NRL
Author

devin-m-NRL commented Oct 1, 2021

Results: on the Sawyer shelf environment I added, the agent reached a reward of -43, which is not great, but it performs okay. It trained on one GPU for 110,000 training steps and 55,000 self-play games over 10 days.

[Image]

[Image: shelfMuZero3]
