
PPO ValueError: need at least one array to concatenate #19

Closed
sterlingcrispin opened this issue Sep 21, 2017 · 12 comments

@sterlingcrispin
Contributor

I'm trying to train my own agent in an environment and getting this error; I'm not sure what I'm misconfiguring in my scene for this to happen.

```
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-...> in <module>()
     42 if len(trainer.training_buffer['actions']) > buffer_size and train_model:
     43     # Perform gradient descent with experience buffer
---> 44     trainer.update_model(batch_size, num_epoch)
     45 if steps % summary_freq == 0 and steps != 0 and train_model:
     46     # Write training statistics to tensorboard.

/Users/sterlingcrispin/code/Unity-ML/python/ppo/trainer.pyc in update_model(self, batch_size, num_epoch)
    139 if self.is_continuous:
    140     feed_dict[self.model.epsilon] = np.vstack(training_buffer['epsilons'][start:end])
--> 141     feed_dict[self.model.state_in] = np.vstack(training_buffer['states'][start:end])
    142 else:
    143     feed_dict[self.model.action_holder] = np.hstack(training_buffer['actions'][start:end])

/Users/sterlingcrispin/anaconda/lib/python2.7/site-packages/numpy/core/shape_base.pyc in vstack(tup)
    235
    236     """
--> 237     return _nx.concatenate([atleast_2d(_m) for _m in tup], 0)
    238
    239 def hstack(tup):

ValueError: need at least one array to concatenate
```

@eagleEggs

From briefly reading the docs (I'm setting up my own env as well), I believe the issue may be a lack of floats that it wants to pipe into numpy from your agent, to keep track of 'memories' and such. Can you post your code?

@sterlingcrispin
Contributor Author

I'm not totally sure I understand the MemorySize argument in the Brain. Do you define it in the same way you do the States and Actions? If so, where? I had it set to the same number of states + actions I have, but I just set it to zero and still got the same error.

This is my agent code right now; it's a bunch of rigid bodies and hinge joints:

JointAgent.txt
[screenshot attached]

@eagleEggs

eagleEggs commented Sep 21, 2017

I want to say there is a reason they don't iterate through acts in these demo builds. Does it work this way? Your code currently sets four actions within the same step, whereas in the demos they have if statements for the current steps as static array points, i.e. act[1]{movement}, act[2]{movement}.

If it works with your type of array then great, I'd generally like to know :D

@romerocesar romerocesar added help-wanted Issue contains request for help or information. python labels Sep 21, 2017
@sterlingcrispin
Contributor Author

sterlingcrispin commented Sep 21, 2017

If you look at the Ball3DAgent code, they've got 2 total actions and are performing act[0] and act[1] as action_z and action_x every step.

My code exerts a force on all 24 rigid bodies using all 24 actions every step. It's a little unclear, but there are 6 legs with 4 rigid bodies each, and the logic I was using should access them all one by one, as if I was manually typing out:

```
act[i * legs[i].rbcount + 0]
act[i * legs[i].rbcount + 1]
act[i * legs[i].rbcount + 2]
act[i * legs[i].rbcount + 3]
```
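
In loop form that's roughly the sketch below (the `legs`, `rbcount`, and `bodies` names are just placeholders for my actual fields, and applying the force along `transform.up` is only an example):

```csharp
public override void AgentStep(float[] act)
{
    for (int i = 0; i < legs.Length; i++)
    {
        for (int j = 0; j < legs[i].rbcount; j++)
        {
            // one continuous action per rigidbody, indexed exactly as written out above
            float a = act[i * legs[i].rbcount + j];
            legs[i].bodies[j].AddForce(transform.up * a);
        }
    }
}
```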

@awjuliani
Contributor

Hi @sterlingcrispin,

As a debug step, I would suggest walking through the Basic.ipynb notebook using your new environment. You can use it to examine the state and action space in an interactive manner, which might give you a better intuition into how your states are being represented.

The error in ppo.py you are receiving seems to correspond to the training_buffer being empty. Can I ask what you've set your Academy or Agent's Max Steps to?

@sterlingcrispin
Contributor Author

sterlingcrispin commented Sep 21, 2017

@awjuliani the Basic.ipynb looks okay. One thing I'm noticing is that I might be punishing my agent too much? I'm only adding -0.01 each step it doesn't get closer:

Total reward this episode: 49.17
Total reward this episode: -10.01
Total reward this episode: -10.01
Total reward this episode: 43.01
Total reward this episode: -10.01
Total reward this episode: -10.01
Total reward this episode: -10.01
Total reward this episode: 0.88
Total reward this episode: -10.01
Total reward this episode: -10.01

I've tried setting the Agent's Max Steps to 100 (the default) and the Academy's to 0 and 1000, then tried the Agent's Max Steps at 1000 and the Academy's at 0.

My agent currently doesn't have a condition where done = true and a negative reward is given; could that be causing it?

@awjuliani
Contributor

What is typically done for locomotion tasks like this is actually not to give a negative reward, but to give a positive reward for any progress the agent makes. So the reward function might look something like: +0.01 for every 1/10th of a meter of forward progress. If you expect it to walk upright, you might also provide a -0.1 reward for when it falls over, and end the episode (done = true) then. For some inspiration, I would recommend having a look through this recent DeepMind paper where they train agents to walk (and run, and jump): https://arxiv.org/abs/1707.02286.
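
In agent code, that reward scheme might look roughly like the sketch below (this assumes the Agent's `reward`/`done` fields and the `AgentStep(float[] act)` override; the `body` and `lastZ` members are placeholders, and a real "fell over" check would be more careful):

```csharp
// Placeholder fields (not from this thread): "body" is the creature's root transform,
// "lastZ" is the last forward position that earned a reward.
public Transform body;
private float lastZ;

public override void AgentReset()
{
    lastZ = body.position.z;
}

public override void AgentStep(float[] act)
{
    // ... apply act[] as joint forces here ...

    // +0.01 for every 1/10th of a meter of forward progress
    if (body.position.z - lastZ >= 0.1f)
    {
        reward += 0.01f;
        lastZ = body.position.z;
    }

    // small penalty and end the episode if the agent falls over
    if (Vector3.Dot(body.up, Vector3.up) < 0.1f)
    {
        reward = -0.1f;
        done = true;
    }
}
```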

That definitely isn't intuitive though, and I will go through and add this to a "best practices" section of the wiki.

@sterlingcrispin
Contributor Author

I've made some improvements to the rewards, but that error about the training buffer being empty remains.

Do I need to change some of the hyperparameters for the PPO, or are my Unity inspector settings misconfigured?

[three screenshots attached]

@awjuliani
Contributor

Just realized what your issue is, @sterlingcrispin. You are using a camera as an observation with your agent. The model being created by PPO is one that expects to take the observations, not the states, hence the empty states array when training. There are currently three model configurations included with ppo:

  • Continuous control + states
  • Discrete control + states
  • Discrete control + camera observation

I am definitely open to adding more to accommodate additional configurations, but the complexity/experimental nature of the network increases when trying to do things like combine visual input with states. As a quick fix, I'd recommend taking away the camera and seeing what you can learn just from the state inputs. If that isn't enough, you can augment the state with information about what is in front of you with Raycasts.
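
A raycast-augmented CollectState() might look something like the sketch below (the 10-unit range and the normalization are arbitrary choices, and remember to bump the Brain's State Size by one for each value you add):

```csharp
// Inside your Agent subclass; needs "using UnityEngine;" and "using System.Collections.Generic;".
public override List<float> CollectState()
{
    List<float> state = new List<float>();

    // ... your existing joint angles / velocities here ...

    // Distance to whatever is directly ahead, normalized by the (arbitrary) 10-unit range.
    RaycastHit hit;
    if (Physics.Raycast(transform.position, transform.forward, out hit, 10f))
        state.Add(hit.distance / 10f);
    else
        state.Add(1f);   // nothing within range

    return state;
}
```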

@sterlingcrispin
Contributor Author

@awjuliani okay, I must have overlooked that somehow; seeing it written out like that makes it a lot clearer.

I think Continuous control + camera observation would be really cool.

I took away the camera and it seems to be training okay, except for errors on my part: a few NaN rewards when the agent fell over but the fall wasn't detected.

thank you!

@awjuliani
Contributor

Glad it is working out!

For now, I will add an error when trying to use PPO with an unsupported agent configuration. Over time I'd like to add more network architectures to fit whatever type of agent is built, but that will take some experimentation, as things like continuous control + camera input are actually still an active area of research.

@awjuliani awjuliani reopened this Sep 21, 2017
@lock

lock bot commented Jan 5, 2020

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@lock lock bot locked as resolved and limited conversation to collaborators Jan 5, 2020