PPO ValueError: need at least one array to concatenate #19
Comments
From briefly reading the docs (I'm setting up my own env as well), I believe the issue may be a lack of floats that it wants to pipe into numpy from your agent to keep track of 'memories' and such. Can you post your code?
I'm not totally sure I understand the MemorySize argument in the Brain. Do you define it in the same way you do with the States and Actions? If so, where? I had it set to the same number of states + actions I have, but I just set it to zero and I still got the same error. This is my agent code right now; it's a bunch of rigid bodies and hinge joints.
I want to say there is a reason they don't iterate through acts in these demo builds. Does it work this way? Your code currently sets four actions within the same step, whereas in the demos they have if statements for the current steps as static array points, i.e. act[1]{movement}, act[2]{movement}. If it works with your type of array then great, I'd genuinely like to know :D
If you look at the Ball3DAgent code, they've got 2 total actions and are performing act[0] and act[1] as action_z and action_x every step. My code exerts a force on all 24 rigid bodies using all 24 actions every step. It's a little unclear, but there are 6 legs with 4 rigid bodies each, and the logic I was using should access them all one by one, as if I were manually typing out each index.
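For illustration, the indexing pattern described above amounts to treating the flat 24-value action vector as a 6 x 4 grid. The actual agent is written in C#; this numpy sketch (with an assumed leg-major ordering) just makes the mapping explicit:

```python
import numpy as np

# Hypothetical flat action vector produced each step:
# 6 legs x 4 rigid bodies = 24 continuous actions.
act = np.random.uniform(-1.0, 1.0, size=24)

# Looping over legs and joints is equivalent to writing out
# act[0], act[1], ..., act[23] by hand.
forces = act.reshape(6, 4)          # forces[leg, joint]; leg-major ordering assumed
for leg in range(6):
    for joint in range(4):
        force = forces[leg, joint]  # the value the agent would apply to that rigid body
```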
Hi @sterlingcrispin, As a debug step, I would suggest walking through the Basic.ipynb notebook to verify the environment is communicating with Python correctly. The error in the traceback suggests the training buffer's 'states' entries are empty when update_model is called.
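A quick way to confirm that from the training notebook (assuming `trainer` and its `training_buffer` dict of per-step lists, as the traceback quoted later in the thread shows) is to print the buffer sizes just before the update:

```python
# Debug sketch: key names taken from the traceback; the exact buffer
# structure is an assumption.
for key in ['actions', 'states', 'epsilons']:
    entries = trainer.training_buffer.get(key, [])
    print("{}: {} entries".format(key, len(entries)))
# If 'states' reports 0 entries while 'actions' does not, np.vstack inside
# update_model has nothing to stack and raises the ValueError.
```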
@awjuliani the Basic.ipynb looks okay. One thing I'm noticing is that I might be punishing my agent too much? I'm only applying a -0.01 reward each step it doesn't get closer.
I've tried setting the agent's Max Step to 100 (the default) and the Academy's to 0 and 1000. My agent currently doesn't have a condition where done = true and a negative reward is given; could that be causing it?
What is typically done for locomotion tasks like this is actually not to give a negative reward, but to give a positive reward for any progress the agent makes. So the reward function might look something like: +0.01 for every 1/10th of a meter of forward progress. If you expected it to walk upright, you might also provide a -0.1 reward for when it falls over and end the episode (done = true). That definitely isn't intuitive though, and I will go through and add this to a "best practices" section of the wiki.
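A minimal sketch of that reward scheme, written as Python pseudocode rather than the actual C# agent code (the variable names and the fall check are assumptions for the example):

```python
# Illustrative per-step reward shaping for a walking agent.
def step_reward(body_z, last_rewarded_z, fell_over):
    reward, done = 0.0, False
    # +0.01 for every 0.1 m of forward progress since the last reward.
    while body_z - last_rewarded_z >= 0.1:
        reward += 0.01
        last_rewarded_z += 0.1
    # Penalize falling over and end the episode.
    if fell_over:
        reward -= 0.1
        done = True
    return reward, done, last_rewarded_z

# Example: 0.35 m of total forward progress, still upright.
r, d, marker = step_reward(0.35, 0.0, False)   # r ~= 0.03, d == False
```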
Just realized what your issue is @sterlingcrispin. You are using a camera as an observation with your agent. The model being created by PPO is one that expects to take the observations, not the states, thus the empty 'states' buffer.
I am definitely open to adding more to accommodate additional configurations, but the complexity/experimental nature of the network increases when trying to do things like combine visual input with states. As a quick fix, I'd recommend taking away the camera and seeing what you can learn just from the state inputs. If that isn't enough, you can augment the state with information about what is in front of the agent using Raycasts.
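One way to see this from the Python side is to inspect what the brain reports after a reset. The attribute and method names below (brain_names, states, observations) are recollections of the API version this thread appears to use and should be treated as assumptions, and the environment file name is hypothetical:

```python
from unityagents import UnityEnvironment  # import path assumed for this version
import numpy as np

env = UnityEnvironment(file_name="my_crawler_build")  # hypothetical build name
brain_name = env.brain_names[0]
info = env.reset(train_mode=True)[brain_name]

# With a camera-only agent, the state array is empty while the
# observation list holds the camera frames.
print("states shape:", np.asarray(info.states).shape)
print("camera observations:", len(info.observations))
env.close()
```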
@awjuliani okay, I must have overlooked that somehow; seeing it written out like that makes it a lot clearer. I think continuous control + camera observation would be really cool. I took away the camera and it seems to be training okay, except for errors on my part with a few NaN rewards when the agent fell over but the fall wasn't detected. Thank you!
Glad it is working out! For now, I will add an error when trying to use PPO with an unsupported agent configuration. Over time I'd like to add more network architectures to fit whatever type of agent is built, but that will take some experimentation, as things like continuous control + camera input are actually still an active area of research.
This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
I'm trying to train my own agent in an environment and getting this error; not sure what I'm misconfiguring in my scene for this to happen.
```
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input> in <module>()
     42 if len(trainer.training_buffer['actions']) > buffer_size and train_model:
     43     # Perform gradient descent with experience buffer
---> 44     trainer.update_model(batch_size, num_epoch)
     45 if steps % summary_freq == 0 and steps != 0 and train_model:
     46     # Write training statistics to tensorboard.

/Users/sterlingcrispin/code/Unity-ML/python/ppo/trainer.pyc in update_model(self, batch_size, num_epoch)
    139 if self.is_continuous:
    140     feed_dict[self.model.epsilon] = np.vstack(training_buffer['epsilons'][start:end])
--> 141     feed_dict[self.model.state_in] = np.vstack(training_buffer['states'][start:end])
    142 else:
    143     feed_dict[self.model.action_holder] = np.hstack(training_buffer['actions'][start:end])

/Users/sterlingcrispin/anaconda/lib/python2.7/site-packages/numpy/core/shape_base.pyc in vstack(tup)
    235
    236     """
--> 237     return _nx.concatenate([atleast_2d(_m) for _m in tup], 0)
    238
    239 def hstack(tup):

ValueError: need at least one array to concatenate
```
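For reference, that final ValueError is exactly what numpy raises when vstack receives an empty sequence, which is what an unfilled 'states' buffer produces. A minimal reproduction:

```python
import numpy as np

states = []                # what training_buffer['states'][start:end] yields
                           # when no state vectors were ever recorded
try:
    np.vstack(states)      # the same call made inside update_model
except ValueError as err:
    print(err)             # -> need at least one array to concatenate
```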