Hello everybody!
As far as I can see from the code, each agent maintains its own replay buffer.
During the training step, when the minibatch is sampled, the observations of all agents are collected and concatenated:
See maddpg/maddpg/trainer/maddpg.py, lines 173 to 177 (commit fbba5e4).
As far as I can see, this would lead to duplicates in the state input to the agent's critic function. If some components of the environment state are part of every agent's observation, those components would be contained in the critic's input multiple times.
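To make the point concrete, here is a minimal sketch (not the repository code) of the duplication I mean; the observation sizes and the split into a shared and a private part are purely illustrative assumptions:

```python
import numpy as np

n_agents = 3

# Hypothetical split of each agent's observation: a shared environment-state
# part (seen by every agent) plus a private, per-agent part.
shared_state = np.array([0.5, -0.2])                            # e.g. a landmark position
private_parts = [np.random.randn(4) for _ in range(n_agents)]   # per-agent velocities, etc.

# Each agent's observation contains the shared part verbatim.
observations = [np.concatenate([shared_state, p]) for p in private_parts]
actions = [np.random.randn(2) for _ in range(n_agents)]

# Centralized critic input: all observations and all actions concatenated,
# so shared_state appears n_agents times in the same input vector.
critic_input = np.concatenate(observations + actions)
print(critic_input.shape)   # (n_agents*(2+4) + n_agents*2,) -> (24,)
```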
Is this true, or am I missing something?
Does this (artificial) state expansion have any adverse effect on the critic, or can we safely assume that the critic will quickly learn that the values at certain input nodes are always identical and can therefore be treated as one?
Are there any memory issues caused by the state components being stored multiple times, once in each agent's replay buffer? (Memory is probably not a concern for RL practitioners, but I have a background in embedded systems.)
I would be very grateful for some more insight on this topic.
Regards,
Felix