Replies: 2 comments
-
Hi @berttggg Sorry for the late response. Yes, the current implementation uses only the reward from the first agent. It was implemented and tested using the Bi-DexHands environments, which, like Isaac Gym, define a single reward shared by all agents in the task. In upcoming releases this implementation will be extended to a more generic form.
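As a rough illustration of what a more generic form could look like (a hypothetical sketch, not skrl's actual or planned API):

```python
# Hypothetical sketch of generic multi-agent reward tracking -- not skrl's
# actual or planned implementation. Assumes `rewards` is a dict mapping
# agent names to per-environment torch tensors, as in skrl's multi-agent
# interface.
import torch

def track_mean_rewards(rewards: dict) -> dict:
    """Track the mean instantaneous reward of every agent, not just the first."""
    return {name: r.mean().item() for name, r in rewards.items()}

rewards = {"agent_0": torch.rand(8), "agent_1": torch.rand(8)}
print(track_mean_rewards(rewards))  # e.g. {'agent_0': 0.47, 'agent_1': 0.52}
```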
-
I see. May I ask whether MAPPO or IPPO has any successful examples so far? How is the learning performance on the Bi-DexHands tasks?
-
I am reading the reward-tracking code for multi-agents, and it seems that we are only interested in the first key's value, meaning the first agent only. Why? Or is there anything I have missed?
Thank you.
skrl/skrl/multi_agents/torch/base.py, line 305 (commit 7b090a1)
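In other words, the tracking appears to follow a pattern roughly like this (my paraphrase, not the verbatim source at that line):

```python
import torch

# Paraphrase of the observed behavior -- not the verbatim source.
# `rewards` maps agent names to reward tensors; only the first key is used.
rewards = {"agent_0": torch.rand(8), "agent_1": torch.rand(8)}
tracked = rewards[next(iter(rewards.keys()))]  # first agent's rewards only
```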