Replies: 2 comments
-
Hi @berttggg Sorry for the late response. Yes, the current implementation uses only the reward from the first agent. It was implemented and tested using the Bi-DexHands environments, which, like Isaac Gym, define a single reward shared by all agents in the task. In upcoming releases this implementation will be extended to a more generic form.
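As a rough illustration of what a more generic form could look like (a hypothetical sketch, not skrl's actual or planned API):

```python
# Hypothetical sketch of generic multi-agent reward tracking -- not skrl's
# actual or planned implementation. Assumes `rewards` is a dict mapping
# agent names to per-environment torch tensors, as in skrl's multi-agent
# interface.
import torch

def track_mean_rewards(rewards: dict) -> dict:
    """Track the mean instantaneous reward of every agent, not just the first."""
    return {name: r.mean().item() for name, r in rewards.items()}

rewards = {"agent_0": torch.rand(8), "agent_1": torch.rand(8)}
print(track_mean_rewards(rewards))  # e.g. {'agent_0': 0.47, 'agent_1': 0.52}
```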
-
I see. May I ask whether MAPPO or IPPO has any successful examples so far? How is the learning performance on the Bi-DexHands tasks?
-
I am reading the reward-tracking code for multi-agents, and it seems that we are only interested in the first key's value, meaning the first agent only. Why? Or is there anything I have missed?
Thank you.
skrl/skrl/multi_agents/torch/base.py, line 305 (commit 7b090a1)
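In other words, the tracking appears to follow a pattern roughly like this (my paraphrase, not the verbatim source at that line):

```python
import torch

# Paraphrase of the observed behavior -- not the verbatim source.
# `rewards` maps agent names to reward tensors; only the first key is used.
rewards = {"agent_0": torch.rand(8), "agent_1": torch.rand(8)}
tracked = rewards[next(iter(rewards.keys()))]  # first agent's rewards only
```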