Hello! I am reproducing your paper results (training PPO + self-imitation with the MineCLIP reward), but I am missing a few details:
How are the agent's 89 discrete actions (mentioned in the paper) implemented? Currently MineAgent uses a multi-discrete output of 3*3*4*25*25*3, which is much larger. Did you remove some action choices? My current guess is sketched below.
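In case it helps, this is roughly how I am mapping a flat discrete index back into the MultiDiscrete action vector right now. The specific action subset, the no-op camera bin, and the helper names are my own assumptions, not taken from your code, so the table below does not reach 89 entries:

```python
import numpy as np

# Assumed MultiDiscrete layout: [fwd/back, left/right, jump/sneak/sprint,
# camera pitch bin, camera yaw bin, functional action].
# Bin 12 is assumed to be "no camera movement".
NOOP = [0, 0, 0, 12, 12, 0]

def build_discrete_actions():
    actions = []
    # movement-only actions (including the full no-op)
    for fwd in range(3):
        for lat in range(3):
            a = list(NOOP)
            a[0], a[1] = fwd, lat
            actions.append(a)
    # camera-only actions over a coarse pitch/yaw grid (my guess at the bins)
    for pitch in (7, 12, 17):
        for yaw in (7, 12, 17):
            a = list(NOOP)
            a[3], a[4] = pitch, yaw
            actions.append(a)
    # functional actions (e.g. use / attack)
    for fn in range(1, 3):
        a = list(NOOP)
        a[5] = fn
        actions.append(a)
    return np.array(actions)

ACTION_TABLE = build_discrete_actions()

def discrete_to_multidiscrete(idx):
    """Map a flat discrete action index to the env's MultiDiscrete vector."""
    return ACTION_TABLE[idx]
```

Is this the right idea, and if so, which combinations make up the 89 actions?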
For computing the Direct reward with the MineCLIP model, how do you sample the negative texts, and how many do you sample?
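This is the reward computation I am assuming: softmax over the video-text similarities for the goal prompt plus the sampled negatives, baseline-subtracted by the uniform prior. The `encode_video` / `encode_text` calls and the temperature are my own assumptions about the MineCLIP interface, so please correct me if the actual computation differs:

```python
import torch
import torch.nn.functional as F

def mineclip_reward(mineclip, video_frames, goal_text, negative_texts,
                    temperature=100.0):
    """Hypothetical sketch of the Direct MineCLIP reward with negative prompts."""
    with torch.no_grad():
        video_emb = mineclip.encode_video(video_frames)      # (1, D)
        prompts = [goal_text] + list(negative_texts)
        text_emb = mineclip.encode_text(prompts)              # (1+N, D)
        logits = temperature * video_emb @ text_emb.T         # (1, 1+N)
        probs = F.softmax(logits, dim=-1)
        # probability mass on the goal prompt relative to the negatives,
        # minus the uniform baseline 1/(1+N), clipped at zero
        p_goal = probs[0, 0].item()
        return max(p_goal - 1.0 / len(prompts), 0.0)
```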
I also find that the timescale of one step in the MineDojo simulation is much shorter than one second of a YouTube video. Did you use the last 16 consecutive RGB observations to compute the reward?
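Concretely, I am currently keeping a rolling buffer of the last 16 RGB observations and scoring that clip with MineCLIP at every step; the padding strategy for the first steps of an episode is my own choice:

```python
from collections import deque
import numpy as np

class FrameBuffer:
    """Hypothetical sketch: rolling window of the last 16 RGB observations."""

    def __init__(self, num_frames=16):
        self.frames = deque(maxlen=num_frames)

    def add(self, rgb_obs):
        if not self.frames:
            # pad with copies of the first frame so the clip is full from step 0
            self.frames.extend([rgb_obs] * self.frames.maxlen)
        else:
            self.frames.append(rgb_obs)

    def as_clip(self):
        # (T, H, W, C) array, to be preprocessed before feeding MineCLIP
        return np.stack(self.frames, axis=0)
```

Is this consistent with how you handled the mismatch between simulation steps and video time?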
Thank you!