
Some details in mineagent RL implementation #4

Open

YHQpkueecs opened this issue Aug 26, 2022 · 2 comments

Comments

YHQpkueecs commented Aug 26, 2022

Hello! I am reproducing your paper results (training PPO + self-imitation with the MineCLIP reward), but I am missing some details:

  1. How are the agent's 89 discrete actions described in the paper implemented? Currently your MineAgent uses a multi-discrete output of 3×3×4×25×25×3, which is much larger. Did you remove some action choices? (A sketch of one possible reduction follows this list.)
  2. For computing the direct reward with the MineCLIP model, how are the negative texts sampled, and how many negatives did you use? (See the reward sketch after this list.)
  3. I find that the timescale of one step in the MineDojo simulation is much smaller than one second of a YouTube video. Did you use the last 16 consecutive RGB observations to compute the reward? (See the frame-buffer sketch after this list.)
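
For question 1, here is a minimal sketch of one way such a reduction could work, assuming the multi-discrete space is pruned by enumerating a hand-picked subset of per-axis choices and taking their Cartesian product. The per-axis subsets below are hypothetical placeholders and do not reproduce the paper's exact 89 actions:

```python
import itertools

# Hypothetical per-axis subsets of MineDojo's multi-discrete action space
# (3, 3, 4, 25, 25, 3). The exact subsets that would yield 89 actions are
# not published; these values are placeholders for illustration.
FORWARD_BACK = [0, 1, 2]     # no-op / forward / backward
STRAFE = [0]                 # keep less useful axes fixed at no-op
JUMP = [0, 1]                # no-op / jump
CAMERA_PITCH = [11, 12, 13]  # a coarse subset of the 25 pitch bins
CAMERA_YAW = [11, 12, 13]    # a coarse subset of the 25 yaw bins
FUNCTIONAL = [0, 1, 3]       # e.g. no-op / use / attack

# Flatten the reduced space into one discrete set: the policy outputs a
# single index, which is mapped back to a full multi-discrete action.
DISCRETE_ACTIONS = list(itertools.product(
    FORWARD_BACK, STRAFE, JUMP, CAMERA_PITCH, CAMERA_YAW, FUNCTIONAL,
))

def to_env_action(index: int) -> list[int]:
    """Map a flat discrete index to a MineDojo multi-discrete action."""
    return list(DISCRETE_ACTIONS[index])
```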
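For question 2, a minimal sketch of what a CLIP-style direct reward with sampled negatives could look like, assuming the reward is the softmax probability of the task prompt against N negative prompts. The function and argument names here are assumptions, not the authors' API:

```python
import torch

def direct_reward(video_feat: torch.Tensor,
                  pos_text_feat: torch.Tensor,
                  neg_text_feats: torch.Tensor,
                  temperature: float = 0.07) -> float:
    """Hypothetical direct reward: probability that the video matches the
    task prompt rather than one of the sampled negative prompts.

    video_feat:     (D,)   L2-normalized MineCLIP video embedding
    pos_text_feat:  (D,)   embedding of the task prompt
    neg_text_feats: (N, D) embeddings of N sampled negative prompts
    """
    text_feats = torch.cat([pos_text_feat.unsqueeze(0), neg_text_feats], dim=0)
    logits = (text_feats @ video_feat) / temperature  # (N + 1,) similarities
    return logits.softmax(dim=-1)[0].item()           # prob. of the task prompt
```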
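And for question 3, a sketch of the frame buffering the question assumes: keep a sliding window of the last 16 consecutive RGB observations (16 frames being the video length MineCLIP's encoder consumes) and score that clip each step. The `encode_video` callable is a placeholder, not a real MineCLIP entry point:

```python
from collections import deque

FRAMES_PER_CLIP = 16  # MineCLIP's video encoder consumes 16 frames

frame_buffer = deque(maxlen=FRAMES_PER_CLIP)

def reward_for_step(rgb_obs, encode_video):
    """Append the newest observation; once 16 frames have accumulated,
    the window of the last 16 steps forms the clip scored by MineCLIP.
    Returns None until the buffer is full."""
    frame_buffer.append(rgb_obs)
    if len(frame_buffer) < FRAMES_PER_CLIP:
        return None
    return encode_video(list(frame_buffer))
```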

Thank you!

YHQpkueecs (Author) commented

By the way, do you plan to release the training code or the learned agent parameters?

rsha256 commented Oct 26, 2022

@YHQpkueecs Were you able to get the learned agent parameters from @LinxiFan, @wangguanzhi, or @yunfanjiang?
