Training details about MineAgent #9
Comments
Hello, did you manage to reimplement the training code for the agents with PPO? I'm getting some issues with the nested dicts despite using the multi-input policy.
@iSach Hi. I tried to reimplement PPO from the CleanRL code. I use Gym's vectorized env to speed up rollouts.
I'm not very familiar with running more complex environments like these (I've only run very basic envs from Gym's tutorials). Do you have a repo or a gist to look at? My main issue is dealing with the nested dicts in the env's observation space. I tried to implement a custom features extractor based on SimpleFeatureFusion, but I can't get anything running at all.
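If the blocker is that the observation space is a nested Dict (which, to my knowledge, Stable-Baselines3's MultiInputPolicy does not accept), one workaround is a wrapper that flattens it into a single-level Dict before the features extractor sees it. A minimal sketch, not taken from MineCLIP or the SB3 docs; the wrapper name and key scheme are invented here:

```python
import gym
import numpy as np
from gym import spaces


def _flatten_space(space, prefix=""):
    """Recursively flatten a (possibly nested) Dict space into {flat_key: leaf_space}."""
    if isinstance(space, spaces.Dict):
        flat = {}
        for key, sub in space.spaces.items():
            flat.update(_flatten_space(sub, f"{prefix}{key}."))
        return flat
    return {prefix.rstrip("."): space}


def _flatten_obs(obs, prefix=""):
    """Flatten a nested dict observation using the same keys as _flatten_space."""
    if isinstance(obs, dict):
        flat = {}
        for key, sub in obs.items():
            flat.update(_flatten_obs(sub, f"{prefix}{key}."))
        return flat
    return {prefix.rstrip("."): np.asarray(obs)}


class FlattenDictWrapper(gym.ObservationWrapper):
    """Expose a nested Dict observation space as a single-level Dict
    so a multi-input policy or custom features extractor can consume it."""

    def __init__(self, env):
        super().__init__(env)
        self.observation_space = spaces.Dict(_flatten_space(env.observation_space))

    def observation(self, observation):
        return _flatten_obs(observation)
```

With this in place, the policy (or a custom extractor in the style of SimpleFeatureFusion) only has to handle one flat Dict of leaf spaces.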
Unfortunately, not at the moment. I don't think my previous code is bug-free or worth referencing. However, I do suggest starting from their provided code, such as `main/mineagent/run_env_in_loop.py`.
I tried, but I'm running into many problems with PPO because of the unusual environment, and I can't get a clean training script working. I don't understand why they would release everything except the code for reproducing the results, especially considering how few tasks are demonstrated in the code.
About the policy algorithm training:
I would appreciate it if you could clarify the points above. It would also be helpful if you released the policy training code in the future.
Hi @elcajas, since the authors have not replied to this issue, I did not continue reimplementing PPO in MineDojo. Here is what I can share: I implemented PPO based on the CleanRL version and adopted a vectorized env to speed things up. The network backbone is similar to the FeatureFusion from this repo. On your points:
- After a fixed number of env steps.
- I refer to the CleanRL code and Table A.3 from the MineDojo paper.
- Yes.
- No. Using the default discrete version of PPO is okay.
- Unfortunately, I haven't tried that.
- I'm not clear about this question. Can you provide some details?

Generally, these are just some of my experiences, and I have not worked on this recently. I sincerely hope the authors and our community can open-source some RL approaches to this benchmark.
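For anyone trying to reproduce this setup, here is a minimal sketch (not the commenter's actual code) of a CleanRL-style actor-critic whose backbone fuses per-modality embeddings, in the spirit of FeatureFusion, and emits a single Discrete action head. The class name, observation keys, and dimensions are illustrative assumptions:

```python
import torch
import torch.nn as nn
from torch.distributions import Categorical


class FusionAgent(nn.Module):
    """Actor-critic with one small encoder per observation modality,
    concatenated into a shared trunk (FeatureFusion-style)."""

    def __init__(self, obs_dims: dict, n_actions: int, hidden: int = 256):
        super().__init__()
        self.encoders = nn.ModuleDict({
            key: nn.Sequential(nn.Linear(dim, hidden), nn.ReLU())
            for key, dim in obs_dims.items()
        })
        self.trunk = nn.Sequential(nn.Linear(hidden * len(obs_dims), hidden), nn.ReLU())
        self.actor = nn.Linear(hidden, n_actions)   # single Discrete action head
        self.critic = nn.Linear(hidden, 1)

    def forward(self, obs: dict):
        # Concatenate per-modality features, then share the trunk for actor and critic.
        feats = torch.cat([self.encoders[k](obs[k]) for k in self.encoders], dim=-1)
        h = self.trunk(feats)
        return Categorical(logits=self.actor(h)), self.critic(h)


# Example (assumed dims): a 512-d image embedding plus a 4-d compass vector, 89 actions.
agent = FusionAgent({"rgb_emb": 512, "compass": 4}, n_actions=89)
dist, value = agent({"rgb_emb": torch.randn(8, 512), "compass": torch.randn(8, 4)})
action = dist.sample()  # plug into a CleanRL-style PPO rollout/update loop
```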
Also found a bug in the example code. See #11.
@elcajas I have the same questions as you. Did you get any further?
Hi. Thank you for releasing this precious benchmark! I'm working on implementing the PPO agent you reported in the paper. However, I found some misalignments between the code and the paper.

Trimmed action space

As mentioned in #4, the code below does not correspond to the 89 action dims in Appendix G.2:

MineCLIP/main/mineagent/run_env_in_loop.py (line 75 in e6c06a0)
About the `compass` observation

In the paper, the compass has a shape of `(2,)`. However, I see an input of shape `(4,)` in your code:

MineCLIP/main/mineagent/run_env_in_loop.py (line 25 in e6c06a0)
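One possible explanation for the extra dimensions, which is purely a guess and not confirmed by the repo, is that the two compass angles (yaw, pitch) are encoded as sine/cosine pairs, turning a `(2,)` reading into a `(4,)` vector. A minimal sketch of that encoding:

```python
# Hypothetical sketch: encoding a (yaw, pitch) compass reading as sin/cos pairs.
# This is an assumption about the (4,) shape; it is not taken from run_env_in_loop.py.
import numpy as np

def encode_compass(yaw_deg: float, pitch_deg: float) -> np.ndarray:
    """Map two angles to a 4-dim vector that stays continuous across the +/-180 wrap."""
    yaw, pitch = np.deg2rad(yaw_deg), np.deg2rad(pitch_deg)
    return np.array([np.sin(yaw), np.cos(yaw), np.sin(pitch), np.cos(pitch)],
                    dtype=np.float32)

print(encode_compass(90.0, -30.0).shape)  # (4,)
```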
Training on a `MultiDiscrete` action space

Is the 89-dimension action space in the paper a `MultiDiscrete` action space like the original MineDojo action space, or do you simply treat it as a `Discrete` action space?

In addition, can you release the training code for the three task groups in the paper (or share it via my GitHub email)? It would be beneficial for baseline comparisons!
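On the `MultiDiscrete` vs. `Discrete` point, the reply above suggests the default discrete PPO is enough. One common way to do that is to enumerate the allowed action combinations once and let the policy pick a single index. A sketch under assumed dims; the particular trimming below is made up for illustration and does not reproduce the 89 actions from Appendix G.2:

```python
# Hypothetical sketch: build a lookup table from a Discrete index to a full
# MineDojo MultiDiscrete action vector, varying only a few dims and keeping
# the rest at their no-op values.
import itertools
import numpy as np

NOOP = [0, 0, 0, 12, 12, 0, 0, 0]  # assumed no-op for MineDojo's 8-dim action

move_choices = [(0, 0), (1, 0), (2, 0), (0, 1), (0, 2)]             # stay / fwd / back / left / right
camera_choices = [(12, 12), (10, 12), (14, 12), (12, 10), (12, 14)]  # small pitch/yaw steps around center
fn_choices = [0, 3]                                                  # e.g. no-op and attack (assumed indices)

ACTION_TABLE = []
for (fwd, strafe), (pitch, yaw), fn in itertools.product(move_choices, camera_choices, fn_choices):
    a = list(NOOP)
    a[0], a[1], a[3], a[4], a[5] = fwd, strafe, pitch, yaw, fn
    ACTION_TABLE.append(np.array(a))

def discrete_to_minedojo(idx: int) -> np.ndarray:
    """Map a Discrete(len(ACTION_TABLE)) index back to a MineDojo action vector."""
    return ACTION_TABLE[idx]

print(len(ACTION_TABLE))  # 50 combinations in this toy trimming
```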