
Learning from scratch without using pre-trained model #15

Closed
EnnaSachdeva opened this issue Dec 3, 2019 · 4 comments


EnnaSachdeva commented Dec 3, 2019

I tried running test.py (PPO.py) from scratch on the LunarLander-v2 environment, without using the pre-trained model, but it does not seem to learn even after 15,000 episodes; the episodic returns are still negative at that point. How many episodes did it take to get the trained model?

EnnaSachdeva changed the title from "Learning from scratch" to "Learning from scratch without using pre-trained model" on Dec 3, 2019

nikhilbarhate99 (Owner) commented:

Hey, have you tried training it multiple times? Or did you change the hyperparameters?
With the current hyperparameters I have been able to train it within about 1500 episodes on average (although it sometimes gets stuck in a local maximum).
Also, I recently added two commits to address some issues mentioned in #10 and #8, and I have not tested the algorithm since. Could you please try the earlier version and let me know?
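
For reference, the hyperparameters in question are the constants defined near the top of the training script. A rough sketch of that block is below; the names follow common PPO conventions and the values are illustrative only, not necessarily this repo's exact defaults:

```python
# Illustrative PPO hyperparameters for LunarLander-v2.
# Names and values are examples only, not necessarily this repo's defaults.
env_name = "LunarLander-v2"
max_episodes = 50000        # hard cap; training usually succeeds much earlier
max_timesteps = 300         # step limit per episode
update_timestep = 2000      # run a PPO update after this many collected steps
lr = 0.002                  # Adam learning rate
gamma = 0.99                # discount factor
K_epochs = 4                # optimization epochs per PPO update
eps_clip = 0.2              # PPO clipping range
```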


EnnaSachdeva commented Dec 3, 2019

I am running test.py and PPO.py from the master branch (I assume all the recent changes are pushed there), and I ran the code as-is; I only commented out the "load_state_dict" line, with no changes to the hyperparameters. These are some of the rewards I am getting:

Episode: 14994 Reward: -51
Episode: 14995 Reward: -188
Episode: 14996 Reward: -214
Episode: 14997 Reward: -403
Episode: 14998 Reward: -169
Episode: 14999 Reward: -64
Episode: 15000 Reward: -252

Also, I am using this version of the code with a small grid-world environment, and it does not seem to learn there either.

nikhilbarhate99 (Owner) commented:

Ahh, I see. The test.py file is NOT for training; it is a utility file to load and run pre-trained policies. Please run the PPO.py file for training.

Also, I ran some tests on the LunarLander environment just now and it seems to train just fine.
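
To make the distinction concrete, here is a minimal sketch of the two loops. It assumes a PPO agent object with select_action() and update() methods; the method names, buffer call, and checkpoint path are illustrative, not necessarily this repo's exact API:

```python
import gym
import torch

def evaluate(agent, env, n_episodes=3, checkpoint="PPO_LunarLander-v2.pth"):
    # test.py-style loop: load trained weights and roll the policy out.
    # There are no gradient updates here, so nothing is ever learned,
    # with or without the load_state_dict line.
    agent.policy.load_state_dict(torch.load(checkpoint))
    for _ in range(n_episodes):
        state, done = env.reset(), False
        while not done:
            action = agent.select_action(state)
            state, reward, done, _ = env.step(action)

def train(agent, env, max_episodes=1500, update_every=2000):
    # PPO.py-style loop: collect experience and periodically call
    # agent.update(), which is the step that actually improves the policy.
    timestep = 0
    for _ in range(max_episodes):
        state, done = env.reset(), False
        while not done:
            timestep += 1
            action = agent.select_action(state)
            state, reward, done, _ = env.step(action)
            agent.store(reward, done)   # hypothetical: buffer reward/done for the next update
            if timestep % update_every == 0:
                agent.update()
```

Commenting out load_state_dict in the evaluation-style loop just runs an untrained policy; it does not turn the script into a training loop.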


EnnaSachdeva commented Dec 3, 2019

Ohh, my bad.
For my custom environment I was using only PPO.py (with the obvious hyperparameter changes, roughly the kind of adjustments sketched below), and it does not seem to work there.
Anyway, thanks!
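
As a rough guide, the adjustments mentioned above usually amount to something like the following. "MyGridWorld-v0" is a placeholder for a custom Gym-registered environment (not part of this repo), and the values are illustrative only:

```python
import gym

# Placeholder: assumes a custom grid-world environment has been
# registered with gym under this id; it is not part of the repo.
env = gym.make("MyGridWorld-v0")

# The policy network's input/output sizes must match the new environment.
state_dim = env.observation_space.shape[0]
action_dim = env.action_space.n

# PPO only improves the policy when an update runs; with very short
# grid-world episodes, keeping a LunarLander-sized update interval of a
# few thousand steps means updates (and visible learning) happen rarely.
update_timestep = 200   # illustrative: tune to a few dozen episodes' worth of steps
```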
