Training PPO-algorithm #8
I executed the provided train.py script in convlab2/policy/ppo with the prespecified configuration. During training, the success rate starts at around 25% and then hovers around 30-35% for a while. When training is finished, I used the evaluate.py script in convlab2/policy to evaluate the performance, which gives me 26%, far from the 74% reported in the table.
My question: what is the exact configuration that was used to train the 74% model?
Comments
The model we trained used an old version of the user simulator. We will re-train the model with the current simulator soon :)
For better performance, you should do imitation learning before reinforcement learning. The imitation learning is implemented in the MLE policy (convlab2/policy/mle); then you can run the PPO training starting from the pretrained MLE model.
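As a rough sketch of that two-stage recipe (the checkpoint path and the constructor argument below are assumptions, not the exact repo interface; the real entry points are the training scripts under convlab2/policy/mle and convlab2/policy/ppo):

```python
from convlab2.policy.ppo import PPO

# Stage 2 only: warm-start PPO from weights produced by the MLE (imitation) trainer.
# 'save/best_mle' is a hypothetical checkpoint prefix; use whatever path the MLE script wrote out.
policy = PPO(is_train=True)
policy.load('save/best_mle')
# ...then continue with the usual PPO update loop in convlab2/policy/ppo/train.py
```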
Thanks for the quick instructions! I followed the procedure, but the performance actually degraded to around 35%. @zqwerty, looking forward to the results you have!
Actually, @liangrz15 is the one who trained the RL policies. I think he will update the training script for reproducibility.
Hey, I have the same question, and when I read the evaluation log file, it always shows this:
We have updated the policy to address this issue. Have a try!
I've tried to train MLE and then PPO; please see #15 (comment)
Hi Chris, how do you see the success rate during training? I think the only logs I see on the console are the losses. Cheers!
Hi Ben! I added a method called "evaluate" which is executed during training. I basically copied the "evaluate" method of convlab2/policy/evaluate.py :D Cheers!
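For anyone looking for the same thing, here is a minimal sketch of such an in-training evaluation loop. It is based on the BiSession example from the ConvLab-2 tutorial rather than the exact contents of convlab2/policy/evaluate.py, and the dialogue count, turn cap, and seeding are arbitrary choices:

```python
import random
import numpy as np

from convlab2.dialog_agent import PipelineAgent, BiSession
from convlab2.dst.rule.multiwoz import RuleDST
from convlab2.evaluator.multiwoz_eval import MultiWozEvaluator
from convlab2.policy.rule.multiwoz import RulePolicy


def evaluate(policy, num_dialogues=100, max_turns=40):
    """Roll out dialogues against the agenda-based user simulator and
    return the task success rate of `policy`."""
    # System side works on dialogue acts: rule DST + the policy under training, no NLU/NLG.
    sys_agent = PipelineAgent(None, RuleDST(), policy, None, name='sys')
    # User side: agenda-based simulator, also at the dialogue-act level.
    usr_agent = PipelineAgent(None, None, RulePolicy(character='usr'), None, name='user')
    sess = BiSession(sys_agent=sys_agent, user_agent=usr_agent,
                     kb_query=None, evaluator=MultiWozEvaluator())

    successes = 0
    for seed in range(num_dialogues):
        random.seed(seed)
        np.random.seed(seed)
        sess.init_session()
        sys_response = []  # empty act list, since there is no NLG
        for _ in range(max_turns):
            sys_response, usr_response, session_over, reward = sess.next_turn(sys_response)
            if session_over:
                break
        if sess.evaluator.task_success():
            successes += 1
    return successes / num_dialogues
```

Calling evaluate(policy) every few epochs inside the PPO training script and logging the returned value is enough to see a success-rate curve during training.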
Hi Chris, once again asking you about your results: I see you managed to replicate the paper's figures for PPO? When you say you set evaluator = None in the environment, do you mean you a) grabbed the
where

Thanks a lot, Nick
move to #54