
Is it normal that running train_line.py renders samples from one mode only? #173

Closed
saleml opened this issue Mar 21, 2024 · 4 comments

saleml (Collaborator) commented Mar 21, 2024

No description provided.

josephdviviano (Collaborator) commented
I likely need to change the default options to enable off policy exploration
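The thread doesn't show which option controls this, but off-policy exploration in a GFlowNet sampler typically means mixing the learned forward policy with uniform random actions so that trajectories can reach modes the policy currently ignores. Below is a minimal, hypothetical sketch of that epsilon-mixing idea (the function name and signature are illustrative, not the torchgfn API):

```python
import random


def sample_action(policy_probs, epsilon=0.1, rng=random):
    """Sample an action, mixing the learned policy with uniform noise.

    With probability `epsilon`, pick a uniformly random action
    (off-policy exploration); otherwise sample from `policy_probs`.
    With epsilon=0 this collapses to pure on-policy sampling, which
    can leave some modes unvisited early in training.
    """
    n = len(policy_probs)
    if rng.random() < epsilon:
        return rng.randrange(n)
    # Sample from the categorical distribution given by policy_probs.
    r, acc = rng.random(), 0.0
    for action, p in enumerate(policy_probs):
        acc += p
        if r < acc:
            return action
    return n - 1  # guard against floating-point rounding


# A policy that has collapsed onto action 0 still explores with epsilon > 0:
rng = random.Random(0)
counts = [0, 0, 0]
for _ in range(3000):
    counts[sample_action([1.0, 0.0, 0.0], epsilon=0.5, rng=rng)] += 1
```

Here all three actions get visited despite the degenerate policy, which is the behavior the default options would need to enable.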

josephdviviano (Collaborator) commented
[Screenshot 2024-03-31 at 12:33:33]

Hmm, on my machine, the training isn't perfect (the default options undertrain the policy) but I definitely sample from both modes.

Were your experiments off of master, or could this be related to the changes we made re: off-policy training in the other open PR (#174)?

saleml (Collaborator, Author) commented Apr 2, 2024

I'm investigating the issue. Indeed, the behavior differs between master and fix_off_policy.

One thing worth noting: with the original number of trajectories (1.28e6), the samples do come from one mode only, but they are more accurate.


Edit 1: When running the code slightly longer on the fix_off_policy branch (3e6 trajectories), I obtain a figure similar to the one obtained with master.

saleml (Collaborator, Author) commented Apr 2, 2024

In fix_off_policy, I had forgotten a keyword argument. The problem is fixed in 89c72b5.

@saleml saleml closed this as completed Apr 2, 2024