Add TQC to CleanRL #258
Comments
Hi @AdityaGudimella, thanks for your interest in contributing! We would definitely be interested in the TQC contribution.
It's ok but note the merging timeline may be considerably delayed because of this. Because
I still think it's worth adhering to the style of the other scripts for consistency, unless the loss function is reused multiple times. The problems with putting the loss function in the class are
That said, I am happy to be convinced otherwise; feel free to prototype what you think is best and I will help review.
Hi, @vwxyzjn. Thank you for your response.
That makes sense to me. I had to update gym from 0.23 to 0.25 in my own library, but it did require too many changes. Out of curiosity, if I put up a separate PR just for upgrading the gym version to 0.25.1, what would be required to merge it? Would you run all the algorithms currently present to convergence? For now, I can see that TQC performs quite well even without differentiating between terminated and truncated episodes. I can first put up a PR that uses the dones from the gym env as is. Once the rest of the PR is approved, I can change it to differentiate between terminated and truncated. Is that okay?
For your point 1., I don't think the class will have any extra lines of code. If I pass in the args I might have a few more questions while I write the PR. Is it okay to ask them here?
The main thing would be to ensure the scripts run without error.
No, this would not be required unless we are highly suspicious about a particular environment / script.
Yes this is ok :)
Perfectly reasonable.
Here would be a perfect place to ask. Discord is also good.
I put up an initial PR here. I've tested it out on Hopper-v3 and Humanoid-v3. Please take a look at it.
I also created a new issue for the gym upgrade here.
Problem Description
Would you be interested in adding Truncated Quantile Critics to CleanRL? If so I can work on a PR.
If you are interested, then I have a few questions:
env.step returns obs, reward, terminated, truncated, info instead of obs, reward, done, info?
Checklist
poetry install (see CleanRL's installation guideline).
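For readers following the terminated/truncated question in this thread, here is a minimal sketch of how a training loop might treat the two step signatures uniformly. It is not taken from CleanRL or from the PR. The helper names `unpack_step` and `td_target` are hypothetical, and the reliance on the `TimeLimit.truncated` key in `info` for the older 4-tuple API is an assumption about how the environment is wrapped.

```python
from typing import Any, Dict, Tuple


def unpack_step(result: Tuple[Any, ...]) -> Tuple[Any, float, bool, bool, Dict[str, Any]]:
    """Normalize a step result to (obs, reward, terminated, truncated, info).

    Hypothetical helper: accepts either the 5-tuple or the older 4-tuple.
    """
    if len(result) == 5:
        obs, reward, terminated, truncated, info = result
        return obs, reward, bool(terminated), bool(truncated), info
    if len(result) == 4:
        obs, reward, done, info = result
        # Assumption: a time-limit wrapper flags time-outs under this info key.
        truncated = bool(info.get("TimeLimit.truncated", False))
        # A done that is not a time-out is treated as a true termination.
        terminated = bool(done) and not truncated
        return obs, reward, terminated, truncated, info
    raise ValueError(f"unexpected step result of length {len(result)}")


def td_target(reward: float, next_value: float, terminated: bool, gamma: float = 0.99) -> float:
    """One-step TD target that bootstraps unless the episode truly terminated.

    On truncation the next state still has value, so bootstrapping continues.
    """
    return reward + gamma * next_value * (1.0 - float(terminated))


if __name__ == "__main__":
    # Old-style result where the episode ended only because of a time limit.
    old_style = ("obs", 1.0, True, {"TimeLimit.truncated": True})
    _, r, terminated, truncated, _ = unpack_step(old_style)
    print(terminated, truncated, td_target(r, 10.0, terminated, gamma=0.5))
```

Using the dones from the env as is, as proposed above, corresponds to passing `done` directly in place of `terminated`, which stops bootstrapping on time-outs as well.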