-
Notifications
You must be signed in to change notification settings - Fork 15
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[WIP] Make adding new Policy Models flexible #327
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm - a comment, but non blocking.
Great that you're taking a stab at this, which is one of the important things that remain to be done! Looks good so far! |
if self.flatten: | ||
return self.states2proxy(states).flatten(start_dim=1).to(self.float) | ||
return self.states2proxy(states).to(self.float) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@alexhernandezgarcia This is a temporary solution to make the CNN policy work on Tetris env. But normally the flattening should happen inside the model but not in the environment (see my other comments)
if you are okay with that, then I can update.
@@ -75,6 +75,7 @@ def __init__( | |||
height: int = 20, | |||
pieces: List = ["I", "J", "L", "O", "S", "T", "Z"], | |||
rotations: List = [0, 90, 180, 270], | |||
flatten: bool = True, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we move the flattening from the environment to the policy, then we don't need this.
I think any policy could be added now.. i have tested tetris on CNN and everything seems to work well with minimal changes.. I think next thing we could do is have another PR which better documents this and show how one could simply change the Policy etc. test with |
Thank you for the great work! I've added a bunch of suggestions. It seems to me that it would be worth to add a config file for a case when one wants to create a uniform or a random policy, but I don't have a strong opinion about it. |
Hi @AlexandraVolokhova just got time, let me address your comments... Regarding:
I think I agree with you, maybe it would have been nice to have a separate config for fixed and uniform policies (I am hoping you are referring to the two functions inside the base policy class). To be honest I don't even still know why we need them.. We could also keep them as it is with the default configuration. |
…icy-definition [WIP, Policy] Docstring and refactoring on top of PR 327
Fixes #293
This PR tries to make adding new function approximations (policy models) as flexible as possible.
base.py
file to dynamically import and instantiate policy models based on configurationModelBase
class