more pylint fixes #2842
Conversation
@@ -9,9 +9,9 @@ class BaseUnityEnvironment(ABC):
    def step(
        self,
        vector_action: Optional[Dict] = None,
        memory: Optional[Dict] = None,
This was missed in the memory cleanup
        text_action: Optional[Dict] = None,
        value: Optional[Dict] = None,
        custom_action: Dict[str, Any] = None,
These are being passed (for now) by the implementations, but will likely go away soon.
        self,
        config: Dict = None,
        train_mode: bool = True,
        custom_reset_parameters: Any = None,
Being passed by implementations.
@@ -45,20 +45,20 @@ def __init__(self, brain, trainer_parameters, training, load, seed, run_id):

    def add_experiences(
        self,
        curr_info: AllBrainInfo,
        next_info: AllBrainInfo,
        curr_all_info: AllBrainInfo,
Renamed these to match parent
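For context, a minimal sketch (hypothetical classes, not the actual trainer code) of the warning this rename addresses: pylint's arguments-differ check fires when an override's parameter names don't match the overridden method's (newer pylint versions report this separately as arguments-renamed).

class Trainer:
    def add_experiences(self, curr_all_info, next_all_info):
        pass

class PPOTrainer(Trainer):
    # With the old names (curr_info/next_info) pylint would flag this
    # override because its parameter names differ from the parent's.
    def add_experiences(self, curr_all_info, next_all_info):
        pass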
@@ -126,7 +126,7 @@ def check_config(
    def prepare_update(
        self,
        policy_model: LearningModel,
        mini_batch_policy: Dict[str, np.ndarray],
        mini_batch: Dict[str, np.ndarray],
Renamed to match parent
    ) -> None:
        """
        Checks agent histories for processing condition, and processes them as necessary.
        Processing involves calculating value and advantage targets for model updating step.
        :param current_info: Dictionary of all current brains and corresponding BrainInfo.
        :param new_info: Dictionary of all next brains and corresponding BrainInfo.
        :param next_info: Dictionary of all next brains and corresponding BrainInfo.
We seemed to be split 50/50 on new vs. next for the names here. I went with next_info because it seems more consistent with current_info.
@@ -46,9 +46,41 @@ def __init__(
        self.activ_fn = self.swish

        self.policy_memory_in: Optional[tf.Tensor] = None
        self.policy_memory_out: Optional[tf.Tensor] = None
Not sure if there's a better way to group all of these.
Ahh, this is ugly (one more reason to move to TF 2.0). I can take a stab at grouping by comment (e.g. divvy up policy vs. critic vs. output tensors) but they are all needed.
I'm assuming we'll have a similar block in the PPO model?
it's OK as it is. One way to DRY it up a bit would be something like

class QInfo:
    def __init__(self):
        self.q: Optional[tf.Tensor] = None
        self.q_p: Optional[tf.Tensor] = None
        self.q_memory_in: Optional[tf.Tensor] = None
        self.q_memory_out: Optional[tf.Tensor] = None
        self.q_heads: Optional[Dict[str, tf.Tensor]] = None
        self.q_pheads: Optional[Dict[str, tf.Tensor]] = None
        ...

class SACNetwork(LearningModel):
    def __init__(self):
        self.q1 = QInfo()
        self.q2 = QInfo()

but that might not be a good abstraction (and probably not a good name).
Surprisingly it's not complaining about PPO right now.
Hmm the class nesting doesn't seem very obvious. I'd almost want a group that is model inputs and one that is model outputs - but that seems to require more code changes, so let's leave it for now.
        self, mini_batch: Dict[str, Any], num_sequences: int, update_target: bool = True
        # pylint: disable=arguments-differ
        # TODO ervteng FIX ME
        self,
@ervteng Can you take a look at this? update_target is not in the base class signature. From a quick search it looked like we only ever call this with update_target=True, but I wasn't 100% sure it was safe to remove.
Hmm, this is a subtlety of the SAC algorithm. There are two value networks, a target and a current one. The target is used to bootstrap the current one, and is updated periodically towards the weights of the current.
Most implementations of SAC let you vary when this update happens. Initially, I had an additional hyperparameter that did just that, but removed it because realistically you can get away with just adjusting tau (and fewer hyperparameters are better for most users).
I think it's safe to remove it for now. A researcher might want to vary this, but then again a researcher can edit the Python code. Perhaps we can add a comment -> I'll add it below.
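To make the tau-based scheme concrete, here is a rough sketch of a soft ("Polyak") target update; the helper name and variable lists are hypothetical, not the actual ml-agents code:

import tensorflow as tf

def make_soft_update_ops(source_vars, target_vars, tau):
    # target <- tau * source + (1 - tau) * target: a smaller tau makes the
    # target track the current network more slowly, which is why adjusting
    # tau can stand in for a separate update-frequency hyperparameter.
    return [
        tf.assign(target, tau * source + (1.0 - tau) * target)
        for source, target in zip(source_vars, target_vars)
    ]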
Line 213: # Update target network. By default, target update happens at every policy update.
Sounds good, I'll apply that. If you want to keep the flexibility, you could do something like move all the current update() to _update(), and make update() call _update() with update_target=True (or some other logic).
👍 I'll probably do something like this when we overhaul the Policy and Trainer APIs, but will hold off for this particular version of the code.
LGTM
Uncomment and fix some more pylint warnings, namely:
__init__
Depending on some pre-commit settings, the latter sometimes fires on MultiGpuPPOPolicy, but I had a hard time fixing that one up because MultiGpuPPOPolicy.create_model will be called from the parent's __init__.
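A minimal illustration of why that happens (hypothetical classes, and assuming the warning in question is pylint's attribute-defined-outside-init check): the subclass assigns attributes inside create_model(), which only ever runs from the parent's __init__, so pylint does not see them defined in an __init__ of the subclass itself.

class ParentPolicy:
    def __init__(self):
        self.create_model()

    def create_model(self):
        self.model = None

class MultiGpuChildPolicy(ParentPolicy):
    def create_model(self):
        # pylint may report attribute-defined-outside-init here, even though
        # this assignment effectively happens during the parent's __init__.
        self.towers = []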