more pylint fixes #2842

Merged: 6 commits into develop from develop-more-pylint, Nov 4, 2019

Conversation

chriselion (Contributor) commented Nov 2, 2019:

Uncomment and fix some more pylint warnings, namely:

  • Parameters differ from overridden
  • Attribute '...' defined outside __init__

Depending on some pre-commit settings, the latter sometimes fires on MultiGpuPPOPolicy, but I had a hard time fixing that one because MultiGpuPPOPolicy.create_model is called from the parent's __init__.
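For context, a minimal sketch of why the second warning fires in that situation (class and attribute names here are hypothetical, not the real ml-agents ones): when the parent's __init__ calls a method that the subclass overrides to create attributes, pylint sees attributes that are never assigned in the subclass's own __init__.

class ParentPolicy:
    def __init__(self):
        # The subclass's create_model runs from here, before any subclass __init__ body.
        self.create_model()

    def create_model(self):
        pass


class ChildPolicy(ParentPolicy):
    def create_model(self):
        # pylint reports "Attribute 'model' defined outside __init__"
        # (attribute-defined-outside-init), because ChildPolicy never assigns
        # self.model inside an __init__ of its own.
        self.model = object()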

@@ -9,9 +9,9 @@ class BaseUnityEnvironment(ABC):
    def step(
        self,
        vector_action: Optional[Dict] = None,
        memory: Optional[Dict] = None,
chriselion (Contributor, Author):
This was missed in the memory cleanup

        text_action: Optional[Dict] = None,
        value: Optional[Dict] = None,
        custom_action: Dict[str, Any] = None,
chriselion (Contributor, Author):
These are being passed (for now) by the implementations, but will likely go away soon.

        self,
        config: Dict = None,
        train_mode: bool = True,
        custom_reset_parameters: Any = None,
chriselion (Contributor, Author):
Being passed by implementations.

@@ -45,20 +45,20 @@ def __init__(self, brain, trainer_parameters, training, load, seed, run_id):

     def add_experiences(
         self,
-        curr_info: AllBrainInfo,
-        next_info: AllBrainInfo,
+        curr_all_info: AllBrainInfo,
chriselion (Contributor, Author):
Renamed these to match parent

@@ -126,7 +126,7 @@ def check_config(
     def prepare_update(
         self,
         policy_model: LearningModel,
-        mini_batch_policy: Dict[str, np.ndarray],
+        mini_batch: Dict[str, np.ndarray],
chriselion (Contributor, Author):
Renamed to match parent
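To make the warning concrete, a small sketch (the class names are hypothetical; only the parameter names come from the diff above) of why pylint reported "Parameters differ from overridden" here before the rename:

class BaseOptimizer:
    def prepare_update(self, policy_model, mini_batch):
        pass


class RewardSignalOptimizer(BaseOptimizer):
    # With the parameter named mini_batch_policy instead of mini_batch, the pylint
    # version in use flagged this override with arguments-differ:
    # "Parameters differ from overridden 'prepare_update' method".
    def prepare_update(self, policy_model, mini_batch_policy):
        pass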

     ) -> None:
         """
         Checks agent histories for processing condition, and processes them as necessary.
         Processing involves calculating value and advantage targets for model updating step.
         :param current_info: Dictionary of all current brains and corresponding BrainInfo.
-        :param new_info: Dictionary of all next brains and corresponding BrainInfo.
+        :param next_info: Dictionary of all next brains and corresponding BrainInfo.
chriselion (Contributor, Author):
We seemed to be split 50/50 between "new" and "next" for the names here. I went with next_info because it seems more consistent with current_info.

@@ -46,9 +46,41 @@ def __init__(
         self.activ_fn = self.swish

+        self.policy_memory_in: Optional[tf.Tensor] = None
+        self.policy_memory_out: Optional[tf.Tensor] = None
chriselion (Contributor, Author):
Not sure if there's a better way to group all of these.

Contributor:
Ahh, this is ugly (one more reason to move to TF 2.0). I can take a stab at grouping them by comment (e.g., divvying up policy vs. critic vs. output tensors), but they are all needed.

I'm assuming we'll have a similar block in the PPO model?

chriselion (Contributor, Author):
It's OK as it is. One way to DRY it up a bit would be something like:

class QInfo:
    def __init__(self):
        self.q: Optional[tf.Tensor] = None
        self.q_p: Optional[tf.Tensor] = None
        self.q_memory_in: Optional[tf.Tensor] = None
        self.q_memory_out: Optional[tf.Tensor] = None
        self.q_heads: Optional[Dict[str, tf.Tensor]] = None
        self.q_pheads: Optional[Dict[str, tf.Tensor]] = None
        ...


class SACNetwork(LearningModel):
    def __init__(self):
        self.q1 = QInfo()
        self.q2 = QInfo()

but that might not be a good abstraction (and probably not a good name).

Surprisingly it's not complaining about PPO right now.

Contributor:
Hmm, the class nesting doesn't seem very obvious. I'd almost want one group for model inputs and one for model outputs, but that seems to require more code changes, so let's leave it for now.
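For what it's worth, a minimal sketch of that inputs/outputs grouping idea (names are hypothetical, and Optional and tf are assumed to be in scope as in the file above); it is only meant to show the shape of the change that was deferred:

class NetworkInputs:
    def __init__(self):
        self.vector_in: Optional[tf.Tensor] = None
        self.memory_in: Optional[tf.Tensor] = None


class NetworkOutputs:
    def __init__(self):
        self.value: Optional[tf.Tensor] = None
        self.memory_out: Optional[tf.Tensor] = None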

-        self, mini_batch: Dict[str, Any], num_sequences: int, update_target: bool = True
+        # pylint: disable=arguments-differ
+        # TODO ervteng FIX ME
+        self,
chriselion (Contributor, Author):
@ervteng Can you take a look at this? update_target is not in the base class signature. I only did a quick search, but it looked like we only ever call this with update_target=True; I wasn't 100% sure whether it was safe to remove.

Contributor:
Hmm, this is a subtlety of the SAC algorithm. There are two value networks, a target and a current one. The target is used to bootstrap the current one, and is updated periodically towards the weights of the current.

Most implementations of SAC let you vary when this update happens. Initially, I had an additional hyperparameter that did just that, but removed it because realistically you can get away with just adjusting tau (and fewer hyperparameters are better for most users).

I think it's safe to remove it for now. A researcher might want to vary this, but then again, a researcher can edit the Python code. Perhaps we can add a comment; I'll add it below.
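As a rough illustration of the mechanism being described (a generic soft/Polyak target update, not the exact ml-agents code), tau controls how far the target value network moves toward the current one each time the update op runs:

import tensorflow as tf  # TF 1.x-style API


def make_soft_update_op(current_vars, target_vars, tau):
    # Move each target variable a fraction tau toward the corresponding current variable.
    return tf.group(
        *[
            tf.assign(target, tau * current + (1.0 - tau) * target)
            for current, target in zip(current_vars, target_vars)
        ]
    )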

Contributor:
Line 213: # Update target network. By default, target update happens at every policy update.

chriselion (Contributor, Author):
Sounds good, I'll apply that. If you want to keep the flexibility, you could do something like moving all of the current update() into _update(), and making update() call _update() with update_target=True (or some other logic).
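A quick sketch of that suggestion (the class name and method bodies are hypothetical; only update/_update and update_target come from the discussion):

class SACTrainerSketch:
    def update(self, mini_batch, num_sequences):
        # Keep the base-class-compatible signature so arguments-differ stays quiet,
        # and push the extra flag into a private helper.
        return self._update(mini_batch, num_sequences, update_target=True)

    def _update(self, mini_batch, num_sequences, update_target):
        ...  # existing update logic; update the target network when requested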

Contributor:
👍 I'll probably do something like this when we overhaul the Policy and Trainer APIs, but I'll hold off for this particular version of the code.

chriselion requested review from harperj and ervteng on November 2, 2019 at 23:21.
ervteng (Contributor) left a comment:
LGTM

chriselion merged commit 5a3db85 into develop on Nov 4, 2019.
The delete-merged-branch bot deleted the develop-more-pylint branch on November 4, 2019 at 19:13.
The github-actions bot locked the conversation as resolved and limited it to collaborators on May 17, 2021.