more pylint fixes #2842
Conversation
@@ -9,9 +9,9 @@ class BaseUnityEnvironment(ABC):
    def step(
        self,
        vector_action: Optional[Dict] = None,
        memory: Optional[Dict] = None,
This was missed in the memory cleanup
        text_action: Optional[Dict] = None,
        value: Optional[Dict] = None,
        custom_action: Dict[str, Any] = None,
These are being passed (for now) by the implementations, but will likely go away soon.
        self,
        config: Dict = None,
        train_mode: bool = True,
        custom_reset_parameters: Any = None,
Being passed by implementations.
@@ -45,20 +45,20 @@ def __init__(self, brain, trainer_parameters, training, load, seed, run_id):

    def add_experiences(
        self,
        curr_info: AllBrainInfo,
        next_info: AllBrainInfo,
        curr_all_info: AllBrainInfo,
Renamed these to match parent
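For context, a minimal sketch (hypothetical classes, not the actual trainer code) of the warning this rename addresses: pylint's arguments-differ check fires when an override's parameter names don't match the overridden method's (newer pylint versions report this separately as arguments-renamed).

class Trainer:
    def add_experiences(self, curr_all_info, next_all_info):
        pass

class PPOTrainer(Trainer):
    # With the old names (curr_info/next_info) pylint would flag this
    # override because its parameter names differ from the parent's.
    def add_experiences(self, curr_all_info, next_all_info):
        pass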
@@ -126,7 +126,7 @@ def check_config(
    def prepare_update(
        self,
        policy_model: LearningModel,
        mini_batch_policy: Dict[str, np.ndarray],
        mini_batch: Dict[str, np.ndarray],
Renamed to match parent
    ) -> None:
        """
        Checks agent histories for processing condition, and processes them as necessary.
        Processing involves calculating value and advantage targets for model updating step.
        :param current_info: Dictionary of all current brains and corresponding BrainInfo.
        :param new_info: Dictionary of all next brains and corresponding BrainInfo.
        :param next_info: Dictionary of all next brains and corresponding BrainInfo.
We seemed to be split 50/50 on new vs. next for the names here. I went with next_info because it seems more consistent with current_info.
@@ -46,9 +46,41 @@ def __init__(
        self.activ_fn = self.swish

        self.policy_memory_in: Optional[tf.Tensor] = None
        self.policy_memory_out: Optional[tf.Tensor] = None
Not sure if there's a better way to group all of these.
Ahh, this is ugly (one more reason to move to TF 2.0). I can take a stab at grouping by comment (e.g. divvy up policy vs. critic vs. output tensors) but they are all needed.
I'm assuming we'll have a similar block in the PPO model?
it's OK as it is. One way to DRY it up a bit would be something like

class QInfo:
    def __init__(self):
        self.q: Optional[tf.Tensor] = None
        self.q_p: Optional[tf.Tensor] = None
        self.q_memory_in: Optional[tf.Tensor] = None
        self.q_memory_out: Optional[tf.Tensor] = None
        self.q_heads: Optional[Dict[str, tf.Tensor]] = None
        self.q_pheads: Optional[Dict[str, tf.Tensor]] = None
        ...

class SACNetwork(LearningModel):
    def __init__(self):
        self.q1 = QInfo()
        self.q2 = QInfo()

but that might not be a good abstraction (and probably not a good name).
Surprisingly it's not complaining about PPO right now.
Hmm the class nesting doesn't seem very obvious. I'd almost want a group that is model inputs and one that is model outputs - but that seems to require more code changes, so let's leave it for now.
        self, mini_batch: Dict[str, Any], num_sequences: int, update_target: bool = True
        # pylint: disable=arguments-differ
        # TODO ervteng FIX ME
        self,
@ervteng Can you take a look at this? update_target is not in the base class signature. From a quick search it looked like we only ever call this with update_target=True, but I wasn't 100% sure it was safe to remove.
Hmm, this is a subtlety of the SAC algorithm. There are two value networks, a target and a current one. The target is used to bootstrap the current one, and is updated periodically towards the weights of the current.
Most implementations of SAC let you vary when this update happens. Initially, I had an additional hyperparameter that did just that, but removed it because realistically you can get away with just adjusting tau (and fewer hyperparameters are better for most users).
I think it's safe to remove it for now. A researcher might want to vary this, but then again a researcher can edit the Python code. Perhaps we can add a comment -> I'll add it below.
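To make the tau-based scheme concrete, here is a rough sketch of a soft ("Polyak") target update; the helper name and variable lists are hypothetical, not the actual ml-agents code:

import tensorflow as tf

def make_soft_update_ops(source_vars, target_vars, tau):
    # target <- tau * source + (1 - tau) * target: a smaller tau makes the
    # target track the current network more slowly, which is why adjusting
    # tau can stand in for a separate update-frequency hyperparameter.
    return [
        tf.assign(target, tau * source + (1.0 - tau) * target)
        for source, target in zip(source_vars, target_vars)
    ]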
Line 213: # Update target network. By default, target update happens at every policy update.
Sounds good, I'll apply that. If you want to keep the flexibility, you could do something like move all the current update() to _update(), and make update() call _update() with update_target=True (or some other logic).
👍 I'll probably do something like this when we overhaul the Policy and Trainer APIs, but will hold off for this particular version of the code.
LGTM
Uncomment and fix some more pylint warnings, namely:
__init__
Depending on some pre-commit settings, the latter sometimes fires on MultiGpuPPOPolicy, but I had a hard time fixing that one up because MultiGpuPPOPolicy.create_model will be called from the parent's __init__.
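A minimal illustration of why that happens (hypothetical classes, and assuming the warning in question is pylint's attribute-defined-outside-init check): the subclass assigns attributes inside create_model(), which only ever runs from the parent's __init__, so pylint does not see them defined in an __init__ of the subclass itself.

class ParentPolicy:
    def __init__(self):
        self.create_model()

    def create_model(self):
        self.model = None

class MultiGpuChildPolicy(ParentPolicy):
    def create_model(self):
        # pylint may report attribute-defined-outside-init here, even though
        # this assignment effectively happens during the parent's __init__.
        self.towers = []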