[rllib] Add debug info back to PPO and fix optimizer compatibility #2366
Conversation
    for k, v in iter_extra_fetches.items():
        all_extra_fetches[k] += [v]
        iter_extra_fetches[k].append(v)
    print(i, _averaged(iter_extra_fetches))
This was incorrect before; you want to return the last epoch's stats, not the mean across all epochs.
The previous implementation returned the stats of all epochs, not just an average, so it was no less "incorrect".
The reason you want the last epoch's stats is PPO-specific.
Eh, I'm not sure why returning all epoch stats makes sense in any scenario, unless your epochs were very small and the values noisy.
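For context, a minimal sketch of the behavior being discussed (the `sgd_step` callable and `run_sgd_epochs` helper are hypothetical names, not the actual rllib code): collect the extra fetches per SGD epoch and report only the mean of the last epoch, since earlier epochs describe stale parameters.

```python
from collections import defaultdict

import numpy as np


def run_sgd_epochs(minibatches, sgd_step, num_epochs):
    """Run SGD epochs and return the averaged fetches of the last epoch only.

    `sgd_step` is a hypothetical callable that runs one minibatch update and
    returns a dict of extra fetches (e.g. policy loss, KL divergence).
    """
    last_epoch_stats = {}
    for epoch in range(num_epochs):
        iter_extra_fetches = defaultdict(list)
        for batch in minibatches:
            for k, v in sgd_step(batch).items():
                iter_extra_fetches[k].append(v)
        # Average over the minibatches of this epoch.
        last_epoch_stats = {
            k: float(np.mean(v)) for k, v in iter_extra_fetches.items()}
    # PPO's KL/entropy/loss stats are only meaningful for the final epoch,
    # because the policy has moved during the earlier epochs.
    return last_epoch_stats
```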
richardliaw left a comment
(These comments are from a partial review a couple of days ago.)
python/ray/rllib/agents/ppo/ppo.py
Outdated
            self.local_evaluator, self.remote_evaluators)

        def _train(self):
            def postprocess_samples(batch):
remove
python/ray/rllib/agents/ppo/ppo.py
Outdated
    # Which observation filter to apply to the observation
    "observation_filter": "MeanStdFilter",
    # Debug only: use the sync samples optimizer instead of the multi-gpu one
    "debug_use_simple_optimizer": False,
Why is this debug-only?
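For illustration, a hedged sketch of what such a flag selects between; the two classes below are stand-ins for rllib's sync samples and multi-GPU optimizers, and the constructor arguments are illustrative rather than the exact signatures.

```python
class SyncSamplesOptimizer:
    """Stand-in for the simpler, synchronous sample-collection optimizer."""

    def __init__(self, local_evaluator, remote_evaluators):
        self.local_evaluator = local_evaluator
        self.remote_evaluators = remote_evaluators


class LocalMultiGPUOptimizer:
    """Stand-in for the default multi-GPU optimizer."""

    def __init__(self, local_evaluator, remote_evaluators):
        self.local_evaluator = local_evaluator
        self.remote_evaluators = remote_evaluators


def make_optimizer(config, local_evaluator, remote_evaluators):
    # The flag swaps the default multi-GPU optimizer for the sync samples
    # optimizer; per the PR description this also enables LSTM models.
    if config.get("debug_use_simple_optimizer", False):
        return SyncSamplesOptimizer(local_evaluator, remote_evaluators)
    return LocalMultiGPUOptimizer(local_evaluator, remote_evaluators)
```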
    all_extra_fetches = defaultdict(list)
    num_batches = (
        int(tuples_per_device) // int(self.per_device_batch_size))
    print("== sgd epochs ==")
Can you gate this behind a verbose flag?
(i.e., gate all of the prints in the optimizers behind a verbose flag?)
Let's do this with log levels in the future. For now, we've always printed by default, so this just restores the 0.4 behavior.
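A minimal sketch of the suggested gating (the `verbose` parameter and `_averaged` helper here are illustrative, not rllib APIs): printing stays on by default to match the 0.4 behavior, while the logger call shows the log-level approach mentioned for the future.

```python
import logging

import numpy as np

logger = logging.getLogger(__name__)


def _averaged(fetches):
    """Illustrative helper: mean of each collected metric."""
    return {k: float(np.mean(v)) for k, v in fetches.items()}


def report_epoch(i, iter_extra_fetches, verbose=True):
    stats = _averaged(iter_extra_fetches)
    if verbose:
        # Default: print, matching the 0.4 behavior restored by this PR.
        print(i, stats)
    else:
        # Future direction suggested in the thread: route through log levels.
        logger.debug("sgd epoch %s: %s", i, stats)
```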
    rewards_plus_v = np.concatenate(
        [rollout["rewards"], np.array([last_r])])
    traj["advantages"] = discount(rewards_plus_v, gamma)[:-1]
    traj["value_targets"] = np.zeros_like(traj["advantages"])
You can use a critic without using GAE, but this does not allow that functionality. Can you add documentation noting this?
Done
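To make the distinction concrete, here is a hedged sketch of the non-GAE path (the `use_critic` flag and `vf_preds` argument are hypothetical additions for illustration, not what the PR implements): without a critic the advantage is just the discounted return and the value targets are zeros, as in the quoted code; with a critic the return becomes the value target and the advantage is the return minus the value prediction.

```python
import numpy as np
import scipy.signal


def discount(x, gamma):
    # Reverse discounted cumulative sum (the usual lfilter trick).
    return scipy.signal.lfilter([1], [1, -gamma], x[::-1], axis=0)[::-1]


def compute_advantages_no_gae(rewards, last_r, gamma,
                              vf_preds=None, use_critic=False):
    rewards_plus_v = np.concatenate([rewards, np.array([last_r])])
    returns = discount(rewards_plus_v, gamma)[:-1]
    if use_critic:
        # Critic without GAE: advantage = return - V(s); fit V to the return.
        advantages = returns - vf_preds
        value_targets = returns
    else:
        # The quoted code path: no critic, so there is nothing to fit.
        advantages = returns
        value_targets = np.zeros_like(advantages)
    return advantages, value_targets
```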
What do these changes do?
Make PPO work with the sync samples optimizer too (this enables LSTM support), and add back the debug stats.