
[RLlib] Give more time to impala tests #31910

Merged

Conversation


@ArturNiederfahrenhorst ArturNiederfahrenhorst commented Jan 24, 2023

Signed-off-by: Artur Niederfahrenhorst artur@anyscale.com

Why are these changes needed?

IMPALA release tests have been flaky. They are capable of learning a mean reward well above 200 on Breakout, but struggle to do so within 1800s. This PR raises that limit to 2400s.
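To make the change concrete, here is a minimal sketch (not the actual diff; the file layout and key names are assumptions) of what raising the time budget looks like with Tune-style stop criteria:

```yaml
# Sketch only: assumed keys, not copied from the PR's release test config.
stop:
  episode_reward_mean: 200   # reward target the test must reach
  time_total_s: 2400         # raised from 1800 (30 min -> 40 min)
```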
The following screenshot shows how the mean reward behaves over time for two representative runs (1 on the horizontal scale is 1h).
Note that TF has much more variance.
[Screenshot: mean reward over time for the two representative runs]
I could not determine why TF has much more variance, but noted that the major difference in the metrics is in the value loss.
[Screenshot: value loss metrics for the two runs]

I also tuned the LR and the number of workers to find out whether TF being faster is simply an off-policy effect. It's not.

Related TensorBoard: https://tensorboard.dev/experiment/5vi5EGOdSJWdfwoc1A2qNw/#scalars&tagFilter=cur_lr&_smoothingWeight=0

[0, 0.0005],
[20000000, 0.000000000001],
]
lr: 0.0005
Contributor Author


I noticed that 20000000 (twenty million in words) is never reached, so to avoid confusing it with 2M, we should just leave it out.
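As a rough sketch of that simplification (assuming the key in question is RLlib's `lr_schedule`; the surrounding config is not shown in this hunk), the schedule with the unreachable 20M end-point is dropped in favor of a constant learning rate:

```yaml
# Sketch only, reconstructed from the hunk above.
# Before: a schedule whose second anchor (20M timesteps) is never reached in this test.
# lr_schedule: [
#     [0, 0.0005],
#     [20000000, 0.000000000001],
# ]
# After: a constant learning rate, equivalent in practice for this test's length.
lr: 0.0005
```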

Member


this sounds good.

@kouroshHakha
Contributor

@gjoliver @sven1977 Can someone merge this plz?

@gjoliver
Member

can you not leave the y scale out of the screenshot?
does it really take 4 hrs to reach 200? if we are judging based on max(mean_episode_reward), why would the variance matter?

@ArturNiederfahrenhorst
Contributor Author

@gjoliver The variance indeed does not manifest in the max(mean_episode_reward).
I still found it surprising that it behaves that way. The off-policyness is the same.

@gjoliver
Member

ok, but then I still have the question: why do we need 4 hrs for this test?

@ArturNiederfahrenhorst
Contributor Author

@gjoliver Sorry, my mistake. I dug out the original tensorboard that you see in the picture: https://tensorboard.dev/experiment/5vi5EGOdSJWdfwoc1A2qNw/#scalars&tagFilter=cur_lr&_smoothingWeight=0
I might have swapped two screenshots when uploading.

@ArturNiederfahrenhorst
Contributor Author

So the run in the picture is actually 40 minutes long.

@ArturNiederfahrenhorst
Contributor Author

In the file changes, I have only moved this from 30 to 40 minutes.

@gjoliver
Member

oh ok, man, this sounds a lot better. sorry I misread the numbers 😓

@gjoliver gjoliver merged commit 1929bb1 into ray-project:master Jan 29, 2023
edoakes pushed a commit to edoakes/ray that referenced this pull request Mar 22, 2023