[Ray 2.3 release] Last update to results json was too long ago
CI flaky failure
#31981
Comments
from Kai:
To add to this, this result fetching procedure works for other tests. So if the "download did not work correctly", it could be because, for example, the jobs server died.
For long_running_many_actor_tasks, the root cause exception is swallowed by the exception handler in an exponential backoff handler. I have a fix in #32014 that surfaces the exception, but of course it does not address the root cause. Will look into long_running_actor_deaths now.
The long_running_actor_deaths failure appears to have the same issue.
Update: I thought that the exception handler around In any case, these tests aren't crashing, and we currently don't check for performance regressions on them. So the tests are passing; we're just missing metrics from them because of this bug.
…info on failure (ray-project#32014)

It appears the root cause of flaky failures described in ray-project#31981 is suppressed because we're not logging exceptions in `exponential_backoff_retry`.

Signed-off-by: Cade Daniel <cade@anyscale.com>
Signed-off-by: Edward Oakes <ed.nmi.oakes@gmail.com>
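
To illustrate the failure mode discussed in this thread, here is a minimal sketch of a retry helper that logs each caught exception before backing off, so the underlying cause stays visible. This is not Ray's actual `exponential_backoff_retry`; the name `retry_with_backoff` and all of its parameters are hypothetical and chosen only for this example.

```python
import logging
import time

logger = logging.getLogger(__name__)


def retry_with_backoff(fn, retry_exceptions, initial_delay_s=1.0,
                       max_retries=3, sleep=time.sleep):
    """Call fn(), retrying on retry_exceptions with a doubling delay.

    Hypothetical helper for illustration; not the Ray release-test code.
    """
    delay_s = initial_delay_s
    for attempt in range(1, max_retries + 1):
        try:
            return fn()
        except retry_exceptions as exc:
            # Log the exception with its traceback instead of silently
            # retrying, so the original failure is not swallowed.
            logger.warning(
                "Attempt %d/%d failed: %r",
                attempt, max_retries, exc, exc_info=True,
            )
            if attempt == max_retries:
                # Out of retries: re-raise the last exception unchanged.
                raise
            sleep(delay_s)
            delay_s *= 2
```

The `sleep` parameter is injectable so the helper can be exercised without waiting; a caller might use `retry_with_backoff(fetch_results, (ConnectionError,))`, where `fetch_results` is likewise a placeholder name.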