[BUG] FsHealthServiceTests.testFailsHealthOnHungIOBeyondHealthyTimeout (Random test Failure) #1567
Most recently found in #1541
Also found in #1584
The failing test was introduced by commit f7e2984. After looking at all the test failure reports mentioned above, the error messages are all:
OpenSearch/server/src/test/java/org/opensearch/monitor/fs/FsHealthServiceTests.java Line 239 in 7496f80
The assertion statement leads to the test failure.
Note that in issue #1307, the last line shows a different line number:
A few seeds shown in the console output:
I ran some experiments to reproduce the failure and tried different values for these variables:
I found that the test "testFailsHealthOnHungIOBeyondHealthyTimeout" sometimes fails when running all the tests in the class by
Another finding:
@tlfeng Can you easily reproduce the failure?
Hi Andrew, I couldn't. Although decreasing some of the time intervals can cause the assertion to fail, that is unlikely to be what happens in CI.
I guess there might be 3 possible causes:
Hi @Bukhtawar, do you have any thoughts about the test failure?
Adding more delay here should help, I guess, assuming this is the only assertion failing: OpenSearch/server/src/test/java/org/opensearch/monitor/fs/FsHealthServiceTests.java Lines 239 to 244 in e983fac
Maybe just increase that 2x multiplier in the
Also, super minor, but I'd also remove this assertion: OpenSearch/server/src/test/java/org/opensearch/monitor/fs/FsHealthServiceTests.java Line 239 in e983fac
As long as the thing after this wait step is to do
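To illustrate the trade-off between a longer fixed delay and a bounded wait, here is a minimal sketch of a polling helper that re-checks a condition until a deadline instead of asserting once after a sleep. The class `PollingWait`, the method `waitUntil`, and the timing values are hypothetical and are not taken from FsHealthServiceTests; the sketch only shows the general technique.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.BooleanSupplier;

public class PollingWait {

    /**
     * Hypothetical helper: polls the condition until it holds or the timeout elapses.
     * Returns true if the condition became true in time, false on timeout or interrupt.
     */
    public static boolean waitUntil(BooleanSupplier condition, long timeoutMillis, long pollMillis) {
        final long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false; // deadline passed without the condition holding
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt flag
                return false;
            }
        }
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for a health status that flips some time after the I/O hangs.
        AtomicBoolean unhealthy = new AtomicBoolean(false);
        Thread monitor = new Thread(() -> {
            try {
                Thread.sleep(50); // assumed delay before the status changes
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            unhealthy.set(true);
        });
        monitor.start();

        // A generous upper bound costs nothing when the status flips early.
        boolean observed = waitUntil(unhealthy::get, 2_000, 10);
        monitor.join();
        System.out.println("observed unhealthy status: " + observed);
    }
}
```

With this shape, raising the timeout only lengthens the worst case on a slow CI machine; a run where the status flips quickly still returns quickly.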
Thanks @Bukhtawar and @andrross for your suggestions on resolving the test failure. I realized that it's somewhat redundant to have two assertion statements checking the file system health status around the 2 Further, I'm curious whether the steps that restore the file system status are necessary, or whether restoring the status after a test case is just good practice. OpenSearch/server/src/test/java/org/opensearch/monitor/fs/FsHealthServiceTests.java Lines 237 to 249 in e983fac
I noticed that in the test OpenSearch/server/src/test/java/org/opensearch/monitor/fs/FsHealthServiceTests.java Line 145 in e983fac
This is the critical line that must happen after each test invocation: OpenSearch/server/src/test/java/org/opensearch/monitor/fs/FsHealthServiceTests.java Line 188 in e983fac
The actual disrupted filesystem is an instance local to a single test invocation, so as long as that static teardown method is called to reset the underlying static variable in PathUtils, then it doesn't matter what is done to the instance.
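The point about the static teardown can be sketched with a deliberately simplified model. Everything below (`Fs`, `FsHolder`, `installMock`, `teardown`, `runDisruptiveTest`) is a hypothetical stand-in, not the actual PathUtils or test framework code; it only shows why resetting the static reference is sufficient when the disrupted instance is local to one test.

```java
public class StaticOverrideReset {

    /** Hypothetical, simplified file system abstraction. */
    public interface Fs {
        boolean isHung();
    }

    /** Hypothetical stand-in for a utility class holding a process-wide static file system. */
    public static final class FsHolder {
        private static final Fs REAL = () -> false;
        private static volatile Fs current = REAL;

        public static Fs get() {
            return current;
        }

        public static void installMock(Fs mock) {
            current = mock;
        }

        /** Resets the static reference; this is the step that must run after every test. */
        public static void teardown() {
            current = REAL;
        }
    }

    /** Simulates one test invocation that installs a disrupted, test-local instance. */
    public static boolean runDisruptiveTest() {
        Fs disrupted = () -> true; // local to this invocation, never restored itself
        FsHolder.installMock(disrupted);
        try {
            return FsHolder.get().isHung();
        } finally {
            FsHolder.teardown(); // only the static reference needs resetting
        }
    }

    public static void main(String[] args) {
        System.out.println("hung during test: " + runDisruptiveTest());
        System.out.println("hung after teardown: " + FsHolder.get().isHung());
    }
}
```

In this model the disrupted instance is simply discarded when the test returns, so restoring its internal state would have no effect on later tests.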
@tlfeng Just came across a failure of this test in an unrelated PR. I see the updated error message, so it appears the additional retry time didn't fix it in this case.
Reopened after another occurrence: #5344 (comment)
This seems to be an old issue. Can someone try to reproduce it? Otherwise, we can close it.
Closing this issue as stale. Also, according to the flaky test report #5031 (comment), this test has not failed.
Describe the bug
Random Test Failure. Please dig in, figure out what's wrong :(
https://ci.opensearch.org/logs/ci/workflow/OpenSearch_CI/PR_Checks/Gradle_Check/gradle_check_1068.log
https://ci.opensearch.org/logs/ci/workflow/OpenSearch_CI/PR_Checks/Gradle_Check/gradle_check_1068_reports.zip