CI test linux://:mutable_object_test is flaky #44396
I'm not sure whether it's because of that PR, but @jackhumphries is the mastermind behind the C++ code anyway ;)
Based on a local issue I resolved, I believe this commit should fix the flaky mutable_object_test in #44396. The header of the mutable object was not being explicitly initialized. In some cases, this caused a deadlock due to a spinlock backed by garbage memory. This commit explicitly initializes the header, which should resolve the issue.
Tested: mutable_object_test
Signed-off-by: Jack Humphries <1645405+jackhumphries@users.noreply.github.com>
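To illustrate how an uninitialized header can deadlock, here is a minimal sketch assuming the header sits at the start of a shared buffer and contains a 32-bit spinlock word. The struct `MutableObjectHeader` and the helpers `TryLock`, `Lock`, `Unlock`, and `InitHeader` are hypothetical names, not Ray's actual code. If the lock word happens to contain leftover nonzero bytes, `Lock()` spins forever; constructing the header in place gives every field a known starting value.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <cstring>
#include <new>

// Hypothetical header layout for illustration; not Ray's actual struct.
struct MutableObjectHeader {
  std::atomic<uint32_t> lock{0};  // spinlock word: 0 = unlocked, 1 = locked
  uint64_t version = 0;           // bumped by the writer on each write
  uint64_t data_size = 0;
};

// A lock word placed in shared memory must be lock-free to work across processes.
static_assert(std::atomic<uint32_t>::is_always_lock_free, "lock word must be lock-free");

// Take the spinlock once; succeeds only if the word currently reads 0.
inline bool TryLock(MutableObjectHeader* h) {
  uint32_t expected = 0;
  return h->lock.compare_exchange_strong(expected, 1, std::memory_order_acquire);
}

// Spin until the lock is taken. If the word holds garbage (e.g. 0xFFFFFFFF) and
// nobody ever stores 0 into it, this loop never exits: the deadlock described above.
inline void Lock(MutableObjectHeader* h) {
  while (!TryLock(h)) {
  }
}

inline void Unlock(MutableObjectHeader* h) {
  h->lock.store(0, std::memory_order_release);
}

// The fix: construct the header in place so every field, including the lock word,
// starts from a known value regardless of what bytes the buffer held before.
inline MutableObjectHeader* InitHeader(void* buf) {
  return new (buf) MutableObjectHeader();
}
```

In a test, a local buffer filled with `0xFF` bytes can stand in for stale shared memory: after `InitHeader()`, the first `TryLock()` succeeds instead of spinning.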
Test passed on latest run: https://buildkite.com/ray-project/postmerge/builds/3826#018ea456-8ee6-4b39-9a95-ef990a519880
Still flaky.
CI test linux://:mutable_object_test is flaky. Recent failures:
Test passed on latest run: https://buildkite.com/ray-project/postmerge/builds/3943#018ec9e9-e96f-4739-ab74-a33168c56721
From initial investigation, the test is "flaky" because it takes a long time to run. In
A quick fix to reduce contention would be to add sched_yield() between sem_post() and TryToAcquireSemaphore(). A good long-term fix would be to use a futex to sleep on the version number to avoid polling.
Should we just increase the test timeout?
I'd be inclined to keep it as is, because this test should run quickly. It's just unnecessary contention that shouldn't be there in the first place due to polling.
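The quick fix proposed in the thread (calling sched_yield() between sem_post() and TryToAcquireSemaphore()) can be sketched with POSIX semaphores as follows. This is an illustration only: `TryAcquire` is a hypothetical stand-in built on `sem_trywait()`, not the real signature of Ray's `TryToAcquireSemaphore()`, and `PostThenYield`/`AcquireByPolling` are hypothetical helper names.

```cpp
#include <cassert>
#include <cerrno>
#include <sched.h>
#include <semaphore.h>

// Hypothetical stand-in for TryToAcquireSemaphore(); Ray's real function differs.
// Returns true if the count was decremented without blocking.
inline bool TryAcquire(sem_t* sem) {
  int rc;
  do {
    rc = sem_trywait(sem);
  } while (rc != 0 && errno == EINTR);  // retry only if a signal interrupted the call
  return rc == 0;                       // otherwise EAGAIN: the count was zero
}

// Quick fix: after posting, yield the CPU once so a waiting peer has a chance to
// run before this thread starts polling again. sched_yield() is only a scheduling
// hint, so this lowers contention rather than eliminating it.
inline void PostThenYield(sem_t* sem) {
  sem_post(sem);
  sched_yield();
}

// Polling acquire that yields between failed attempts instead of spinning hot.
inline void AcquireByPolling(sem_t* sem) {
  while (!TryAcquire(sem)) {
    sched_yield();
  }
}
```

Because the thread still wakes up to re-check the semaphore, this reduces but does not remove the polling cost, which is why it was framed as a quick fix rather than the long-term one.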
Got it, sounds good!
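The long-term fix suggested earlier in the thread, sleeping on the version number with a futex instead of polling, might look roughly like this Linux-only sketch. It assumes the version number is a 32-bit atomic word in the shared header (futexes operate on aligned 32-bit words; Ray's actual field may differ), and `FutexWait`, `FutexWakeAll`, `PublishVersion`, and `WaitForNewVersion` are hypothetical helper names. It uses plain `FUTEX_WAIT`/`FUTEX_WAKE` rather than the `_PRIVATE` variants because the word would live in memory shared between processes.

```cpp
#include <atomic>
#include <cassert>
#include <cerrno>
#include <climits>
#include <cstdint>
#include <linux/futex.h>
#include <sys/syscall.h>
#include <unistd.h>

// Assumption: std::atomic<uint32_t> has the same size and layout as uint32_t,
// which holds on mainstream Linux targets.
static_assert(sizeof(std::atomic<uint32_t>) == sizeof(uint32_t), "futex word must be 32 bits");

// Sleep only if *word still equals `expected`; otherwise fail at once with EAGAIN.
inline long FutexWait(std::atomic<uint32_t>* word, uint32_t expected) {
  return syscall(SYS_futex, reinterpret_cast<uint32_t*>(word), FUTEX_WAIT, expected,
                 nullptr, nullptr, 0);
}

// Wake every thread or process sleeping on `word`; returns how many were woken.
inline long FutexWakeAll(std::atomic<uint32_t>* word) {
  return syscall(SYS_futex, reinterpret_cast<uint32_t*>(word), FUTEX_WAKE, INT_MAX,
                 nullptr, nullptr, 0);
}

// Writer: publish a new version, then wake readers sleeping on the old one.
inline void PublishVersion(std::atomic<uint32_t>* version) {
  version->fetch_add(1, std::memory_order_release);
  FutexWakeAll(version);
}

// Reader: block until the version differs from `seen`, without busy polling.
inline uint32_t WaitForNewVersion(std::atomic<uint32_t>* version, uint32_t seen) {
  uint32_t current = version->load(std::memory_order_acquire);
  while (current == seen) {
    FutexWait(version, seen);  // EAGAIN, EINTR, or a spurious wakeup: just re-check
    current = version->load(std::memory_order_acquire);
  }
  return current;
}
```

The kernel compares the word against `seen` atomically before putting the reader to sleep, so a version bump that lands between the reader's load and its `FutexWait()` call is not missed; the reader consumes no CPU while waiting.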
CI test linux://:mutable_object_test is consistently_failing. Recent failures:
Test passed on latest run: https://buildkite.com/ray-project/postmerge/builds/4001#018ed618-8383-4ee6-a050-2a3bc9d23cc9
CI test linux://:mutable_object_test is flaky. Recent failures:
Test passed on latest run: https://buildkite.com/ray-project/postmerge/builds/3268#018e0b9c-f120-4741-a4a0-745efa45d938
CI test linux://:mutable_object_test is flaky. Recent failures:
Test passed on latest run: https://buildkite.com/ray-project/postmerge/builds/4037#018ee8c7-e7ed-4143-8548-a8ee2261c2a4
CI test linux://:mutable_object_test is flaky. Recent failures:
- https://buildkite.com/ray-project/postmerge/builds/3797#018e9b5a-03dd-4967-a20f-45116141500e
- https://buildkite.com/ray-project/postmerge/builds/3795#018e9b0a-9d1d-445b-b5c2-01143ba361f1
- https://buildkite.com/ray-project/postmerge/builds/3791#018e9a7b-5bde-4252-b036-8f11e8282f5d
- https://buildkite.com/ray-project/postmerge/builds/3792#018e9a81-897d-4bd3-9df5-c100f599a5ef
- https://buildkite.com/ray-project/postmerge/builds/3791#018e9a7b-5bda-42bb-9e4e-f167c7fc54a5
Managed by OSS Test Policy