Fix halted thread being allowed to continue execution #15201
The changes look good to me, at least the part of the fix that prevents a potential inspector thread from proceeding after it requested inspection. It was incorrectly allowed to proceed with inspection if the inspecting thread went through releasing/reacquiring JNI critical access and so mis-observed that the HALT_INSPECTION flag had been raised. Now we properly check that flag after reacquiring, and block if an inspection is pending. But please elaborate more in the commit message, for future readers, on why NOT_SAFE had to move to the outer points and why this does not compromise the deadlock prevention from #12257.
Can we get some nested cases for "set
I explicitly handle the only nested case I believe we need to handle: exclusive VM access sets NOT_AT_SAFE_POINT only if it was not already set, and clears it only if exclusive access set it. I don't believe there are any other nested cases in this PR, since we're putting the set/clear as far out as possible.
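A minimal single-threaded sketch of the nesting rule described in the comment above: the exclusive-access path sets NOT_AT_SAFE_POINT only when it is not already set, remembers whether it did, and clears only what it set. `VMThreadModel`, the helper names, and the bit value are hypothetical; the real VM updates its thread flags under its own locking, which this model omits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit value; the real VM defines its own flag constants. */
#define NOT_AT_SAFE_POINT ((uintptr_t)0x1)

/* Hypothetical stand-in for a VM thread's public flags word. */
typedef struct {
    uintptr_t publicFlags;
} VMThreadModel;

/* Sets the bit only if no outer path already holds it.
   Returns true if this call set it, i.e. this call owns the matching clear. */
static bool set_not_at_safe_point_if_unset(VMThreadModel *thread)
{
    if (thread->publicFlags & NOT_AT_SAFE_POINT) {
        return false; /* nested: an outer path set it and will clear it */
    }
    thread->publicFlags |= NOT_AT_SAFE_POINT;
    return true;
}

/* Clears the bit only if the matching set call actually set it. */
static void clear_not_at_safe_point_if_owned(VMThreadModel *thread, bool setByUs)
{
    if (setByUs) {
        assert(thread->publicFlags & NOT_AT_SAFE_POINT);
        thread->publicFlags &= ~NOT_AT_SAFE_POINT;
    }
}

/* Models acquiring and releasing exclusive VM access around some work. */
static void exclusive_access_model(VMThreadModel *thread)
{
    bool setByUs = set_not_at_safe_point_if_unset(thread);
    /* ... exclusive VM access (for example, a GC) would happen here ... */
    clear_not_at_safe_point_if_owned(thread, setByUs);
}
```

Because ownership is tracked in a local boolean, an outer allocation path that already holds the bit keeps it across a nested exclusive request.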
@amicic and I believe this code reintroduces the original hang. Original issue:
Current code:
I believe this may be fixed by making haltThreadForInspection mark the inspecting thread as NOT_SAFE:
@jdekonin so you can track this.
See details in eclipse-openj9#15201.
Signed-off-by: Graham Chapman <graham_chapman@ca.ibm.com>
Jenkins test sanity all jdk11
jenkins compile win32 jdk8
All approvals and testing done, so I'm going to bend the rules and merge this to facilitate the final part of the change.
There is an assertion in /job_output.php?id=35652328, which reproduced 2/50 in a grinder, but 0/200 on the previous build.
Also seen in the following:
https://openj9-jenkins.osuosl.org/job/Test_openjdk17_j9_sanity.openjdk_ppc64le_linux_Nightly/237
There may be more; there are a number of other failures I haven't looked at yet. I'll finish going through the builds later. For now, I'm reverting this.
Reverted via #15226
saveObjects() expects non-NULL.
This PR is a variant of eclipse-openj9#15201 with handling of the saveObjects NULL case.
Primary issue: It was possible for a thread which is halted for inspection to resume execution after a GC, resulting in the stack walk crashing in the inspecting thread. This was introduced in eclipse-openj9#12257.
Secondary issue: If instrumentable object allocate is hooked, it was possible to HCR at JIT object allocation points.
New restriction: The JVMTI extension event for instrumentable object allocate may now only be acquired at startup, not during late attach. As far as I know, there are no active users of this event.
Note that NOT_AT_SAFE_POINT has two distinct, if related, meanings:
- Normal exclusive VM access has priority over safe point. Setting NOT_AT_SAFE_POINT around all GCing operations accomplishes this.
- JIT optimization requires that there be no possibility of HCR at object allocation points. Extending the range of NOT_AT_SAFE_POINT to cover the entire allocation path, and disabling safe point if the instrumentable object allocate event is hooked, ensures this.
This fix has several parts:
- Pause after GC if halt has been requested. After any possibly-GCing path, check whether the thread should halt before resuming mutation. This fixes the reported problem of inspected threads continuing to run. It also requires that NOT_AT_SAFE_POINT be set across the entirety of the allocation path, to prevent safe point from being acquired by the new halting code.
- Fix object allocation event reporting. If the instrumentable object allocation extension event was hooked, there was a timing hole where HCR could occur at an object allocation from the JIT, which is specifically what safe point HCR is meant to avoid. This is fixed by marking the thread NOT_AT_SAFE_POINT for the duration of the allocate. If the instrumentable object allocate event is hooked, disable safe point HCR, as there is no way to safely report the event.
- Mark all possibly GCing paths NOT_AT_SAFE_POINT. This ensures that the GC has priority over safe point HCR. haltThreadForInspection also now marks the inspecting thread NOT_SAFE for the duration of the halt (see timing below).
Signed-off-by: Aleksandar Micic <amicic@ca.ibm.com>
Primary issue: It was possible for a thread which is halted for inspection to resume execution after a GC, resulting in the stack walk crashing in the inspecting thread. This was introduced in #12257.
Secondary issue: If instrumentable object allocate is hooked, it was possible to HCR at JIT object allocation points.
New restriction: The JVMTI extension event for instrumentable object allocate may now only be acquired at startup, not during late attach. As far as I know, there are no active users of this event.
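Note that NOT_AT_SAFE_POINT has two distinct, if related, meanings:
- Normal exclusive VM access has priority over safe point. Setting NOT_AT_SAFE_POINT around all GCing operations accomplishes this.
- JIT optimization requires that there be no possibility of HCR at object allocation points. Extending the range of NOT_AT_SAFE_POINT to cover the entire allocation path, and disabling safe point if the instrumentable object allocate event is hooked, ensures this.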
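A simplified model of how NOT_AT_SAFE_POINT keeps safe point out of GCing and allocation paths: assuming a safe-point request (as used for safe point HCR) is only granted once no thread holds the bit, a thread inside a marked path finishes that path first. The struct, function name, and bit value are hypothetical, and real exclusive-access arbitration involves much more than this predicate.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NOT_AT_SAFE_POINT ((uintptr_t)0x1) /* hypothetical bit value */

/* Hypothetical stand-in for a VM thread's public flags word. */
typedef struct {
    uintptr_t publicFlags;
} VMThreadModel;

/* In this model a safe-point request is granted only when no thread is inside
   a NOT_AT_SAFE_POINT region. Normal exclusive access does not consult the
   bit, so a GC started inside such a region is not held up by safe point. */
static bool safe_point_can_be_granted(const VMThreadModel *threads, size_t count)
{
    assert(threads != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        if (threads[i].publicFlags & NOT_AT_SAFE_POINT) {
            return false; /* thread is on a GCing path or at a JIT allocation point */
        }
    }
    return true;
}
```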
This fix has several parts:
Pause after GC if halt has been requested: after any possibly-GCing path, check whether the thread should halt before resuming mutation. This fixes the reported problem of inspected threads continuing to run. It also requires that NOT_AT_SAFE_POINT be set across the entirety of the allocation path, to prevent safe point from being acquired by the new halting code.
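A sketch of the "pause after GC" step, assuming the halt request is visible as a flag bit on the thread: the allocation path holds NOT_AT_SAFE_POINT for its whole duration, including the halt check, and re-checks the halt flag in a loop before resuming mutation. `VMThreadModel`, `allocate_model`, the bit values, and the callback that stands in for the real blocking wait are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bit values; the real VM defines its own flag constants. */
#define NOT_AT_SAFE_POINT      ((uintptr_t)0x1)
#define HALT_THREAD_INSPECTION ((uintptr_t)0x2)

typedef struct VMThreadModel {
    uintptr_t publicFlags;
    int waits;                 /* how many times the thread parked */
    bool waitedNotAtSafePoint; /* was NOT_AT_SAFE_POINT held while parked? */
} VMThreadModel;

typedef void (*WaitForResumeFn)(VMThreadModel *thread);

/* Re-check after every wake-up, so an early or spurious wake-up cannot let a
   halted thread resume mutating the heap. */
static void pause_if_halt_requested(VMThreadModel *thread, WaitForResumeFn wait_for_resume)
{
    while (thread->publicFlags & HALT_THREAD_INSPECTION) {
        wait_for_resume(thread);
    }
}

/* A possibly-GCing allocation path. NOT_AT_SAFE_POINT covers the whole path,
   including the halt check, so safe point cannot be acquired while this
   thread is parked in the middle of an allocation. */
static void *allocate_model(VMThreadModel *thread, WaitForResumeFn wait_for_resume)
{
    static long heap_slot; /* stand-in for the allocated object */

    thread->publicFlags |= NOT_AT_SAFE_POINT;
    /* ... allocation, possibly triggering a GC, would run here ... */
    pause_if_halt_requested(thread, wait_for_resume);
    thread->publicFlags &= ~NOT_AT_SAFE_POINT;
    return &heap_slot;
}

/* Test double for the real blocking wait: records state, then acts as the
   inspector finishing and clearing the halt request. */
static void simulate_inspector_resume(VMThreadModel *thread)
{
    thread->waits++;
    thread->waitedNotAtSafePoint = (thread->publicFlags & NOT_AT_SAFE_POINT) != 0;
    thread->publicFlags &= ~HALT_THREAD_INSPECTION;
}
```

Re-checking in a `while` loop rather than an `if` mirrors what the review comment describes for the JNI critical path: a thread that wakes up must confirm the halt was actually lifted before continuing.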
Fix object allocation event reporting: if the instrumentable object allocation extension event was hooked, there was a timing hole where HCR could occur at an object allocation from the JIT, which is specifically what safe point HCR is meant to avoid.
This is fixed by marking the thread NOT_AT_SAFE_POINT for the duration of the allocate. If the instrumentable object allocate event is hooked, disable safe point HCR, as there is no way to safely report the event.
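A sketch of the hooking rule, assuming a VM-wide record of whether the instrumentable object allocate event is hooked: the event can only be acquired during startup (the new restriction), and safe point HCR is treated as disabled whenever it is hooked. `VMModel`, the phase enum, and both function names are hypothetical and are not the JVMTI extension API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the VM lifecycle and the event's hooked state. */
typedef enum { PHASE_STARTUP, PHASE_LIVE } VMPhaseModel;

typedef struct {
    VMPhaseModel phase;
    bool instrumentableAllocHooked;
} VMModel;

/* New restriction: the event may only be acquired at startup, not during late attach. */
static bool acquire_instrumentable_alloc_event(VMModel *vm)
{
    assert(vm != NULL);
    if (vm->phase != PHASE_STARTUP) {
        return false; /* late attach: refuse the request */
    }
    vm->instrumentableAllocHooked = true;
    return true;
}

/* If the event is hooked there is no way to safely report it at a JIT
   allocation point, so safe point HCR is treated as disabled. */
static bool safe_point_hcr_enabled(const VMModel *vm)
{
    assert(vm != NULL);
    return !vm->instrumentableAllocHooked;
}
```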
Mark all possibly GCing paths NOT_AT_SAFE_POINT: this ensures that the GC has priority over safe point HCR.
haltThreadForInspection also now marks the inspecting thread NOT_SAFE for the duration of the halt (see timing below).
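A sketch of the bracketing described for haltThreadForInspection, assuming NOT_SAFE refers to the NOT_AT_SAFE_POINT bit: the inspecting thread holds the bit from the halt until the matching resume and, as with exclusive VM access, clears it only if it set it. The model function names, struct, and bit values are hypothetical; the real haltThreadForInspection also waits for the target to stop, which is only marked by a comment here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit values; the real VM defines its own flag constants. */
#define NOT_AT_SAFE_POINT      ((uintptr_t)0x1)
#define HALT_THREAD_INSPECTION ((uintptr_t)0x2)

typedef struct {
    uintptr_t publicFlags;
} VMThreadModel;

/* Returns whether this call set the inspector's NOT_AT_SAFE_POINT bit, so the
   matching resume clears only what it set. */
static bool halt_for_inspection_model(VMThreadModel *inspector, VMThreadModel *target)
{
    bool setByUs = !(inspector->publicFlags & NOT_AT_SAFE_POINT);
    inspector->publicFlags |= NOT_AT_SAFE_POINT;
    target->publicFlags |= HALT_THREAD_INSPECTION;
    /* real code would now wait until the target has actually stopped */
    return setByUs;
}

static void resume_from_inspection_model(VMThreadModel *inspector, VMThreadModel *target, bool setByUs)
{
    assert(target->publicFlags & HALT_THREAD_INSPECTION);
    target->publicFlags &= ~HALT_THREAD_INSPECTION;
    if (setByUs) {
        inspector->publicFlags &= ~NOT_AT_SAFE_POINT;
    }
}
```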
Signed-off-by: Graham Chapman <graham_chapman@ca.ibm.com>