-
Notifications
You must be signed in to change notification settings - Fork 5.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
8255452: Doing GC during JVMTI MethodExit event posting breaks return oop #930
Conversation
👋 Welcome back eosterlund! A progress list of the required criteria for merging this PR into |
Webrevs
|
} | ||
|
||
if (exception_exit) { | ||
post_method_exit_inner(thread, mh, state, exception_exit, current_frame, result, value); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think for exception exit, you also need JRT_BLOCK because you want the transition to thread_in_VM for this code, since JRT_BLOCK_ENTRY doesn't do the transition. It should be safe for exception exit and retain the old behavior.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for having a look coleen. In fact, not doing the JRT_BLOCK for the exception entry is intentional, because that entry goes through a different JRT_ENTRY (not JRT_BLOCK_ENTRY), that already transitions. So if I do the JRT_BLOCK for the exception path, it asserts saying hey you are already in VM.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh that's actually horrible. I wonder if it's possible to hoist saving the result oop into the InterpreterRuntime entry. And pass the Handle into JvmtiExport::post_method_exit().
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried that first, and ended up with a bunch of non-trivial code duplication instead, as reading the oop is done in both paths but for different reasons. One to preserve/restore it (interpreter remove_activation entry), but also inside of JvmtiExport::post_method_exit() so that it can be passed into the MethodExit. I will give it another shot and see if it is possible to refactor it in a better way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I uploaded a CR that does pretty much what you suggested, ish. Hope you like it!
Hi Erik, is it possible for GC to mistake a primitive value for a reference when posting the exit event? My understanding is: we are at a random bci of a method that is forced to return early. The expression stack is emptied and the return value is pushed on the expression stack then we call into the interpreter runtime to post the JVMTI method exit event during which we come to a safepoint for GC. The oop map for the bci does not cover this forced early return and if the return value is an object then the reference pushed on the expression stack before is not updated by GC. With your fix the value is updated if it is a reference. If this is correct then to me it appears as if GC can also crash because the oop map for the random bci tells there has to be a reference at the stack position of the return value if it actually is a primitive value. |
I think you've discovered JDK-6449023. And your fix looks like the workaround I tried: |
@fisk This change now passes all automated pre-integration checks. ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details. After integration, the commit message for the final commit will be:
You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed. At the time when this comment was updated there had been 125 new commits pushed to the
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details. ➡️ To integrate this PR with the above commit message to the |
Oh wow. I had no idea people have been having issues with this since 2009! Thanks for the pointer. Well, let's hope we can finally close it now after marinating the bug for 11 years. |
I think what you are saying is true. Note though that the return value of ForceEarlyReturn is installed with a handshake. The handshake polls of the interpreter are emitted in loop backedges and returns. At loop backedges, the expression stack is empty (required by OSR), and at returns the types match correctly. However, if an arbitrary bytecode performs a runtime call with call_VM() while the bottom of the expression stack is an oop, then I think there is an issue. At that call_VM, the early return value could get installed, and when the C++ function returns, we check for early returns, further dispatching to an unwind routine that posts the MethodExit notification. If we GC during this MethodExit notification, then I think you can crash the GC. The GC code generates an oop map for the frame, checking what the types in the expression stack should be. The early return int is pushed on the slot intersecting with the bottom entry in the expression stack. That bottom entry could be an oop, and the early return value could be an int. Then the early return int will be passed to the oop closure, which should result in a crash. So I suspect that almost always, the handshake installing the ForceEarlyReturn value is installed with a handshake in a bytecode backedge or at a return, where the interpreter safepoint polls for the fast path code. Then you won't notice the issue. But in the rare scenario that the ForceEarlyReturn value is installed in a slow path call from a random bytecode... I can't see how that would work correctly. |
Looking more closely, I gotta say I have no idea why there is a call to clear the expression stack at all in TemplateInterpreterGenerator::generate_earlyret_entry_for(). It is unclear to me what problem if any that solves. It does however seem like it introduces this problem of having a forced int return value intersect with an oop in the expression stack for a BCI performing slow-path calls into the VM. Simply removing the code that clears the expression stack, removes the issue, and I can't see that it introduces any other issue. Anyway, I think this is a separate bug. Do you mind if I push the fix for that bug, as a different RFR? It will likely involve poking around at more platform dependent code. |
Yes right, I wasn't really aware of this. The thread has to be suspended though
Don't really know either. Only thing I currently can think of is that the
Not at all.
Likely. Another solution might be to delay loading the return value until the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi Erik,
Nice discovery! Indeed, this is a long standing issue. It looks good in general.
I agree with Coleen, it would be nice if there is an elegant way to move the oop_result saving/restoring code to InterpreterRuntime::post_method_exit. Otherwise, I'm okay with what you have now.
It is also nice discovery of the issue with clearing the expression stack. I think, it was my mistake in the initial implementation of the ForceEarlyReturn when I followed the PopFrame implementation pattern. It is good to separate it from the current fix.
Thanks,
Serguei
I uploaded a new commit to perform some refactoring as requested by Coleen and Serguei. I made the oop save/restore + JRT_BLOCK logic belong only to the path taken from InterpreterRuntime::post_method_exit. An inner posting method is called both from that path and from JvmtiExport::notice_unwind_due_to_exception. I think the result is an improvement in terms of how clear it is. I didn't want to move logic all the way back to InterpreterRuntime::post_method_exit though, as I don't think it looks pretty to have large chunks of JVMTI implementation details in the interpreterRuntime.cpp file. So I did basically what you suggested, with the slight difference of moving all the JVMTI implementation into the JVMTI file instead, which is just called from InterpreterRuntime::post_method_exit. Hope you are okay with this! |
Thanks for reviewing this Serguei. And thanks for confirming our suspicions regarding clearing of the expression stack. I wasn't sure if anyone would be around that knew how it ended up there! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks better. Just to have the JRT_BLOCK be unconditional is an improvement.
// for any thread that actually wants method exit, interp_only_mode is set | ||
return; | ||
} | ||
|
||
// return a flag when a method terminates by throwing an exception | ||
// i.e. if an exception is thrown and it's not caught by the current method | ||
bool exception_exit = state->is_exception_detected() && !state->is_exception_caught(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So this only applies to the case where the post_method_exit comes from remove_activation? Good to have it only on this path in this case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure. There might be other cases, when remove_activation is called by the exception code. That's why I didn't want to change it to just true in this path.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The post_method_exit can come from Zero interpreter:
src/hotspot/share/interpreter/zero/bytecodeInterpreter.cpp:
CALL_VM_NOCHECK(InterpreterRuntime::post_method_exit(THREAD));
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like the case of exception_exit is a condition from the call from notice_unwind_due_to_exception() and not the calls from InterpreterRuntime::post_method_exit() for both callers. But maybe if the exception is set and not caught, this post_method_exit() is called when unwinding to the exception handler. I can't tell, so leave it to be safe.
Erik, Thank you for the update! It looks more elegant. One concern is that after the move of this fragment to the post_method_exit_inner:
there is no guarantee that the current frame is interpreted below:
Probably, extra checks for current_frame.is_interpreted_frame() in these fragments will be sufficient. |
That makes sense. Added a check in the latest version that we are in interp only mode. |
Hi Erik, I'm not sure, if this fragment is still needed:
Also, can it be that this condition is true: Thanks, |
Seems like it is not needed. I removed it.
It could definitely be that the top frame is interpreted even though that condition is true. However, we now enter InterpreterRuntime::post_method_exit as a JRT_BLOCK_ENTRY call, which performs no transition (similar to JRT_LEAF). So if so we should just return back to the caller without doing anything, and no GC will happen in this path then. It is only when we perform the JRT_BLOCK and JRT_BLOCK_END that we allow GCs to happen, and we save/restore the result across that section. Thanks,
|
@fisk Unknown command |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Erik,
Thank you for the explanation!
I agree with you, so the fix is good.
Thanks,
Serguei
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still looks good.
// for any thread that actually wants method exit, interp_only_mode is set | ||
return; | ||
} | ||
|
||
// return a flag when a method terminates by throwing an exception | ||
// i.e. if an exception is thrown and it's not caught by the current method | ||
bool exception_exit = state->is_exception_detected() && !state->is_exception_caught(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like the case of exception_exit is a condition from the call from notice_unwind_due_to_exception() and not the calls from InterpreterRuntime::post_method_exit() for both callers. But maybe if the exception is set and not caught, this post_method_exit() is called when unwinding to the exception handler. I can't tell, so leave it to be safe.
/integrate |
@fisk Since your change was applied there have been 139 commits pushed to the
Your commit was automatically rebased without conflicts. Pushed as commit 3a02578. 💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored. |
… oop Reviewed-by: coleenp, dlong, rrich, sspitsyn
The imasm::remove_activation() call does not deal with safepoints very well. However, when the MethodExit JVMTI event is being called, we call into the runtime in the middle of remove_activation(). If the value being returned is an object type, then the top-of-stack contains the oop. However, the GC does not traverse said oop in any oop map, because it is simply not expected that we safepoint in the middle of remove_activation().
The JvmtiExport::post_method_exit() function we end up calling, reads the top-of-stack oop, and puts it in a handle. Then it calls JVMTI callbacks, that eventually call Java and a bunch of stuff that safepoints. So after the JVMTI callback, we can expect the top-of-stack oop to be broken. Unfortunately, when we continue, we therefore end up returning a broken oop.
Notably, the fact that InterpreterRuntime::post_method_exit is a JRT_ENTRY, is wrong, as we can safepoint on the way back to Java, which will break the return oop in a similar way. So this patch makes it a JRT_BLOCK_ENTRY, moving the transition to VM and back, into a block of code that is protected against GC. Before the JRT_BLOCK is called, we stash away the return oop, and after the JRT_BLOCK_END, we restore the top-of-stack oop. In the path when InterpreterRuntime::post_method_exit is called when throwing an exception, we don't have the same problem of retaining an oop result, and hence the JRT_BLOCK/JRT_BLOCK_END section is not performed in this case; the logic is the same as before for this path.
This is a JVMTI bug that has probably been around for a long time. It crashes with all GCs, but was discovered recently after concurrent stack processing, as StefanK has been running better GC stressing code in JVMTI, and the bug reproduced more easily with concurrent stack processing, as the timings were a bit different. The following reproducer failed pretty much 100% of the time:
while true; do make test JTREG="RETAIN=all" TEST=test/hotspot/jtreg/vmTestbase/nsk/jdi/MethodExitEvent/returnValue/returnValue003/returnValue003.java TEST_OPTS_JAVA_OPTIONS="-XX:+UseZGC -Xmx2g -XX:ZCollectionInterval=0.0001 -XX:ZFragmentationLimit=0.01 -XX:+VerifyOops -XX:+ZVerifyViews -Xint" ; done
With my fix I can run this repeatedly without any more failures. I have also sanity checked the patch by running tier 1-5, so that it does not introduces any new issues on its own. I have also used Stefan's nice external GC stressing with jcmd technique that was used to trigger crashes with other GCs, to make sure said crashes no longer reproduce either.
Progress
Testing
Failed test task
Issue
Reviewers
Download
$ git fetch https://git.openjdk.java.net/jdk pull/930/head:pull/930
$ git checkout pull/930