-
Notifications
You must be signed in to change notification settings - Fork 6.2k
8372591: assert(!current->cont_fastpath() || freeze.check_valid_fast_path()) failed #28830
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
|
👋 Welcome back pchilanomate! A progress list of the required criteria for merging this PR into |
|
@pchilano This change now passes all automated pre-integration checks. ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details. After integration, the commit message for the final commit will be: You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed. At the time when this comment was updated there had been 43 new commits pushed to the
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details. ➡️ To integrate this PR with the above commit message to the |
|
/label remove core-libs |
|
@pchilano |
|
This seems to be a "day one" bug with virtual threads. Do you have an idea as to why it has not been noticed before? |
Yes, it's a day one bug. There are a couple of conditions that need to align to trigger it, but the most likely reason it went undetected is the requirement for a big enough difference between the |
dholmes-ora
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good find!
Thanks
I think what you have is good but maybe me wonder if there might be other corner cases that might need further Continuations tests. |
I can't think of others looking at the code. But |
|
Thanks for the reviews David and Alan! |
|
@reinrich @TheRealMDoerr Could you verify if this fix is correct for ppc too? Thanks. |
|
Thanks for the ping! A quick run of the test has passed on PPC64. We'll run more tests. @reinrich may want to take a look, too. |
The fix looks correct. Strangely the reproducer didn't work on ppc. Not even with 3x the locals in the osr method. I'll look a little more into it... |
On ppc we reach If max_stack of the sender is very large then, due to trimming, unextended_sp < sp is possible and the assertion could also fail. Maybe the maximum of unextended_sp and sp could be used? Note that on ppc frame::id() returns the fp. Maybe this should be used as |
Maybe, but I don't see which compiled method it could be, since methods before
On x64 and AArch64 we build the OSR frame starting from the sender's |
It is. The difference to x86 is, that the Actually aarch64 seems to be similar. I think @dean-long told me once that there an interpreter frame also has room for the maximal expression stack (see generate_fixed_frame) and that it get's trimmed by an interpreted callee. What's done just before calling |
When a frame is OSR and the caller is interpreted we call
push_cont_fastpathto possible set_cont_fastpathto thespof the sender. This is needed in order to keep track of interpreted frames in the stack, so that in case of unmounting we don't incorrectly execute the freeze fast path, which is only allowed when all frames are compiled.The problem is that the OSR frame is created starting from the
unextended_spof the sender, not thesp, i.e we pop the interpreted frame like inremove_activation. This means that we could set a value to_cont_fastpaththat will point outside the valid stack (below thespof the OSR frame). If the gap between these stack addresses is big enough,_cont_fastpathcould be later cleared while we still have the interpreter sender in the stack, leading to the reported crash. The simplest case where this will happen is if in a later call to yield all the extra frames leading toContinuation.doYieldfit within this space [1]. It could also be cleared before that if for example at some point we returned from an interpreted callee and the sender sp it's not below_cont_fastpath[2].The fix is to change
OSR_migration_beginto set_cont_fastpathwith the sender'sunextended_spinstead.The patch includes new test
OSRWithManyLocals.javawhich reliably reproduces the crash. I thought about adding it to the existingOSRTest.javabut that was created to exercise a different issue. A better name for that test would beOSRWithManyArgs.java, so I could rename it in this patch if preferred (I also realized that test could be simplified and made easier to read but that's for another PR).I tested the current patch with the new test and also run it through mach5 tiers1-7.
Thanks,
Patricio
Progress
Issue
Reviewers
Reviewing
Using
gitCheckout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/28830/head:pull/28830$ git checkout pull/28830Update a local copy of the PR:
$ git checkout pull/28830$ git pull https://git.openjdk.org/jdk.git pull/28830/headUsing Skara CLI tools
Checkout this PR locally:
$ git pr checkout 28830View PR using the GUI difftool:
$ git pr show -t 28830Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/28830.diff
Using Webrev
Link to Webrev Comment