JDK19 java/lang/Thread/virtual/stress/TimedGet.java timed out #17163
Comments
@babsingh @fengxue-IS pls take a look.
Potential duplicate of #15184 if no timeouts are seen in the above grinder.
It's not sub-4G allocation since it's failing in jdk_lang_1 mode. The test is also running with
Results from Jason's grinder: https://hyc-runtimes-jenkins.swg-devops.com/job/Grinder_iteration_3/1831 https://hyc-runtimes-jenkins.swg-devops.com/job/Grinder_iteration_4/1732/ https://hyc-runtimes-jenkins.swg-devops.com/job/Grinder_iteration_0/2414
Results from Babneet's grinder: https://hyc-runtimes-jenkins.swg-devops.com/job/Grinder_iteration_0/2418/ https://hyc-runtimes-jenkins.swg-devops.com/job/Grinder_iteration_4/1736/ https://hyc-runtimes-jenkins.swg-devops.com/job/Grinder_iteration_1/2247/
Looking at the core file from Babneet's grinder, the test seems to be stuck waiting on the last 2 vthreads to complete. I've traced it to the continuation object, which was still alive at the time of core generation.
The state of the continuation looks to have
@LinHu2016 I looked through the code where this flag is set, and one question that I have is about this loop:
    while (VM_VMHelpers::isConcurrentlyScanned(returnContinuationState)) {
        PUSH_OBJECT_IN_SPECIAL_FRAME(currentThread, continuationObject);
        internalReleaseVMAccess(currentThread);
        omrthread_monitor_enter(currentThread->publicFlagsMutex);
        /* Wait for GC thread to notify us when it's done. */
        omrthread_monitor_wait(currentThread->publicFlagsMutex);
        omrthread_monitor_exit(currentThread->publicFlagsMutex);
        internalAcquireVMAccess(currentThread);
If the concurrent scan ends and the notification is sent after the while check but before the wait on publicFlagsMutex, this would become a hang until some other event triggers a notify, right? This would explain why the test passed after the timeout signal was triggered: I assume that triggered some signal-handling code (after releasing EVMA) which notified all threads, hence unblocking the waiter. I wasn't able to get the native stack trace on my local machine, so this is unconfirmed. I will have to look at the core on the grinder machine to verify.
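The suspected window can be replayed on a single thread. The following is a minimal sketch in standard C++, not OpenJ9 code: `ScanState`, `finishScan`, and `replayLostWakeup` are hypothetical names, a `std::atomic<bool>` stands in for the concurrent-scan bit in the continuation state, and `std::mutex` plus `std::condition_variable` stand in for `publicFlagsMutex`. How the GC side actually sends its notification is an assumption here. The wait is bounded only so that the replay returns; with an unbounded wait, as in the quoted loop, the waiter would sleep until some unrelated notify arrives.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>

// Hypothetical stand-ins: the flag models the concurrent-scan bit,
// mutex/cv model publicFlagsMutex (which is both lock and wait channel).
struct ScanState {
    std::atomic<bool> concurrentlyScanned{true};
    std::mutex mutex;
    std::condition_variable cv;
};

// Assumed GC side: clear the bit, then notify whoever is waiting right now.
void finishScan(ScanState& s) {
    s.concurrentlyScanned.store(false);
    std::lock_guard<std::mutex> lock(s.mutex);
    s.cv.notify_all();
}

struct Replay {
    bool decidedToWait;   // the while condition saw the scan bit set
    bool scanAlreadyDone; // the bit was already clear when the wait began
    bool timedOut;        // nobody woke the waiter before the bound expired
};

// Replays the interleaving described above in a fixed order.
Replay replayLostWakeup(std::chrono::milliseconds bound) {
    ScanState s;
    // Step 1: the while condition is evaluated without holding the mutex.
    bool sawScan = s.concurrentlyScanned.load();
    // Step 2: the scan ends inside the window; the notify finds no waiter.
    finishScan(s);
    assert(!s.concurrentlyScanned.load());
    // Step 3: the waiter enters the monitor and waits for a notification
    // that has already been sent.
    std::unique_lock<std::mutex> lock(s.mutex);
    std::cv_status status = s.cv.wait_for(lock, bound);
    return Replay{sawScan, !s.concurrentlyScanned.load(),
                  status == std::cv_status::timeout};
}
```

Calling `replayLostWakeup(std::chrono::milliseconds(20))` reports that the waiter committed to waiting on stale information: the check and the wait are not one atomic step with respect to the notifier.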
xlinux grinder -Xnocompressedrefs - passed 100 runs of TimedGet; plinux grinder 100x; xlinux grinder 200x
We'll continue to investigate this for jdk20.
Created #17181 for the assert.
@fengxue-IS Nice catch - yes, this seems like a real problem. A dirty fix is to use a timed wait, but ATM I think a real fix would require a new public flag that is set/checked/reset atomically (under publicFlagsMutex) along with the operations on the Continuation state (setting/checking/resetting the pending and concurrent scan bits). Will think more about whether this will work or if there are other solutions... @LinHu2016
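The timed-wait workaround mentioned above could look roughly like the sketch below. It is written in standard C++ under stated assumptions, not against the OpenJ9 sources: `ScanState`, `finishScan`, and `waitWithTimeout` are hypothetical names, and the `inWindow` hook exists only so that the race window between the check and the wait can be exercised deterministically. A missed notification then costs at most one timeout period, because the loop re-reads the flag after every wakeup.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>

// Hypothetical stand-ins for the concurrent-scan bit and publicFlagsMutex.
struct ScanState {
    std::atomic<bool> concurrentlyScanned{true};
    std::mutex mutex;
    std::condition_variable cv;
};

// Assumed GC side: clear the bit, then notify current waiters.
void finishScan(ScanState& s) {
    s.concurrentlyScanned.store(false);
    std::lock_guard<std::mutex> lock(s.mutex);
    s.cv.notify_all();
}

// Waiter with a bounded wait. The check is still racy, but a lost
// notification only delays the waiter by `bound` instead of hanging it.
// `inWindow` runs between the check and the wait (test hook only).
template <class Hook>
int waitWithTimeout(ScanState& s, std::chrono::milliseconds bound, Hook inWindow) {
    int waits = 0;
    while (s.concurrentlyScanned.load()) {
        inWindow();
        std::unique_lock<std::mutex> lock(s.mutex);
        // Returns on notify, on timeout, or spuriously; the loop
        // condition decides whether another round is needed.
        s.cv.wait_for(lock, bound);
        ++waits;
    }
    assert(!s.concurrentlyScanned.load());
    return waits;
}
```

The cost of this approach is latency and periodic wakeups while a scan is genuinely in progress, which is presumably why it is described as a dirty fix.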
Would it be sufficient to move the mutex acquire before the while loop condition?
Considering that too... that would be simpler and more performant as well (it would require some changes on the notification side, to remove the scan flag under the mutex).
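The idea discussed in the last two comments, checking the condition only while holding the mutex and clearing the flag under the same mutex on the notification side, is the usual condition-variable discipline. Below is a minimal sketch in standard C++, assuming a plain `bool` guarded by the mutex in place of the scan bit; `ScanState`, `finishScan`, `waitForScan`, and `runTrials` are hypothetical names, and how this maps onto the atomically updated continuation state in the VM is left open.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical stand-in: the scan flag is only touched under the mutex.
struct ScanState {
    std::mutex mutex;
    std::condition_variable cv;
    bool concurrentlyScanned = true; // guarded by mutex
};

// Notification side: remove the scan flag under the mutex, then notify.
void finishScan(ScanState& s) {
    {
        std::lock_guard<std::mutex> lock(s.mutex);
        s.concurrentlyScanned = false;
    }
    s.cv.notify_all();
}

// Waiter side: the mutex is acquired before the loop condition is read.
// wait() releases the mutex and blocks as one atomic step, so the flag
// cannot be cleared between the check and the wait.
void waitForScan(ScanState& s) {
    std::unique_lock<std::mutex> lock(s.mutex);
    while (s.concurrentlyScanned) {
        s.cv.wait(lock);
    }
    assert(!s.concurrentlyScanned);
}

// Races a waiter against the notifier repeatedly; returns the number of
// rounds in which both threads finished.
int runTrials(int rounds) {
    int completed = 0;
    for (int i = 0; i < rounds; ++i) {
        ScanState s;
        std::thread waiter([&s] { waitForScan(s); });
        std::thread gc([&s] { finishScan(s); });
        waiter.join();
        gc.join();
        ++completed;
    }
    return completed;
}
```

With this ordering the waiter either observes the flag already cleared or is already blocked in `wait` when the notifier acquires the mutex, so no notification can fall into a gap.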
Related: - eclipse-openj9#16728 - eclipse-openj9#17163 - eclipse-openj9#17181 - eclipse-openj9#17119 - eclipse-openj9#17120 - eclipse-openj9#16756 Signed-off-by: Babneet Singh <sbabneet@ca.ibm.com>
TimedGet: eclipse-openj9/openj9#17163 TracePinnedThreads: eclipse-openj9/openj9#15936 ClassUnloading: eclipse-openj9/openj9#16053 HumongousStack: eclipse-openj9/openj9#15189 Signed-off-by: Gengchen Tuo <gengchen.tuo@ibm.com>
Failure link
From an internal build (paix822):
Rerun in Grinder - Change TARGET to run only the failed test targets.
Optional info
Failure output (captured from console output)
50x internal grinder - 5 failures