Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid double scan stacks in continuation for compact cases #16022

Merged
merged 1 commit into from
Oct 5, 2022

Conversation

LinHu2016
Copy link
Contributor

When virtual thread mounts to J9VMThread, the GC might scan java stack of the related continuation twice (one from
J9VMThread.currentContinuation during thread scanning, another one from the related live continuation Object).
Scan twice the same java stack during compaction fix-up might cause wrong double fixup and we also might need to both scan for concurrent case(there is no issue for double scan in concurrent collectors).
So we stop to scan the stack of the continuation Object, which is
mounted to the J9VMThread for STW collectors.

Signed-off-by: Lin Hu linhu@ca.ibm.com

@LinHu2016 LinHu2016 changed the title Avoid double scan stacks in continuation for compact case Avoid double scan stacks in continuation for compact cases Sep 30, 2022
@LinHu2016
Copy link
Contributor Author

@LinHu2016
Copy link
Contributor Author

@amicic and @dmitripivkine please review the changes, thanks

@@ -2058,12 +2058,12 @@ class VM_VMHelpers
}

static VMINLINE bool
needScanStacksForContinuation(J9VMThread *vmThread, j9object_t continuationObject) {
needScanStacksForContinuation(J9VMThread *vmThread, j9object_t continuationObject, bool bSTWScan = false) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd rename this parameter to scanOnly(Dis?)Mounted. Callers will know when it's important to take that into consideration (and pass true/false accordingly). STW has no meaning in this context.

@amicic
Copy link
Contributor

amicic commented Oct 3, 2022

@tajila or @babsingh or @fengxue-IS, please confirm that we don't have an existing way to examine if Virtual thread is currently mounted or not (given a reference to continuation object) and that bMount flag in the struct is a reasonable way to go.

Also, do you care if flag toggling is done on VM side, rather than in GC callback? I think I slightly prefer it on VM side, for example, just after/before swapFieldsWithContinuation is called.

@@ -65,6 +70,12 @@ postDismountContinuation(J9VMThread *vmThread, j9object_t object)
/* Conservatively assume that via mutations of stack slots (which are not subject to access barriers),
* all post-write barriers have been triggered on this Continuation object, since it's been mounted. */
vmThread->javaVM->memoryManagerFunctions->J9WriteBarrierBatch(vmThread, object);
#if JAVA_SPEC_VERSION >= 19
J9VMContinuation *continuation = J9VMJDKINTERNALVMCONTINUATION_VMREF(vmThread, object);
Copy link
Contributor

@amicic amicic Oct 3, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currious, under which circumstances this pointer could be NULL?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for this case continuation could not be NULL, change it as Assert_MM_true(NULL != continuation);

Copy link
Contributor

@babsingh babsingh Oct 3, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the VM side, JVM_VirtualThreadUnmountEnd will set it to NULL during the last unmount of a virtual thread. During the last unmount, the virtual thread state is set to TERMINATED. RI collects such virtual threads. RI also collects virtual threads if their state is NEW. Currently, we set CONTINUATION_VMREF for a virtual thread from its constructor when its state is NEW. Will this prevent a virtual thread in state NEW to be collected? If so, CONTINUATION_VMREF should be set during the first call to Continuation.enterImpl. @fengxue-IS

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

iirc, the vmRef field of a continuation is set during constructor code (createContinuation), so even if a new continuation that never gets executed will still have the field set.

@@ -73,7 +73,7 @@ MM_CompactSchemeFixupObject::fixupContinuationObject(MM_EnvironmentStandard *env
fixupMixedObject(objectPtr);
/* fixup Java Stacks in J9VMContinuation */
J9VMThread *currentThread = (J9VMThread *)env->getLanguageVMThread();
if (VM_VMHelpers::needScanStacksForContinuation(currentThread, objectPtr)) {
if (VM_VMHelpers::needScanStacksForContinuation(currentThread, objectPtr, true)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need a comment here, something like:
In sliding compaction we must fix slots exactly once. Since we will fixup stack slots of mounted Virtual threads later during root fixup, we will skip it during this heap fixup pass (hence passing true(false) for scanOnly(Dis)Mounted parameter)

@amicic amicic added comp:gc project:loom Used to track Project Loom related work labels Oct 3, 2022
@@ -2058,12 +2058,12 @@ class VM_VMHelpers
}

static VMINLINE bool
needScanStacksForContinuation(J9VMThread *vmThread, j9object_t continuationObject) {
needScanStacksForContinuation(J9VMThread *vmThread, j9object_t continuationObject, bool scanOnlyDisMounted = false) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lower case 'M' to be consistent with pre/postMountContinuation API

@@ -58,7 +58,9 @@ void
MM_HeapWalkerDelegate::doContinuationObject(MM_EnvironmentBase *env, omrobjectptr_t objectPtr, MM_HeapWalkerSlotFunc function, void *userData)
{
J9VMThread *currentThread = (J9VMThread *)env->getLanguageVMThread();
if (VM_VMHelpers::needScanStacksForContinuation(currentThread, objectPtr)) {
/* In sliding compaction we must fix slots exactly once. Since we will fixup stack slots of mounted Virtual threads later during root fixup,
Copy link
Contributor

@amicic amicic Oct 3, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a minimum, the comment should be adjusted, since HeapWalker is a tool that can be used outside of sliding compaction/movement fixup.

But more importantly, why is this fix needed for heap walker!? Heap walker does not walk over roots (as it does in compact fixup).
Could it walk over the same object twice via Remember Set walk? But if it is so, that has no relevance with thread being mounted or not....

Am I missing something else here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, you are right, we don't need scan only once protection here.

@babsingh
Copy link
Contributor

babsingh commented Oct 3, 2022

please confirm that we don't have an existing way to examine if Virtual thread is currently mounted or not (given a reference to continuation object) and that bMount flag in the struct is a reasonable way to go.

#15183 (comment) specifies an approach for VThreadContinuation. Now, continuations can be independent and not be associated to a virtual thread. IMO, the bMount flag in the struct is a reasonable way.

do you care if flag toggling is done on VM side, rather than in GC callback? I think I slightly prefer it on VM side, for example, just after/before swapFieldsWithContinuation is called.

It should be toggled as close to swapFieldsWithContinuation (in the same method) to avoid an inconsistent state in case the virtual thread is suspended.

@tajila @fengxue-IS for confirmation.

@amicic
Copy link
Contributor

amicic commented Oct 3, 2022

I can see that GC is not consistent with VM terminology (and from what I can quickly see, JEP too) in use of dismount vs unmount. Since we are already touching most of the files where Dismount is used, we should fix this now and use Unmount, instead.

@fengxue-IS
Copy link
Contributor

#15183 (comment) specifies an approach for VThreadContinuation. Now, continuations can be independent and not be associated to a virtual thread. IMO, the bMount flag in the struct is a reasonable way.

+1
Note here that VM's definition of mount/unmount can different from GC as GC only cares which stack object continuation->stackObject is pointing to). Given such condition, I prefer if we don't use the lambda hacks to process additional data even if continuation isn't independent (future JEP structured concurrency may also have this problem as it may use continuation separately of virtualthread).

do you care if flag toggling is done on VM side, rather than in GC callback? I think I slightly prefer it on VM side, for example, just after/before swapFieldsWithContinuation is called.

It should be toggled as close to swapFieldsWithContinuation (in the same method) to avoid an inconsistent state in case the virtual thread is suspended.

As long as the toggle is done inside the INL where VMAccess is acquired, I don't have any problems.

@babsingh
Copy link
Contributor

babsingh commented Oct 3, 2022

bMount flag in the struct is a reasonable way to go.

Instead of a flag, we should store the J9VMThread on which the continuation is mounted. Otherwise, it should be NULL. This approach will allow us to simplify our implementation at some locations.

typedef struct J9VMContinuation {
    ...
    J9VMThread *carrierThread;
}

enterContinuation {
    ...
    VM_ContinuationHelpers::swapFieldsWithContinuation(currentThread, continuation);
    continuation->carrierThread = currentThread;
    ...
}

yieldContinuation {
    ...
    VM_ContinuationHelpers::swapFieldsWithContinuation(currentThread, continuation);
    continuation->carrierThread = NULL;
    ...
}

isMounted {
   return (NULL != continuation->carrierThread);
}

@amicic
Copy link
Contributor

amicic commented Oct 3, 2022

Instead of a flag, we should store the J9VMThread on which the continuation is mounted. Otherwise, it should be NULL. This approach will allow us to simplify our implementation at some locations

Makes sense. Useful for debugging too.

@LinHu2016 LinHu2016 force-pushed the avoiddoublescan branch 2 times, most recently from 5c85306 to ceaada8 Compare October 4, 2022 14:31
@amicic amicic requested a review from babsingh October 4, 2022 14:51
@@ -4998,6 +4998,7 @@ typedef struct J9VMContinuation {
UDATA* j2iFrame;
struct J9JITGPRSpillArea jitGPRs;
struct J9VMEntryLocalStorage* oldEntryLocalStorage;
J9VMThread *carrierThread;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know this line is a copy-paste from a discussion, but in this file '*' is kept next to the type (rather then next to the field) name.

@LinHu2016 LinHu2016 force-pushed the avoiddoublescan branch 3 times, most recently from df7ed2b to b35d48e Compare October 4, 2022 17:37
@@ -197,6 +197,7 @@ class MM_GCExtensions : public MM_GCExtensionsBase {
UDATA freeSizeThresholdForSurvivor; /**< if average freeSize(freeSize/freeCount) of the region is smaller than the Threshold, the region would not be reused by collector as survivor, for balanced GC only */
bool recycleRemainders; /**< true if need to recycle TLHRemainders at the end of PGC, for balanced GC only */

bool disableConcurrentSupportForContinuation;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you please add annotation to this variable?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually don't want this debug flag into official builds. Please remove it from this PR, and perform all the testing relevant to it in local test environment.

@amicic
Copy link
Contributor

amicic commented Oct 4, 2022

@babsingh not sure if you are distracted with other work, but did you have any concerns, or I can start PR testing?

@LinHu2016
Copy link
Contributor Author

Copy link
Contributor

@babsingh babsingh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Functionally, LGTM. Minor documentation and formatting nitpicks.

runtime/oti/VMHelpers.hpp Outdated Show resolved Hide resolved
runtime/gc_vlhgc/WriteOnceCompactor.cpp Outdated Show resolved Hide resolved
runtime/oti/VMHelpers.hpp Outdated Show resolved Hide resolved
@babsingh
Copy link
Contributor

babsingh commented Oct 4, 2022

not sure if you are distracted with other work, but did you have any concerns, or I can start PR testing?

GTG. Please proceed with PR testing once the above feedback is addressed.

When virtual thread mounts to J9VMThread, the GC might scan java stack
of the related continuation twice (one from
J9VMThread.currentContinuation during thread scanning, another one from
the related live continuation Object).
Scan twice the same java stack during compaction fix-up might cause
wrong double fixup and we also might need to both scan for concurrent
 case(there is no issue for double scan in concurrent collectors).
 so we stop to scan the stack of the continuation Object, which is
mounted to the J9VMThread for STW collectors.

Signed-off-by: Lin Hu <linhu@ca.ibm.com>
@amicic
Copy link
Contributor

amicic commented Oct 4, 2022

Jenkins sanity test all jdk19

@babsingh
Copy link
Contributor

babsingh commented Oct 4, 2022

Fixed typo.
jenkins test sanity all jdk19

@amicic
Copy link
Contributor

amicic commented Oct 4, 2022

Jenkins test sanity all jdk19

@amicic
Copy link
Contributor

amicic commented Oct 4, 2022

Jenkins compile win jdk8

@amicic
Copy link
Contributor

amicic commented Oct 4, 2022

Yeah, that was a close race. We launched it within a same minute. Good that you cancelled one. Thanks!

@amicic
Copy link
Contributor

amicic commented Oct 5, 2022

The check report is confused and still showing some active jobs, while they are actually aborted. But all that are not aborted are green, as can be seen from here:

https://openj9-jenkins.osuosl.org/job/PullRequest-OpenJ9/2810/
https://openj9-jenkins.osuosl.org/job/PullRequest-OpenJ9/2811/

@amicic amicic merged commit 52ab41f into eclipse-openj9:master Oct 5, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
comp:gc project:loom Used to track Project Loom related work
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants