
8212107: VMThread issues and cleanup #228

Closed
wants to merge 19 commits into master from robehn:8212107-vmthread

Conversation


@robehn robehn commented Sep 17, 2020

We simplify the VMThread by removing the operation queue and refactoring the main loop.
This solves the issues listed:

  • It can create an extra safepoint directly after a safepoint.
  • It's not safe for a non-JavaThread to add a safepoint operation to the queue while the GC is doing its oops-do over the queue.
  • Exposing the current VM operation is dangerous if it's a handshake.
  • The code is a hornet's nest of repeated checks and branches.

Passes t1-8, and a benchmark run.

If you want a smaller diff, the commits contain the incremental progress; each commit passed t1.


Progress

  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change must be properly reviewed

Download

$ git fetch https://git.openjdk.java.net/jdk pull/228/head:pull/228
$ git checkout pull/228


bridgekeeper bot commented Sep 17, 2020

👋 Welcome back rehn! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.


openjdk bot commented Sep 17, 2020

@robehn The following label will be automatically applied to this pull request: hotspot.

When this pull request is ready to be reviewed, an RFR email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label (add|remove) "label" command.

@openjdk openjdk bot added the hotspot hotspot-dev@openjdk.org label Sep 17, 2020
@robehn robehn marked this pull request as ready for review September 17, 2020 20:30
@openjdk openjdk bot added the rfr Pull request is ready for review label Sep 17, 2020
@shipilev shipilev (Member) left a comment

I find juggling the _next_vm_operation a bit confusing at first glance, but that seems superficially okay.

@dcubed-ojdk dcubed-ojdk (Member) left a comment

I probably should have waited to review this after all of Aleksey's comments were resolved. I'm gonna have to take a look at src/hotspot/share/runtime/vmThread.cpp again via a webrev; it's just too hard to review via this snippet UI.

I'll re-review after all of Aleksey's changes are done.


mlbridge bot commented Sep 21, 2020

Mailing list message from David Holmes on hotspot-dev:

Hi Robbin,

On 18/09/2020 6:34 am, Robbin Ehn wrote:

We simplify the VMThread by removing the operation queue and refactoring the main loop.

Can you explain why it was necessary to remove the queue and exactly
what it has been replaced with? I'd like to understand the new
higher-level design for VMOperation execution rather than trying to
reverse engineer it from the code changes.

Thanks,
David
-----


robehn commented Sep 21, 2020

Can you explain why it was necessary to remove the queue and exactly
what it has been replaced with? I'd like to understand the new
higher-level design for VMOperation execution rather than trying to
reverse engineer it from the code changes.

VM operations are now rare, and when we do them they are also faster compared to when the queue was introduced.
(I believe that way back, the VM thread did all compiles in a no-safepoint op; safepoint ops had higher priority than non-safepoint ops.)
During normal execution we do handshakes and safepoints. The handshakes we do by default used to be safepoints; there is no reason to treat them with a lower priority.
We reach the safepoint much faster nowadays, which means there is very little time to add anything to a queue.
And to reach the safepoint faster, it is better to stop for the safepoint than to add anything to a queue before stopping.

So we replace the queue with a single "next safepoint operation" slot.
Any other safepoint requester keeps its operation on its own stack until it succeeds in installing it as the "next safepoint operation".
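
A minimal sketch of that single-slot protocol, reusing the `_next_vm_operation` and `VMOperation_lock` names that appear elsewhere in this review; the completion check is a hypothetical stand-in, not the actual vmThread.cpp code:

```
// Sketch: requesters compete for one "next operation" slot instead of
// enqueueing; the operation lives on the requester's stack while waiting.
void VMThread::execute(VM_Operation* op) {
  MonitorLocker ml(VMOperation_lock);
  while (_next_vm_operation != NULL) {
    ml.wait();                    // slot busy: keep op on our stack and wait
  }
  _next_vm_operation = op;        // install as the "next safepoint operation"
  ml.notify_all();                // wake the VM thread and other requesters
  while (!op->is_completed()) {   // hypothetical completion check
    ml.wait();                    // wait for the VM thread to run our op
  }
}
```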

Thanks,
David

@coleenp coleenp (Contributor) left a comment

Looks like a nice cleanup. I had a couple of questions.


openjdk bot commented Sep 22, 2020

@robehn This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for more details.

After integration, the commit message for the final commit will be:

8212107: VMThread issues and cleanup

Reviewed-by: shade, dcubed, coleenp, dholmes, redestad

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 28 new commits pushed to the master branch:

  • d25b03e: 8253616: Change to GCC 10.2 for building on Linux at Oracle
  • 821bd08: 8253667: ProblemList tools/jlink/JLinkReproducible{,3}Test.java on linux-aarch64
  • 1ae6b53: 8252194: Add automated test for fix done in JDK-8218469
  • 77a0f39: 8253540: InterpreterRuntime::monitorexit should be a JRT_LEAF function
  • 0054c15: 8253435: Cgroup: 'stomping of _mount_path' crash if manually mounted cpusets exist
  • 8e338f6: 8253646: ZGC: Avoid overhead of sorting ZStatIterableValues on bootstrap
  • ec9bee6: 8253015: Aarch64: Move linux code out from generic CPU feature detection
  • 16b8c39: 8253053: Javadoc clean up in Authenticator and BasicAuthenicator
  • 840aa2b: 8253424: Add support for running pre-submit testing using GitHub Actions
  • 8e87d46: 8252857: AArch64: Shenandoah C1 CAS is not sequentially consistent
  • ... and 18 more: https://git.openjdk.java.net/jdk/compare/1f5a033421bbcf803169c5c5f93314fd22b5a4a5...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Sep 22, 2020
@dholmes-ora dholmes-ora (Member) left a comment

This generally looks good. Mapping between the old way and the new way is a little tricky but I think I made all the connections.
One thing I did notice is that it seems that nested VM operations are now restricted to a nesting depth of one - is that correct? (And the code could be a lot simpler if nesting was not needed. :) ).
A couple of minor comments/suggestions below.
Thanks.
David


robehn commented Sep 23, 2020

This generally looks good. Mapping between the old way and the new way is a little tricky but I think I made all the connections.
One thing I did notice is that it seems that nested VM operations are now restricted to a nesting depth of one - is that correct? (And the code could be a lot simpler if nesting was not needed. :) ).
A couple of minor comments/suggestions below.
Thanks.
David

Hi, David.

The support should be the same as before; the previous operation is stored on the stack while we switch to the nested operation.
I don't see what would be different from before?

Thanks, Robbin
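
A minimal sketch of the save/restore nesting described here, using the `_cur_vm_operation` field quoted later in this thread; an illustration of the shape, not the verbatim vmThread.cpp code:

```
// Sketch: a nested VM operation saves the outer one on the VM thread's
// stack for the duration, then restores it afterwards.
void VMThread::inner_execute(VM_Operation* op) {
  VM_Operation* prev_vm_operation = _cur_vm_operation;  // NULL if not nested
  _cur_vm_operation = op;
  evaluate_operation(_cur_vm_operation);
  _cur_vm_operation = prev_vm_operation;                // restore outer op
}
```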

@dholmes-ora (Member)

Hi, David.

The support should be the same as before; the previous operation is stored on the stack while we switch to the nested operation.
I don't see what would be different from before?

Sorry, my mistake.

Thanks.

@robehn robehn requested a review from shipilev September 23, 2020 11:27
@shipilev shipilev (Member) left a comment

I have only minor comments, without diving into the logic machinery. I am relying on others to review this more substantially.

@shipilev shipilev (Member) left a comment

Looks good!

@robehn robehn requested a review from dcubed-ojdk September 23, 2020 15:27

mlbridge bot commented Sep 24, 2020

Mailing list message from David Holmes on hotspot-dev:

On 23/09/2020 9:27 pm, Robbin Ehn wrote:

On Wed, 23 Sep 2020 11:02:54 GMT, David Holmes <dholmes at openjdk.org> wrote:

inner_execute(..) is called in the non-nested case here:
https://github.com/openjdk/jdk/blob/e49178a4445378fa0b5505ad6e9f1661636f88b8/src/hotspot/share/runtime/vmThread.cpp#L474

Nested case:
https://github.com/openjdk/jdk/blob/e49178a4445378fa0b5505ad6e9f1661636f88b8/src/hotspot/share/runtime/vmThread.cpp#L511

Sorry I missed that. Seems odd that inner_execute handles nesting when that is only possible via one of the paths by
which it is called - that's why I thought it was only for the case where called from execute(). I'd rather see the
nesting logic handled as before, exclusively on the code path in which it can occur.

That would create a lot of code duplication:

```
void VMThread::none_nested_inner_execute(VM_Operation* op) {
  Thread* current = Thread::current();
  assert(current->is_VM_thread(), "must be a VM thread");

  _cur_vm_operation = op;

  HandleMark hm(VMThread::vm_thread());
  EventMark em("Executing VM operation: %s", op->name());

  // If we are at a safepoint we will evaluate all the operations that
  // follow that also require a safepoint
  log_debug(vmthread)("Evaluating %s VM operation: %s",
                      _cur_vm_operation->evaluate_at_safepoint() ? "safepoint" : "non-safepoint",
                      _cur_vm_operation->name());

  bool end_safepoint = false;
  if (_cur_vm_operation->evaluate_at_safepoint()) {
    SafepointSynchronize::begin();
    if (_timeout_task != NULL) {
      _timeout_task->arm();
    }
    end_safepoint = true;
  }

  evaluate_operation(_cur_vm_operation);

  if (end_safepoint) {
    if (_timeout_task != NULL) {
      _timeout_task->disarm();
    }
    SafepointSynchronize::end();
  }

  _cur_vm_operation = NULL;
}
```

Which is ~80% the same (the same minus a few lines).

I envisaged simply moving the nesting check out of inner_execute and
back into execute:

```
// pseudo-code
execute(VM_Operation* op) {
  if (on VMThread) {
    if (_cur_operation != NULL) {
      // nested case
      check_nesting_allowed();
      VM_Operation* prev = _cur_operation;
      _cur_operation = NULL;
      inner_execute(op);
      _cur_operation = prev;
    }
  }
}
```

Cheers,
David

@dcubed-ojdk (Member)

I'm looking at vmThread.cpp via the webrev and the "next" button
on the frames view has stopped working after change number 8:

https://openjdk.github.io/cr/?repo=jdk&pr=228&range=05#frames-6

The "Scroll Down" button is working so I'll push thru it...

```
// Check for a cleanup before SafepointALot to keep stats correct.
long interval_ms = SafepointTracing::time_since_last_safepoint_ms();
bool max_time_exceeded = GuaranteedSafepointInterval != 0 &&
                         (interval_ms >= GuaranteedSafepointInterval);
if (max_time_exceeded && SafepointSynchronize::is_cleanup_needed()) {
  return &cleanup_op;
}
if (!max_time_exceeded) {
```
@dcubed-ojdk (Member):

You've changed the meaning of SafepointALot here. If max_time_exceeded is false, then you never check the SafepointALot flag and miss causing a safepointALot_op to happen next.

Here's the old code:

```
if (max_time_exceeded && SafepointSynchronize::is_cleanup_needed()) {
  return &cleanup_op;
}
if (SafepointALot) {
  return &safepointALot_op;
}
```

In the old code, if max_time_exceeded and we need a cleanup, then cleanup_op is the priority; but if that wasn't the case, then we always checked the SafepointALot flag.

@robehn (Contributor Author):

The old behavior could create a SafepointALot when we had no 'safepoint priority' ops in the queue when woken.
To get this behavior we would need more logic to avoid back-to-back SafepointALot, and we would need to peek at _next_vm_operation to determine whether it's a safepoint op or not (a handshake).

During a normal test run the old behavior only creates around 1% more safepoints.
And if you want more safepoints you can decrease GuaranteedSafepointInterval (not exactly the same).

So I didn't think adding that complexity to exactly mimic the old behavior was worth it.

What do you want me to do?

@dcubed-ojdk (Member):

Hmmm.... The old SafepointALot was intended to safepoint as frequently as possible to stress the system. Now we do very little at safepoints, so maybe it is time for SafepointALot to evolve. Can you make it so that a SafepointALot happens at some fraction of GuaranteedSafepointInterval, e.g., (GuaranteedSafepointInterval / 4), so four times as often?

@robehn (Contributor Author):

All tests using SafepointALot already set GuaranteedSafepointInterval to a low value, in the range of ~1-300 ms
(except for the VM boolean flag test, which uses SafepointALot to test a boolean flag).
For example, jni/FastGetField sets GuaranteedSafepointInterval to 1.

The only case where it would really differ is when ad hoc adding SafepointALot without GuaranteedSafepointInterval.

@dcubed-ojdk (Member):

If GuaranteedSafepointInterval is set to a lower value than the default on the command line, then I'm okay if SafepointALot does not do anything extra. However, if GuaranteedSafepointInterval is either the default value or is set to a higher value, then I would like SafepointALot to cause a safepoint more frequently than the GuaranteedSafepointInterval. Every GuaranteedSafepointInterval/4 would be a fine definition of "a lot".
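
A minimal sketch of the suggested behavior; the helper name should_safepoint_a_lot is invented for illustration, not code from the patch:

```
// Sketch: SafepointALot fires at a quarter of the guaranteed interval
// (the helper name is hypothetical).
bool should_safepoint_a_lot(long ms_since_last_safepoint) {
  if (!SafepointALot || GuaranteedSafepointInterval == 0) {
    return false;
  }
  return ms_since_last_safepoint >= (GuaranteedSafepointInterval / 4);
}
```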

@dcubed-ojdk (Member):

Mulling on this more... is it too radical to consider that we no longer need SafepointALot?

@robehn (Contributor Author):

I would like SafepointALot (and HandshakeALot) to be executed in a separate thread that randomly requests a safepoint (preferably with some validation inside the operation).
Since the VM thread now handles this, it cannot make this request while busy.
Also, having the VM thread wake up 'more' sporadically would be confusing for the VM thread loop.

So I agree with you that we need a better SafepointALot, but I think it is wrong to use the VM thread to drive it.
I suggest we create an enhancement for it.

@dcubed-ojdk (Member)

Most of my comments this round are not critical. The only real issue
that I found was the change in behavior for the SafepointALot flag.
The refactoring will make future code maintenance much, much easier,
but it made reviewing vmThread.cpp an adventure.


mlbridge bot commented Sep 25, 2020

Mailing list message from David Holmes on hotspot-dev:

<trimming>

Hi Dan,

On 25/09/2020 6:39 am, Daniel D.Daugherty wrote:

On Thu, 24 Sep 2020 06:27:46 GMT, Robbin Ehn <rehn at openjdk.org> wrote:

352: // Wait to install this operation as the next operation in the VM Thread
353: log_trace(vmthread)("A VM operation already set, waiting");
354: ml.wait();

So instead of a thread enqueuing an operation on the VMop queue
and then waiting for the operation to be executed, we have the thread
waiting to enqueue the operation as the "next operation". It seems to
me that the new algorithm means that the waiting thread will be
woken up more often and then go back to wait()ing without progress.
Perhaps this is mitigated by there being way fewer VM operations in
the system, but I'm not sure.

This is the whole premise of making this change: we no longer need a
queue because we rarely have >1 VM-operations in-flight. So the
expectation with the new "distributed queue" is that at most one or two
threads may be waiting.

Cheers,
David
-----

@dcubed-ojdk (Member)

@dholmes-ora and @robehn - I'm good with the rationale about why we have gotten rid of the VM op queue. My comment above is mostly just mumbling about it to myself while I think it through...

```
@@ -58,7 +58,7 @@
 extern Monitor* CodeSweeper_lock;          // a lock used by the sweeper o
 extern Mutex*   MethodData_lock;           // a lock on installation of method data
 extern Mutex*   TouchedMethodLog_lock;     // a lock on allocation of LogExecutedMethods info
 extern Mutex*   RetData_lock;              // a lock on installation of RetData inside method data
-extern Monitor* VMOperationQueue_lock;     // a lock on queue of vm_operations waiting to execute
+extern Monitor* VMOperation_lock;          // a lock on queue of vm_operations waiting to execute
 extern Monitor* VMOperationRequest_lock;   // a lock on Threads waiting for a vm_operation to terminate
```
Member:

Can the declaration of VMOperationRequest_lock be removed now too, since it's no longer being defined in mutexLocker.cpp?

@robehn (Contributor Author):

Fixing, pushing later.

@dholmes-ora dholmes-ora (Member) left a comment

Still LGTM.

@dcubed-ojdk dcubed-ojdk (Member) left a comment

I'm okay with leaving SafepointALot as you have it now
and leaving any future cleanup/refinement to a new RFE.


robehn commented Sep 29, 2020

Thanks all!

/integrate

@openjdk openjdk bot closed this Sep 29, 2020
@openjdk openjdk bot added integrated Pull request has been integrated and removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels Sep 29, 2020

openjdk bot commented Sep 29, 2020

@robehn Since your change was applied there have been 37 commits pushed to the master branch.

Your commit was automatically rebased without conflicts.

Pushed as commit 431338b.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

@robehn robehn deleted the 8212107-vmthread branch September 29, 2020 09:46
fg1417 pushed a commit to fg1417/jdk that referenced this pull request Aug 17, 2022
After JDK-8283091, the loop below can be vectorized partially.
Statement 1 can be vectorized but statement 2 can't.
```
// int[] iArr; long[] lArrFld; int i1,i2;
for (i1 = 6; i1 < 227; i1++) {
  iArr[i1] += lArrFld[i1]++; // statement 1
  iArr[i1 + 1] -= (i2++); // statement 2
}
```

But we got incorrect results because the vector packs of iArr are
scheduled incorrectly like:
```
...
load_vector XMM1,[R8 + #16 + R11 << #2]
movl    RDI, [R8 + #20 + R11 << #2] # int
load_vector XMM2,[R9 + #8 + R11 << #3]
subl    RDI, R11    # int
vpaddq  XMM3,XMM2,XMM0  ! add packedL
store_vector [R9 + #8 + R11 << #3],XMM3
vector_cast_l2x  XMM2,XMM2  !
vpaddd  XMM1,XMM2,XMM1  ! add packedI
addl    RDI, #228   # int
movl    [R8 + #20 + R11 << #2], RDI # int
movl    RBX, [R8 + #24 + R11 << #2] # int
subl    RBX, R11    # int
addl    RBX, #227   # int
movl    [R8 + #24 + R11 << #2], RBX # int
...
movl    RBX, [R8 + #40 + R11 << #2] # int
subl    RBX, R11    # int
addl    RBX, #223   # int
movl    [R8 + #40 + R11 << #2], RBX # int
movl    RDI, [R8 + #44 + R11 << #2] # int
subl    RDI, R11    # int
addl    RDI, #222   # int
movl    [R8 + #44 + R11 << #2], RDI # int
store_vector [R8 + #16 + R11 << #2],XMM1
...
```
simplified as:
```
load_vector iArr in statement 1
unvectorized loads/stores in statement 2
store_vector iArr in statement 1
```
We cannot pick the memory state from the first load for the LoadI pack
here, as the LoadI vector operation must load the new values in memory
after iArr writes 'iArr[i1 + 1] - (i2++)' to 'iArr[i1 + 1]' (statement 2).
We must take the memory state of the last load, where we have assigned
new values ('iArr[i1 + 1] - (i2++)') to the iArr array.

In JDK-8240281, we picked the memory state of the first load. Different
from the scenario in JDK-8240281, the store, which is dependent on an
earlier load here, is in a pack to be scheduled and the LoadI pack
depends on the last_mem. As designed[2], to schedule the StoreI pack,
all memory operations in another single pack should be moved in the same
direction. We know that the store in the pack depends on one of loads in
the LoadI pack, so the LoadI pack should be scheduled before the StoreI
pack. And the LoadI pack depends on the last_mem, so the last_mem must
be scheduled before the LoadI pack and also before the store pack.
Therefore, we need to take the memory state of the last load for the
LoadI pack here.

To fix it, we add additional checks while picking the memory state for a
load pack. When the store is in a pack and the load pack relies on the
last_mem, we shouldn't choose the memory state of the first load but the
memory state of the last load.
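
A hedged sketch of that rule, with hypothetical helper names (find_first_mem_state, find_last_mem_state, pack_contains_dependent_store); it illustrates the decision being described, not the actual superword.cpp change:

```
// Sketch of the memory-state choice for a load pack (helpers hypothetical).
// Default to the first load's memory state, but if a packed store depends
// on one of these loads, the pack must observe memory as of the last load.
Node* pick_mem_state_for_load_pack(Node_List* load_pack) {
  Node* first_mem = find_first_mem_state(load_pack);
  Node* last_mem  = find_last_mem_state(load_pack);
  if (pack_contains_dependent_store(load_pack)) {
    return last_mem;   // see the values written by statement 2
  }
  return first_mem;    // safe default (as in JDK-8240281)
}
```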

[1] https://github.com/openjdk/jdk/blob/0ae834105740f7cf73fe96be22e0f564ad29b18d/src/hotspot/share/opto/superword.cpp#L2380
[2] https://github.com/openjdk/jdk/blob/0ae834105740f7cf73fe96be22e0f564ad29b18d/src/hotspot/share/opto/superword.cpp#L2232

Jira: ENTLLT-5482
Change-Id: I341d10b91957b60a1b4aff8116723e54083a5fb8
CustomizedGitHooks: yes