8361716 : GCTraceCPUTime may report incorrect times during high load from JNI code #28824

JonasNorlinder · 2025-12-15T12:50:55Z

GCTraceCPUTime is used by Serial, Parallel and G1 to estimate GC CPU time spent during safepoints. It estimates CPU time by sampling the CPU time for entire process by using os::getTimesSecs.

This will not be correct if native code from e.g. JNI is used and is incurring a high load on the machine as native code is not paused during safepoints.

A minimal example:

public static void main(String[] args) {
    // Initial GC that clean up start-up garbage etc.
    System.gc();

    for (int i = 1; i <= 128; i++) {
        addLoad(); // Creates 128 threads, that spins in a loop
        try {
            Thread.sleep(2000);
        } catch (InterruptedException e) {}
        System.gc();
    }
    System.exit(0);
}

where addLoad() is a native call to a C++ program that create 128 pthreads that busy-spin forever with the largest nice value allowed.

In such scenario we can observe using G1

[0.026s][info][gc,cpu] GC(0) User=0.00s Sys=0.00s Real=0.01s
[2.103s][info][gc,cpu] GC(1) User=0.89s Sys=0.00s Real=0.03s
[4.129s][info][gc,cpu] GC(2) User=0.51s Sys=0.00s Real=0.01s
[6.195s][info][gc,cpu] GC(3) User=1.72s Sys=0.00s Real=0.06s
[8.244s][info][gc,cpu] GC(4) User=1.46s Sys=0.00s Real=0.05s
...
[201.191s][info][gc,cpu] GC(76) User=37.91s Sys=0.00s Real=1.19s
[205.064s][info][gc,cpu] GC(77) User=52.44s Sys=0.00s Real=1.64s
[208.068s][info][gc,cpu] GC(78) User=23.49s Sys=0.00s Real=0.74s
[211.795s][info][gc,cpu] GC(79) User=46.96s Sys=0.00s Real=1.47s

Since the Java code does not do anything the time should be close to 0.

Implementation Comment

Total CPU time for GC operations on the VM thread is obtained by cumulatively adding to a counter. Since system time is calculated (on Linux) with total - user is value contains a small error delta. If we continuously increment a value this error would grow to an unacceptable level. Therefore, the implementation will rely on the fact that no other VM operation may run during GC safepoint, making it safe to query the VM thread for it's detailed CPU time at the start and end of GC pause.

Progress

Change must be properly reviewed (1 review required, with at least 1 Reviewer)
Change must not contain extraneous whitespace
Commit message must refer to an issue

Issue

JDK-8361716: GCTraceCPUTime may report incorrect times during high load from JNI code (Bug - P4)

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/28824/head:pull/28824
$ git checkout pull/28824

Update a local copy of the PR:
$ git checkout pull/28824
$ git pull https://git.openjdk.org/jdk.git pull/28824/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 28824

View PR using the GUI difftool:
$ git pr show -t 28824

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/28824.diff

Using Webrev

Link to Webrev Comment

bridgekeeper · 2025-12-15T12:52:58Z

👋 Welcome back jnorlinder! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

JonasNorlinder · 2025-12-15T12:53:00Z

/label add hotspot-gc hotspot-runtime

openjdk · 2025-12-15T12:53:39Z

❗ This change is not yet ready to be integrated.
See the Progress checklist in the description for automated requirements.

openjdk · 2025-12-15T12:54:47Z

@JonasNorlinder
The hotspot-gc label was successfully added.

The hotspot-runtime label was successfully added.

openjdk · 2025-12-15T12:55:29Z

@JonasNorlinder The following label will be automatically applied to this pull request:

hotspot

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

JonasNorlinder · 2025-12-15T12:56:59Z

/label remove hotspot

openjdk · 2025-12-15T12:59:29Z

@JonasNorlinder
The hotspot label was successfully removed.

mlbridge · 2025-12-15T14:56:21Z

Webrevs

mlbridge · 2025-12-15T21:53:24Z

Mailing list message from Kirk Pepperdine on hotspot-gc-dev:

Hi Jonas,

I believe the assumption of ?close to 0? is based on no other processes running on that machine, not just native JNI code. In a ?clean? system I have seen that these numbers are correct enough. The interesting thing about these numbers is the tell that they give you. For example, it appears as if you are running on a 32 core machine. Additionally, there are system level disturbances that are seen with an analysis of these measures. I?m sure enabling this analysis wasn?t intentional but it can be quite useful.

Kind regards,
Kirk

-------------- next part --------------
A non-text attachment was scrubbed...
Name: smime.p7s
Type: application/pkcs7-signature
Size: 1515 bytes
Desc: not available
URL: <https://mail.openjdk.org/pipermail/hotspot-gc-dev/attachments/20251215/3c29c5ce/smime.p7s>

src/hotspot/os/aix/os_aix.cpp

src/hotspot/os/bsd/os_bsd.cpp

tschatzl · 2025-12-16T09:34:41Z

src/hotspot/os/linux/os_linux.hpp

@@ -1,5 +1,5 @@
 /*
- * Copyright (c) 1999, 2024, Oracle and/or its affiliates. All rights reserved.
+ * Copyright (c) 1999, 2025, Oracle and/or its affiliates. All rights reserved.


The copyright change seems to be the only change in this file, and should be reverted.

File was recently edited but didn't update copyright year. I can file a separate JBS and PR for that line if you prefer.

src/hotspot/os/windows/os_windows.cpp

src/hotspot/share/gc/shared/gcTraceTime.hpp

tschatzl · 2025-12-16T09:38:29Z

src/hotspot/share/runtime/os.hpp

+struct CPUTime {
+  jlong user;
+  jlong system;
+};


Fwiw, jlong is a "Java" long, so maybe some C type should be used here. I am aware that the OS interface traditionally uses it, but changes need to start somewhere. However the runtime team may have additional comments about this.

I think such change should start at the source i.e. os::thread_cpu_time, but will await what runtime team says...

src/hotspot/os/windows/os_windows.cpp

mlbridge · 2025-12-16T09:52:42Z

Mailing list message from Jonas Norlinder on hotspot-gc-dev:

Hi Kirk,

That's an interesting point you bring up. While I agree that the current implementation may leak information that is useful for detecting system saturation, I'm not sure that means we can keep this implementation. The semantic contract of the [gc, cpu] log tag is to report GC CPU time. By including process wide noise we violate that contract. We will still report system and user time just more accurately so.

For diagnosing system-level distances we have JFR (jdk.CPUload) and JMX (OperatingSystemMXBean). We should probably rely on these tools rather than using GC logs to carry a noisy signal.

Best,
Jonas

On 2025-12-15, 22:51, "Kirk Pepperdine" <kirk.pepperdine at gmail.com> wrote:

Hi Jonas,

I believe the assumption of ?close to 0? is based on no other processes running on that machine, not just native JNI code. In a ?clean? system I have seen that these numbers are correct enough. The interesting thing about these numbers is the tell that they give you. For example, it appears as if you are running on a 32 core machine. Additionally, there are system level disturbances that are seen with an analysis of these measures. I?m sure enabling this analysis wasn?t intentional but it can be quite useful.

Kind regards,
Kirk

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.openjdk.org/pipermail/hotspot-gc-dev/attachments/20251216/c860cd5e/attachment.htm>

JonasNorlinder · 2025-12-16T13:57:51Z

Thanks for the review Thomas! Also fixed some comments from @kstefanj.

kstefanj

Looks good in general, some more comments.

kstefanj · 2025-12-16T14:38:23Z

src/hotspot/share/runtime/os.hpp

+struct cpu_time {
+  jlong user;
+  jlong system;
+};
+typedef struct cpu_time cpu_time_t;
+


I looked a bit closer at this now, I could make sense to use a more common Hotspot type here since this is nothing that we expect the JDK will use. I think that is the main reason we use jlong in os::thread_cpu_time().

The new detailed_thread_cpu_time() is a Hotspot API and could then use Hotspot types, like uint64_t. But then we likely want to change this in the users of this class as well, for example DetailedCPUTimeThreadClosure, and I'm not sure how big the fan-out would be. Maybe we can change the type as a follow-up if more people agree with this.

src/hotspot/share/gc/shared/gcTraceTime.hpp

src/hotspot/share/gc/shared/gcTraceTime.cpp

JonasNorlinder · 2025-12-16T16:21:52Z

Thanks for the review Stefan!

kstefanj

A couple of more comments.

src/hotspot/share/runtime/os.cpp

kstefanj · 2025-12-17T07:42:27Z

src/hotspot/share/services/cpuTimeUsage.cpp

+CPUTime CPUTimeUsage::GC::detailed_stringdedup() {
+  if (UseStringDeduplication) {
+    return detailed_thread_cpu_time_or_zero((Thread*)StringDedup::_processor->_thread);
+  }
+  return { 0, 0 };


I would have turned this around, but that's just taste, so skip if you prefer as is:

Suggested change

CPUTime CPUTimeUsage::GC::detailed_stringdedup() {

if (UseStringDeduplication) {

return detailed_thread_cpu_time_or_zero((Thread*)StringDedup::_processor->_thread);

}

return { 0, 0 };

CPUTime CPUTimeUsage::GC::detailed_stringdedup() {

if (!UseStringDeduplication) {

return {0, 0};

}

return detailed_thread_cpu_time_or_zero((Thread*)StringDedup::_processor->_thread);

tschatzl · 2025-12-17T11:43:15Z

src/hotspot/os/linux/os_linux.hpp

@@ -1,5 +1,5 @@
 /*
- * Copyright (c) 1999, 2024, Oracle and/or its affiliates. All rights reserved.
+ * Copyright (c) 1999, 2025, Oracle and/or its affiliates. All rights reserved.


tschatzl · 2025-12-17T11:47:44Z

src/hotspot/share/runtime/os.hpp

+class CPUTime {
+public:
+  int64_t user;
+  int64_t system;


If everything is public in this class, consider using a struct instead.

Also per style guidelines, even public class/struct members should use the underscore prefix.

JonasNorlinder · 2025-12-18T09:05:44Z

Huge thanks to @kstefanj that found that values could appear as <0 where GC runs very frequent. I tracked the error down due to the technical limitations of how we can get system time on Linux for a thread. Since it requires two calls and taking the difference (total - user), there is a risk that these are out of sync. This can be resolved by using getrusage which values are in sync.

Fix JNI bug in GCTraceCPUTime

588e1b3

openjdk bot added hotspot-gc hotspot-gc-dev@openjdk.org hotspot-runtime hotspot-runtime-dev@openjdk.org labels Dec 15, 2025

openjdk bot added the hotspot hotspot-dev@openjdk.org label Dec 15, 2025

openjdk bot removed the hotspot hotspot-dev@openjdk.org label Dec 15, 2025

JonasNorlinder marked this pull request as ready for review December 15, 2025 14:48

openjdk bot added the rfr Pull request is ready for review label Dec 15, 2025

tschatzl suggested changes Dec 16, 2025

View reviewed changes

Fix review comments

d3692c7

kstefanj suggested changes Dec 16, 2025

View reviewed changes

Fix feedback etc

c7d9bc8

JonasNorlinder added 2 commits December 16, 2025 17:22

Fix indentation

2106f28

Use portable type

456ec1d

kstefanj reviewed Dec 17, 2025

View reviewed changes

Overload += too

fa31b86

tschatzl reviewed Dec 17, 2025

View reviewed changes

Address accuracy in high freqency sampling on Linux

8685384

openjdk bot removed the rfr Pull request is ready for review label Dec 18, 2025

Remove trailing whitespace

a4af804

openjdk bot added the rfr Pull request is ready for review label Dec 18, 2025

8361716 : GCTraceCPUTime may report incorrect times during high load from JNI code #28824

Are you sure you want to change the base?

8361716 : GCTraceCPUTime may report incorrect times during high load from JNI code #28824

Uh oh!

Conversation

JonasNorlinder commented Dec 15, 2025 • edited by openjdk bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Implementation Comment

Progress

Issue

Reviewing

Uh oh!

bridgekeeper bot commented Dec 15, 2025

Uh oh!

JonasNorlinder commented Dec 15, 2025

Uh oh!

openjdk bot commented Dec 15, 2025

Uh oh!

openjdk bot commented Dec 15, 2025

Uh oh!

openjdk bot commented Dec 15, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

JonasNorlinder commented Dec 15, 2025

Uh oh!

openjdk bot commented Dec 15, 2025

Uh oh!

mlbridge bot commented Dec 15, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Webrevs

Uh oh!

mlbridge bot commented Dec 15, 2025

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

mlbridge bot commented Dec 16, 2025

Uh oh!

JonasNorlinder commented Dec 16, 2025

Uh oh!

kstefanj left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

JonasNorlinder commented Dec 16, 2025

Uh oh!

kstefanj left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

JonasNorlinder commented Dec 18, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Milestone

Development

JonasNorlinder commented Dec 15, 2025 •

edited by openjdk bot

Loading

openjdk bot commented Dec 15, 2025 •

edited

Loading

mlbridge bot commented Dec 15, 2025 •

edited

Loading

JonasNorlinder commented Dec 18, 2025 •

edited

Loading