Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

8323807: Async UL: Add a stalling mode to async UL #22770

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

jdksjolen
Copy link
Contributor

@jdksjolen jdksjolen commented Dec 16, 2024

Hi,

In January of this year I took a stab at implementing a stalling mode for UL, see link: #17757 . I also talked about this feature in the mailing lists and seemed to receive positive feedback. With that PR, I also implemented a circular buffer. This PR didn't go through because 1. The stalling mode was broken 2. The complexity was a bit too large imho.

This PR does a much smaller change by only focusing on implementing the actual stalling.

The addition in terms of command line changes are the same as before, you can now specify the mode of your async logging:

$ java -Xlog:async:drop # Dropping mode, same as today
$ java -Xlog:async:stall # Stalling mode!
$ java -Xlog:async # Dropping mode by default still

The change in protocol is quite simple. If a producer thread P cannot fit a message into the buffer, it mallocs a message and exposes it via a shared pointer. It blocks all other producer threads from writing into the buffer. At the same time, the consumer thread (AsyncLogWriter) will perform all writing. When the consumer thread has emptied the write buffer, it writes the stalled message, notifies P and releases all locks. P then let's all other producer threads continue.

We do this by having two locks: Outer and Inner. In our example above, P prevents any other producers from progressing by holding the outer lock, but allows the consumer thread to progress by releasing the inner lock.

In pseudo-code we have something like this in the stalling case.

void produce() {
  OuterLock olock;
  InnerLock ilock;
  bool out_of_memory = attempt_produce(shared_buffer);
  if (out_of_memory) {
    pmsg = new Message();
    shared_message = pmsg;
    while (shared_message != nullptr) ilock.wait();
    free(pmsg);
  }
}

void consume() {
  InnerLock ilock;
  consume(shared_buffer);
  if (shared_message != nullptr) {
    consume(shared_message);
    ilock.notify();
  }
}

Note! It is very important that the consumer prints all output found in the buffer before printing the stalled message. This is because logging is output in Program Order. In other words: print(m0); print(m1); means that m0 must appear before m1 in the log file.

Note! Yes, we do force all threads to stall before the original stalled message has been printed. This isn't optimal, but I still have hope that we can switch to a faster circular buffer in the future (actually, I've got that code too). If logs are written to a SSD with a ~700MiB/s sequential write speed, then this will be negligible if stalls are not very common.

Some benchmarks

I did some basic benchmarking with Spring Boot Hello World and performing some HTTP requests while doing a JFR recording.

I ran the following command for drop, similar for stall and sync:

$ /mvnw spring-boot:run -Dspring-boot.run.jvmArguments='-XX:StartFlightRecording=filename=/home/johan/dropping.jfr -Xlog:async:drop -Xlog:all=debug:file=droppinglog::filecount=0'

And I stressed the server by performing 100 000 requests over 100 concurrent connections with 4 threads participating.

The result speed wise is:

# Stalling
requests:      100000
good requests: 100000 [100%]
bad requests:  0 [0%]
seconds:       6.476
requests/sec:  15441.263

# Dropping
requests:      100000
good requests: 100000 [100%]
bad requests:  0 [0%]
seconds:       4.937
requests/sec:  20255.733

# Sync
requests:      100000
good requests: 100000 [100%]
bad requests:  0 [0%]
seconds:       27.899
requests/sec:  3584.328

The size of the stallinglog is 501MiB and the size of the droppinglog is 292MiB, so the amount written is quite different. Note: We're logging 501MiB over a period of <7 seconds, this is a huge amount of logging.

JFR measurements show the following results in JMC.

Stalling

image

Dropping

image

Sync

image

I believe that this form of stalling is sufficiently performant to serve as a middle-of-the-road choice for those who do not want to lose any log messages yet have fairly small safepoints.

Best regards,
Johan


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8323807: Async UL: Add a stalling mode to async UL (Enhancement - P4)

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/22770/head:pull/22770
$ git checkout pull/22770

Update a local copy of the PR:
$ git checkout pull/22770
$ git pull https://git.openjdk.org/jdk.git pull/22770/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 22770

View PR using the GUI difftool:
$ git pr show -t 22770

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/22770.diff

Using Webrev

Link to Webrev Comment

@bridgekeeper
Copy link

bridgekeeper bot commented Dec 16, 2024

👋 Welcome back jsjolen! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk
Copy link

openjdk bot commented Dec 16, 2024

❗ This change is not yet ready to be integrated.
See the Progress checklist in the description for automated requirements.

@openjdk openjdk bot changed the title 8323807 8323807: Async UL: Add a stalling mode to async UL Dec 16, 2024
@openjdk openjdk bot added the rfr Pull request is ready for review label Dec 16, 2024
@openjdk
Copy link

openjdk bot commented Dec 16, 2024

@jdksjolen The following label will be automatically applied to this pull request:

  • hotspot-runtime

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the hotspot-runtime hotspot-runtime-dev@openjdk.org label Dec 16, 2024
@mlbridge
Copy link

mlbridge bot commented Dec 16, 2024

Webrevs

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
hotspot-runtime hotspot-runtime-dev@openjdk.org rfr Pull request is ready for review
Development

Successfully merging this pull request may close these issues.

1 participant