8323807: Async UL: Add a stalling mode to async UL #22770

jdksjolen · 2024-12-16T17:11:13Z

Hi,

In January of this year I took a stab at implementing a stalling mode for UL, see link: #17757 . I also talked about this feature in the mailing lists and seemed to receive positive feedback. With that PR, I also implemented a circular buffer. This PR didn't go through because 1. The stalling mode was broken 2. The complexity was a bit too large imho.

This PR does a much smaller change by only focusing on implementing the actual stalling.

The addition in terms of command line changes are the same as before, you can now specify the mode of your async logging:

$ java -Xlog:async:drop # Dropping mode, same as today
$ java -Xlog:async:stall # Stalling mode!
$ java -Xlog:async # Dropping mode by default still

The change in protocol is quite simple. If a producer thread P cannot fit a message into the buffer, it mallocs a message and exposes it via a shared pointer. It blocks all other producer threads from writing into the buffer. At the same time, the consumer thread (AsyncLogWriter) will perform all writing. When the consumer thread has emptied the write buffer, it writes the stalled message, notifies P and releases all locks. P then let's all other producer threads continue.

We do this by having two locks: Outer and Inner. In our example above, P prevents any other producers from progressing by holding the outer lock, but allows the consumer thread to progress by releasing the inner lock.

In pseudo-code we have something like this in the stalling case.

void produce() {
  OuterLock olock;
  InnerLock ilock;
  bool out_of_memory = attempt_produce(shared_buffer);
  if (out_of_memory) {
    pmsg = new Message();
    shared_message = pmsg;
    while (shared_message != nullptr) ilock.wait();
    free(pmsg);
  }
}

void consume() {
  InnerLock ilock;
  consume(shared_buffer);
  if (shared_message != nullptr) {
    consume(shared_message);
    ilock.notify();
  }
}

Note! It is very important that the consumer prints all output found in the buffer before printing the stalled message. This is because logging is output in Program Order. In other words: print(m0); print(m1); means that m0 must appear before m1 in the log file.

Note! Yes, we do force all threads to stall before the original stalled message has been printed. This isn't optimal, but I still have hope that we can switch to a faster circular buffer in the future (actually, I've got that code too). If logs are written to a SSD with a ~700MiB/s sequential write speed, then this will be negligible if stalls are not very common.

Some benchmarks

I did some basic benchmarking with Spring Boot Hello World and performing some HTTP requests while doing a JFR recording.

I ran the following command for drop, similar for stall and sync:

$ /mvnw spring-boot:run -Dspring-boot.run.jvmArguments='-XX:StartFlightRecording=filename=/home/johan/dropping.jfr -Xlog:async:drop -Xlog:all=debug:file=droppinglog::filecount=0'

And I stressed the server by performing 100 000 requests over 100 concurrent connections with 4 threads participating.

The result speed wise is:

# Stalling
requests:      100000
good requests: 100000 [100%]
bad requests:  0 [0%]
seconds:       6.476
requests/sec:  15441.263

# Dropping
requests:      100000
good requests: 100000 [100%]
bad requests:  0 [0%]
seconds:       4.937
requests/sec:  20255.733

# Sync
requests:      100000
good requests: 100000 [100%]
bad requests:  0 [0%]
seconds:       27.899
requests/sec:  3584.328

The size of the stallinglog is 501MiB and the size of the droppinglog is 292MiB, so the amount written is quite different. Note: We're logging 501MiB over a period of <7 seconds, this is a huge amount of logging.

JFR measurements show the following results in JMC.

Stalling

Dropping

Sync

I believe that this form of stalling is sufficiently performant to serve as a middle-of-the-road choice for those who do not want to lose any log messages yet have fairly small safepoints.

Best regards,
Johan

Progress

Change must be properly reviewed (1 review required, with at least 1 Reviewer)
Change must not contain extraneous whitespace
Commit message must refer to an issue

Issue

JDK-8323807: Async UL: Add a stalling mode to async UL (Enhancement - P4)

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/22770/head:pull/22770
$ git checkout pull/22770

Update a local copy of the PR:
$ git checkout pull/22770
$ git pull https://git.openjdk.org/jdk.git pull/22770/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 22770

View PR using the GUI difftool:
$ git pr show -t 22770

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/22770.diff

Using Webrev

Link to Webrev Comment

bridgekeeper · 2024-12-16T17:12:16Z

👋 Welcome back jsjolen! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

openjdk · 2024-12-16T17:12:55Z

❗ This change is not yet ready to be integrated.
See the Progress checklist in the description for automated requirements.

openjdk · 2024-12-16T17:13:40Z

@jdksjolen The following label will be automatically applied to this pull request:

hotspot-runtime

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

mlbridge · 2024-12-16T17:16:48Z

Webrevs

00: Full (f848476b)

Implement stalling mode for ping-pong buffers

f848476

openjdk bot changed the title ~~8323807~~ 8323807: Async UL: Add a stalling mode to async UL Dec 16, 2024

openjdk bot added the rfr Pull request is ready for review label Dec 16, 2024

openjdk bot added the hotspot-runtime hotspot-runtime-dev@openjdk.org label Dec 16, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

8323807: Async UL: Add a stalling mode to async UL #22770

8323807: Async UL: Add a stalling mode to async UL #22770

jdksjolen commented Dec 16, 2024 •

edited by openjdk bot

Loading

bridgekeeper bot commented Dec 16, 2024

openjdk bot commented Dec 16, 2024

openjdk bot commented Dec 16, 2024

mlbridge bot commented Dec 16, 2024

8323807: Async UL: Add a stalling mode to async UL #22770

Are you sure you want to change the base?

8323807: Async UL: Add a stalling mode to async UL #22770

Conversation

jdksjolen commented Dec 16, 2024 • edited by openjdk bot Loading

Some benchmarks

Stalling

Dropping

Sync

Progress

Issue

Reviewing

bridgekeeper bot commented Dec 16, 2024

openjdk bot commented Dec 16, 2024

openjdk bot commented Dec 16, 2024

mlbridge bot commented Dec 16, 2024

Webrevs

jdksjolen commented Dec 16, 2024 •

edited by openjdk bot

Loading