Conversation

@wenshao
Contributor

@wenshao wenshao commented Jul 1, 2023

PR 14578 still has unresolved discussions, continue to make improvements.

Benchmark Result

sh make/devkit/createJMHBundle.sh
bash configure --with-jmh=build/jmh/jars
make test TEST="micro:java.util.UUIDBench.toString"

1. aliyun_ecs_c8i.xlarge

  • cpu : intel xeon sapphire rapids (x64)
-Benchmark           (size)   Mode  Cnt   Score   Error   Units (baseline)
-UUIDBench.toString   20000  thrpt   15  62.019 ± 0.622  ops/us

+Benchmark           (size)   Mode  Cnt   Score   Error   Units
+UUIDBench.toString   20000  thrpt   15  82.998 ± 0.739  ops/us (+33.82%)

2. aliyun_ecs_c8a.xlarge

  • cpu : amd epyc genoa (x64)
-Benchmark           (size)   Mode  Cnt   Score   Error   Units (baseline)
-UUIDBench.toString   20000  thrpt   15  88.668 ± 0.672  ops/us

+Benchmark           (size)   Mode  Cnt   Score   Error   Units
+UUIDBench.toString   20000  thrpt   15  89.229 ± 0.271  ops/us (+0.63%)

3. aliyun_ecs_c8y.xlarge

  • cpu : aliyun yitian 710 (aarch64)
-Benchmark           (size)   Mode  Cnt   Score   Error   Units (baseline)
-UUIDBench.toString   20000  thrpt   15  49.382 ± 2.160  ops/us

+Benchmark           (size)   Mode  Cnt   Score   Error   Units
+UUIDBench.toString   20000  thrpt   15  49.636 ± 1.974  ops/us (+0.51%)

4. MacBookPro M1 Pro

-Benchmark           (size)   Mode  Cnt    Score   Error   Units (baseline)
-UUIDBench.toString   20000  thrpt   15  103.543 ± 0.963  ops/us

+Benchmark           (size)   Mode  Cnt    Score   Error   Units
+UUIDBench.toString   20000  thrpt   15  110.976 ± 0.685  ops/us (+7.17%)

5. Orange Pi 5 Plus

-Benchmark           (size)   Mode  Cnt   Score   Error   Units (baseline)
-UUIDBench.toString   20000  thrpt   15  33.532 ± 0.396  ops/us

+Benchmark           (size)   Mode  Cnt   Score   Error   Units (PR)
+UUIDBench.toString   20000  thrpt   15  33.054 ± 0.190  ops/us (-1.43%)

Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8311207: Cleanup for Optimization for UUID.toString (Enhancement - P4)

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/14745/head:pull/14745
$ git checkout pull/14745

Update a local copy of the PR:
$ git checkout pull/14745
$ git pull https://git.openjdk.org/jdk.git pull/14745/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 14745

View PR using the GUI difftool:
$ git pr show -t 14745

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/14745.diff

Webrev

Link to Webrev Comment

@bridgekeeper

bridgekeeper bot commented Jul 1, 2023

👋 Welcome back wenshao! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk openjdk bot added the rfr Pull request is ready for review label Jul 1, 2023
@openjdk

openjdk bot commented Jul 1, 2023

@wenshao The following label will be automatically applied to this pull request:

  • core-libs

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the core-libs core-libs-dev@openjdk.org label Jul 1, 2023
@mlbridge

mlbridge bot commented Jul 1, 2023

@liach
Member

liach commented Jul 1, 2023

Is using Unsafe directly consistently faster than using ByteArray? It should have similar performance as ByteArray's VarHandle is simply a wrapper around Unsafe's put/get methods.
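
For readers unfamiliar with the pattern under discussion, here is a minimal sketch of a VarHandle-based byte-array write. ByteArray is a JDK-internal class and cannot be called from application code, so this sketch uses the public `MethodHandles.byteArrayViewVarHandle` API instead, on the assumption that it illustrates the same kind of access. The class and method names (`ByteArrayViewSketch`, `putLongLE`, `getLongLE`) are hypothetical and are not taken from the PR.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteOrder;
import java.util.Arrays;

// Hypothetical sketch: not the JDK-internal ByteArray class.
class ByteArrayViewSketch {
    // Views a byte[] as little-endian longs; the coordinate is a byte offset.
    private static final VarHandle LONG_LE =
            MethodHandles.byteArrayViewVarHandle(long[].class, ByteOrder.LITTLE_ENDIAN);

    // Writes value into buf[offset .. offset + 7], least significant byte first.
    static void putLongLE(byte[] buf, int offset, long value) {
        LONG_LE.set(buf, offset, value);
    }

    // Reads eight bytes starting at offset as a little-endian long.
    static long getLongLE(byte[] buf, int offset) {
        return (long) LONG_LE.get(buf, offset);
    }

    public static void main(String[] args) {
        byte[] buf = new byte[8];
        putLongLE(buf, 0, 0x0807060504030201L);
        System.out.println(Arrays.toString(buf)); // [1, 2, 3, 4, 5, 6, 7, 8]
    }
}
```

Whether a wrapper like this costs anything compared with a direct Unsafe call depends on how well the JIT inlines the VarHandle invocation, which is the question raised here.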

@wenshao
Contributor Author

wenshao commented Jul 1, 2023

Is using Unsafe directly consistently faster than using ByteArray? It should have similar performance as ByteArray's VarHandle is simply a wrapper around Unsafe's put/get methods.

Using Unsafe on aliyun_ecs_c8i.xlarge and MacBookPro M1 Pro is faster than ByteArray, and I haven't figured out why

@liach
Member

liach commented Jul 1, 2023

Is using Unsafe directly consistently faster than using ByteArray? It should have similar performance as ByteArray's VarHandle is simply a wrapper around Unsafe's put/get methods.

Using Unsafe on aliyun_ecs_c8i.xlarge and MacBookPro M1 Pro is faster than ByteArray, and I haven't figured out why

Then it's probably VarHandle's overhead. No worries; your change to use Unsafe is totally fine.

Meanwhile, can you enable GitHub actions on your fork, so it can detect compile and test errors? Like this:
image

@wenshao
Contributor Author

wenshao commented Jul 1, 2023

image

it's enabled

@liach
Member

liach commented Jul 1, 2023

@wenshao I have made my suggestions into a patch for you: wenshao#1
Feel free to review.

@openjdk

openjdk bot commented Jul 1, 2023

⚠️ @wenshao This pull request contains merges that bring in commits not present in the target repository. Since this is not a "merge style" pull request, these changes will be squashed when this pull request is integrated. If this is your intention, then please ignore this message. If you want to preserve the commit structure, you must change the title of this pull request to Merge <project>:<branch> where <project> is the name of another project in the OpenJDK organization (for example Merge jdk:master).

@liach
Member

liach commented Jul 1, 2023

@wenshao I have made my suggestions into a patch for you: wenshao#1 Feel free to review.

your version performance is a bit worse, if i can't find the reason, I will restore the previous version

Can you post the benchmark results? And do you have like a baseline for the benchmarks, as there may be other factors that affect performance from run to run?

Also, my fault for actions: can you go to your actions tab and enable actions for your fork like this?
image

@wenshao
Contributor Author

wenshao commented Jul 1, 2023

@liach Your version performs a bit worse. If I can't find the reason, I will revert to the previous version.

@wenshao
Contributor Author

wenshao commented Jul 1, 2023

@liach

MacBookPro M1 Pro

Benchmark           (size)   Mode  Cnt    Score   Error   Units 
UUIDBench.toString   20000  thrpt   15  104.262 ± 2.199  ops/us
Benchmark           (size)   Mode  Cnt   Score   Error   Units
UUIDBench.toString   20000  thrpt   15  81.622 ± 0.194  ops/us

@wenshao
Contributor Author

wenshao commented Jul 1, 2023

@liach I guess your version is slower because it doesn't support out-of-order execution.

wenshao and others added 3 commits July 1, 2023 17:41
Co-authored-by: liach <liach@users.noreply.github.com>
@wenshao wenshao changed the title 8311207: Optimization for j.u.UUID.toString 8311207: Cleanup for Optimization for UUID.toString Jul 1, 2023
@wenshao
Contributor Author

wenshao commented Jul 1, 2023

/integrate

@openjdk

openjdk bot commented Jul 1, 2023

@wenshao This pull request has not yet been marked as ready for integration.

@openjdk openjdk bot added the sponsor Pull request is ready to be sponsored label Sep 8, 2023
@openjdk

openjdk bot commented Sep 8, 2023

@wenshao
Your change (at version 46b4b05) is now ready to be sponsored by a Committer.

@wenshao
Contributor Author

wenshao commented Sep 10, 2023

/sponsor

@openjdk

openjdk bot commented Sep 10, 2023

@wenshao Only Committers are allowed to sponsor changes.

@RogerRiggs
Contributor

RogerRiggs commented Sep 10, 2023

Given the endianness issues with https://git.openjdk.org/jdk/pull/14699,
I'll need to run a more complete set of tests first (before it's sponsored).

@wenshao
Contributor Author

wenshao commented Sep 11, 2023

@TheRealMDoerr Can you help me test this PR on AIX (big endian)?

@TheRealMDoerr
Contributor

I have run a couple of tests on big-endian Linux. They passed, so it's probably correct. However, I can't tell whether using ByteArrayLittleEndian is a good idea. I don't really like such platform details in the Java classes. Is that necessary for better performance on x86?

@wenshao
Contributor Author

wenshao commented Sep 11, 2023

@RogerRiggs Can it be merged now?

@liach
Member

liach commented Sep 11, 2023

@TheRealMDoerr ByteArrayLittleEndian only means that the input long/int/short/char will be seen as little-endian when written to a byte array; do you mean that assuming little-endian writes are faster is too platform-specific?

An alternative approach tried before is to pack the digits platform-specifically and use Unsafe (which bypasses platform-endianness reversals) to write directly; I recall it was rejected before, for using unsafe directly seems... unsafe :)
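
To make the distinction concrete, here is a minimal sketch of what "written as little-endian" means for the resulting byte array, independent of the host CPU's byte order. It uses only public APIs; `LittleEndianWriteSketch` and `putIntLE` are hypothetical names for illustration and are not the JDK-internal ByteArrayLittleEndian.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical sketch: the byte layout is fixed by the code, not by the platform.
class LittleEndianWriteSketch {
    // Stores the low byte of value at buf[off], the next byte at buf[off + 1], and so on.
    static void putIntLE(byte[] buf, int off, int value) {
        buf[off]     = (byte) value;
        buf[off + 1] = (byte) (value >>> 8);
        buf[off + 2] = (byte) (value >>> 16);
        buf[off + 3] = (byte) (value >>> 24);
    }

    public static void main(String[] args) {
        int packed = 0x64636261; // 'a' = 0x61 in the low byte, 'd' = 0x64 in the high byte

        byte[] manual = new byte[4];
        putIntLE(manual, 0, packed);

        // Same write through a ByteBuffer with an explicit little-endian order.
        byte[] viaBuffer = new byte[4];
        ByteBuffer.wrap(viaBuffer).order(ByteOrder.LITTLE_ENDIAN).putInt(0, packed);

        System.out.println(new String(manual, StandardCharsets.ISO_8859_1)); // abcd
        System.out.println(Arrays.equals(manual, viaBuffer));                // true
        System.out.println(ByteOrder.nativeOrder()); // varies by machine; the bytes above do not
    }
}
```

The array contents are the same on every platform; what can differ is how cheaply a big-endian machine performs the write, which is the performance concern raised in this thread.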

@TheRealMDoerr
Contributor

I think making sure C2 optimizes it would be a better approach. Java classes shouldn't be optimized for performance on any endianness version IMHO. Rather for readability.
@offamitkumar, @deepa181, @JoKern65, @TOatGithub: You may want to check performance impact on s390x and AIX.

@offamitkumar
Member

@offamitkumar, @deepa181, @JoKern65, @TOatGithub: You may want to check performance impact on s390x and AIX.

@TheRealMDoerr Testing on s390 is not possible for now, as build is broken due to field resolution changes.

# Conflicts:
#	src/java.base/share/classes/java/util/UUID.java
#	src/java.base/share/classes/jdk/internal/util/HexDigits.java
@openjdk openjdk bot removed the sponsor Pull request is ready to be sponsored label Sep 12, 2023
Comment on lines +109 to +112
return DIGITS[b0 & 0xff]
| (DIGITS[b1 & 0xff] << 16)
| (((long) DIGITS[b2 & 0xff]) << 32)
| (((long) DIGITS[b3 & 0xff]) << 48);
Contributor

Can you reverse the order of these source lines to put the shifts of the higher-order bits before the lower-order bit shifts? 3333222211110000. It's easier to understand where the bits end up in the long.
The rest of the change is better focused.
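
The two source orderings being compared can be sketched as follows. This is a hypothetical, self-contained illustration: the `DIGITS` table here is built on the assumption that each entry holds the first hex character in its low byte and the second in its high byte, which may not match the layout of the JDK's internal table. Both methods compute the same long; only the order of the operands in the source differs.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of the packing under review; names and table layout are assumptions.
class PackDigitsSketch {
    // DIGITS[b] holds the two lowercase hex characters of byte b:
    // first character in bits 0-7, second character in bits 8-15.
    private static final char[] DIGITS = new char[256];
    static {
        String hex = "0123456789abcdef";
        for (int b = 0; b < 256; b++) {
            DIGITS[b] = (char) (hex.charAt(b >> 4) | (hex.charAt(b & 0xf) << 8));
        }
    }

    // Order as in the reviewed snippet: lowest 16 bits first.
    static long packLowFirst(int b0, int b1, int b2, int b3) {
        return DIGITS[b0 & 0xff]
                | (DIGITS[b1 & 0xff] << 16)
                | (((long) DIGITS[b2 & 0xff]) << 32)
                | (((long) DIGITS[b3 & 0xff]) << 48);
    }

    // Order requested in review: 3333222211110000, highest 16 bits first.
    static long packHighFirst(int b0, int b1, int b2, int b3) {
        return (((long) DIGITS[b3 & 0xff]) << 48)
                | (((long) DIGITS[b2 & 0xff]) << 32)
                | (DIGITS[b1 & 0xff] << 16)
                | DIGITS[b0 & 0xff];
    }

    // Writes the packed long least significant byte first and decodes it as text.
    static String toHex(int b0, int b1, int b2, int b3) {
        long packed = packLowFirst(b0, b1, b2, b3);
        byte[] out = new byte[8];
        for (int i = 0; i < 8; i++) {
            out[i] = (byte) (packed >>> (8 * i));
        }
        return new String(out, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        System.out.println(toHex(0xde, 0xad, 0xbe, 0xef)); // deadbeef
        System.out.println(packLowFirst(0xde, 0xad, 0xbe, 0xef)
                == packHighFirst(0xde, 0xad, 0xbe, 0xef)); // true
    }
}
```

Because the results are identical, any throughput difference between the two orderings would have to come from how the JIT compiles the expression rather than from the arithmetic itself.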

Contributor Author

@wenshao wenshao Sep 12, 2023

If I reverse the packDigits order, performance gets slower. I don't know why yet.

The following data is from a run on a MacBookPro M1 Max:

make test TEST="micro:java.util.UUIDBench.toString"

Benchmark           (size)   Mode  Cnt   Score   Error   Units (current order 4f6ed3e6)
UUIDBench.toString   20000  thrpt   15  96.396 ± 0.946  ops/us


Benchmark           (size)   Mode  Cnt   Score   Error   Units (reverse packDigits order)
UUIDBench.toString   20000  thrpt   15  86.496 ± 0.542  ops/us

Member

Looks like something that might be an interesting puzzler for JIT compiler folks. Perhaps added implicit casts to long messes something up?

@wenshao
Copy link
Contributor Author

wenshao commented Sep 13, 2023

/integrate

@openjdk openjdk bot added the sponsor Pull request is ready to be sponsored label Sep 13, 2023
@openjdk
Copy link

openjdk bot commented Sep 13, 2023

@wenshao
Your change (at version 4f6ed3e) is now ready to be sponsored by a Committer.

@cl4es
Member

cl4es commented Sep 13, 2023

/sponsor

@openjdk

openjdk bot commented Sep 13, 2023

Going to push as commit f8df754.
Since your change was applied there have been 16 commits pushed to the master branch:

  • fecd2fd: 8315898: Open source swing JMenu tests
  • bb6b3f2: 8315761: Open source few swing JList and JMenuBar tests
  • 2d168c5: 8313202: MutexLocker should disallow null Mutexes
  • 36552e7: 8316123: ProblemList serviceability/dcmd/gc/RunFinalizationTest.java on AIX
  • fe5ef5f: 8315677: Open source few swing JFileChooser and other tests
  • ece9bdf: 8299614: Shenandoah: STW mark should keep nmethod/oops referenced from stack chunk alive
  • a36f5a5: 8315663: Open source misc awt tests
  • cbbfa0d: 8315652: RISC-V: Features string uses wrong separator for jtreg
  • 1ebf510: 8315743: RISC-V: Cleanup masm lr()/sc() methods
  • bd52bbf: 8316060: test/hotspot/jtreg/runtime/reflect/ReflectOutOfMemoryError.java may fail if heap is huge
  • ... and 6 more: https://git.openjdk.org/jdk/compare/e0845163aa57cc8f68b11e1a553885676358f2a6...master

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated Pull request has been integrated label Sep 13, 2023
@openjdk openjdk bot closed this Sep 13, 2023
@openjdk openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review sponsor Pull request is ready to be sponsored labels Sep 13, 2023
@openjdk

openjdk bot commented Sep 13, 2023

@cl4es @wenshao Pushed as commit f8df754.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

Labels

core-libs core-libs-dev@openjdk.org integrated Pull request has been integrated

Development

Successfully merging this pull request may close these issues.

7 participants