-
Notifications
You must be signed in to change notification settings - Fork 6.2k
Optimize RandomGenerator::nextBytes #14638
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
👋 Welcome back Glavo! A progress list of the required criteria for merging this PR into |
|
Here are the results for the This PR significantly improves performance for the default implementation in For the |
|
The only confirmed performance degradation (<5%) is when the byte array is empty. For byte arrays with a length greater than 4 (or 8 for |
|
You should probably update the 2 existing benchmarks in https://github.com/openjdk/jdk/blob/master/test/micro/org/openjdk/bench/java/util/RandomNext.java and https://github.com/openjdk/jdk/blob/master/test/micro/org/openjdk/bench/java/util/RandomGeneratorNext.java, or include your benchmark there. In intellij idea, you can add the micro directory as a module and add jmh maven library and the jdk modules as compile-time dependencies, so intellij can help working on the benchmarks. |
| if (unsafe.isBigEndian()) | ||
| rnd = Long.reverseBytes(rnd); | ||
| unsafe.putLong(bytes, (long)Unsafe.ARRAY_BYTE_BASE_OFFSET + i, rnd); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| if (unsafe.isBigEndian()) | |
| rnd = Long.reverseBytes(rnd); | |
| unsafe.putLong(bytes, (long)Unsafe.ARRAY_BYTE_BASE_OFFSET + i, rnd); | |
| unsafe.putLong(bytes, Unsafe.ARRAY_BYTE_BASE_OFFSET + i, nextLong(), false); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@liach putLong doesn't seem to have such an overload.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can see if putLongUnaligned, whose java code tries to put aligned long if possible, works.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can see if
putLongUnaligned, whose java code tries to put aligned long if possible, works.
The reason I didn't use it was concerned about generating slower code on platforms that don't support unaligned accesses, I don't know if C2 understands that this is always an aligned access.
| int rnd = nextInt(); | ||
| if (unsafe.isBigEndian()) | ||
| rnd = Integer.reverseBytes(rnd); | ||
| unsafe.putInt(bytes, (long)Unsafe.ARRAY_BYTE_BASE_OFFSET + i, rnd); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| int rnd = nextInt(); | |
| if (unsafe.isBigEndian()) | |
| rnd = Integer.reverseBytes(rnd); | |
| unsafe.putInt(bytes, (long)Unsafe.ARRAY_BYTE_BASE_OFFSET + i, rnd); | |
| unsafe.putInt(bytes, Unsafe.ARRAY_BYTE_BASE_OFFSET + i, nextInt(), false); |
|
Also, should we use ByteArrayLittleEndian instead of Unsafe, once ByteArrayLittleEndian is no longer dependent on VarHandle? |
Due to the overhead of using non aligned reads and checking indexes, I am running benchmarks based on |
Well, it seems that this is not always correct. For L32X64MixRandom, when the bytes length is greater than 1024, using |
|
I think |
|
Benchmarking results based on current Results``` Benchmark (length) Mode Cnt Score Error Units RandomBenchmark.L32X64MixRandom 0 thrpt 5 1519005.337 ± 10166.724 ops/ms RandomBenchmark.L32X64MixRandom 1 thrpt 5 215438.181 ± 1296.270 ops/ms RandomBenchmark.L32X64MixRandom 2 thrpt 5 203155.966 ± 1102.743 ops/ms RandomBenchmark.L32X64MixRandom 3 thrpt 5 190993.049 ± 1583.488 ops/ms RandomBenchmark.L32X64MixRandom 4 thrpt 5 184699.083 ± 1656.026 ops/ms RandomBenchmark.L32X64MixRandom 5 thrpt 5 164362.211 ± 1688.353 ops/ms RandomBenchmark.L32X64MixRandom 6 thrpt 5 156946.704 ± 1188.623 ops/ms RandomBenchmark.L32X64MixRandom 7 thrpt 5 153627.148 ± 2754.413 ops/ms RandomBenchmark.L32X64MixRandom 8 thrpt 5 164011.508 ± 87.110 ops/ms RandomBenchmark.L32X64MixRandom 10 thrpt 5 101824.800 ± 183.479 ops/ms RandomBenchmark.L32X64MixRandom 12 thrpt 5 98005.608 ± 188.852 ops/ms RandomBenchmark.L32X64MixRandom 14 thrpt 5 95530.799 ± 109.554 ops/ms RandomBenchmark.L32X64MixRandom 16 thrpt 5 114617.995 ± 51.252 ops/ms RandomBenchmark.L32X64MixRandom 32 thrpt 5 54787.870 ± 36.547 ops/ms RandomBenchmark.L32X64MixRandom 64 thrpt 5 29267.303 ± 17.143 ops/ms RandomBenchmark.L32X64MixRandom 128 thrpt 5 15590.939 ± 5.373 ops/ms RandomBenchmark.L32X64MixRandom 256 thrpt 5 8001.160 ± 2.425 ops/ms RandomBenchmark.L32X64MixRandom 512 thrpt 5 4035.970 ± 1.097 ops/ms RandomBenchmark.L32X64MixRandom 1024 thrpt 5 2390.227 ± 0.317 ops/ms RandomBenchmark.L32X64MixRandom 2048 thrpt 5 1210.989 ± 0.190 ops/ms RandomBenchmark.L32X64MixRandom 4096 thrpt 5 609.188 ± 0.051 ops/ms RandomBenchmark.L32X64MixRandom 8192 thrpt 5 302.962 ± 1.783 ops/ms RandomBenchmark.Random 0 thrpt 5 1511686.595 ± 64669.779 ops/ms RandomBenchmark.Random 1 thrpt 5 355958.380 ± 33275.649 ops/ms RandomBenchmark.Random 2 thrpt 5 322566.151 ± 2151.769 ops/ms RandomBenchmark.Random 3 thrpt 5 291901.421 ± 3873.578 ops/ms RandomBenchmark.Random 4 thrpt 5 270129.002 ± 19.117 ops/ms RandomBenchmark.Random 5 thrpt 5 135856.891 ± 566.745 ops/ms RandomBenchmark.Random 6 thrpt 5 130272.051 ± 61.738 ops/ms RandomBenchmark.Random 7 thrpt 5 123843.896 ± 107.200 ops/ms RandomBenchmark.Random 8 thrpt 5 159297.447 ± 77.475 ops/ms RandomBenchmark.Random 10 thrpt 5 97626.041 ± 420.827 ops/ms RandomBenchmark.Random 12 thrpt 5 104838.370 ± 52.721 ops/ms RandomBenchmark.Random 14 thrpt 5 75077.145 ± 142.321 ops/ms RandomBenchmark.Random 16 thrpt 5 78217.212 ± 17.730 ops/ms RandomBenchmark.Random 32 thrpt 5 39289.349 ± 5.522 ops/ms RandomBenchmark.Random 64 thrpt 5 19673.761 ± 18.463 ops/ms RandomBenchmark.Random 128 thrpt 5 9856.985 ± 1.844 ops/ms RandomBenchmark.Random 256 thrpt 5 4928.253 ± 0.684 ops/ms RandomBenchmark.Random 512 thrpt 5 2431.380 ± 1.006 ops/ms RandomBenchmark.Random 1024 thrpt 5 1239.599 ± 0.204 ops/ms RandomBenchmark.Random 2048 thrpt 5 618.926 ± 0.181 ops/ms RandomBenchmark.Random 4096 thrpt 5 272.700 ± 1.009 ops/ms RandomBenchmark.Random 8192 thrpt 5 151.693 ± 0.117 ops/ms ``` |
|
I looked into that a few months ago too but didn't come around to actually create a PR mainly for the following reasons (besides lack of time):
I think 1. should definitely be addressed. There is |
Personally, I often use it to generate some data for unit testing, so improving its performance would be helpful to me.
I agree. |
|
Looking into the baseline results: Auto-vectorization might be already at work for RandomGenerator. We need to prove the optimization offered by Unsafe putLong and VarHandle are reliable instead of some unreliable side effects of JIT, and that's why I am hesitant to create a JBS issue. |
Disassembly (baseline)(I'm not familiar with assembly) I guess loop unrolling is working? |
|
I'm not sure too, but there's vmovd and vmovq which are moving double or quad words at once, so it appears vectorized. But using Unsafe/ByteArrayLittleEndian explicitly still seems better optimized from your results; I guess it might be because that you know the input random number size (int/long sizes). Can you try how putLongUnaligned etc. work, as VH implementation delegates to the unaligned versions (for plain get/set)? |
|
Use ResultsUse Results``` Benchmark (length) Mode Cnt Score Error Units RandomBenchmark.L32X64MixRandom 0 thrpt 5 1528297.256 ± 11983.204 ops/ms RandomBenchmark.L32X64MixRandom 1 thrpt 5 215656.684 ± 1794.981 ops/ms RandomBenchmark.L32X64MixRandom 2 thrpt 5 201420.705 ± 1377.903 ops/ms RandomBenchmark.L32X64MixRandom 3 thrpt 5 190722.759 ± 3562.388 ops/ms RandomBenchmark.L32X64MixRandom 4 thrpt 5 184578.897 ± 587.992 ops/ms RandomBenchmark.L32X64MixRandom 5 thrpt 5 164248.972 ± 1153.358 ops/ms RandomBenchmark.L32X64MixRandom 6 thrpt 5 145869.045 ± 1342.215 ops/ms RandomBenchmark.L32X64MixRandom 7 thrpt 5 153291.149 ± 4666.694 ops/ms RandomBenchmark.L32X64MixRandom 8 thrpt 5 163664.923 ± 559.088 ops/ms RandomBenchmark.L32X64MixRandom 10 thrpt 5 101878.885 ± 322.857 ops/ms RandomBenchmark.L32X64MixRandom 12 thrpt 5 98918.245 ± 305.201 ops/ms RandomBenchmark.L32X64MixRandom 14 thrpt 5 95554.296 ± 253.037 ops/ms RandomBenchmark.L32X64MixRandom 16 thrpt 5 114686.083 ± 10.662 ops/ms RandomBenchmark.L32X64MixRandom 32 thrpt 5 54694.191 ± 77.666 ops/ms RandomBenchmark.L32X64MixRandom 64 thrpt 5 29272.233 ± 13.130 ops/ms RandomBenchmark.L32X64MixRandom 128 thrpt 5 15423.642 ± 13.856 ops/ms RandomBenchmark.L32X64MixRandom 256 thrpt 5 8007.269 ± 6.237 ops/ms RandomBenchmark.L32X64MixRandom 512 thrpt 5 4035.672 ± 1.192 ops/ms RandomBenchmark.L32X64MixRandom 1024 thrpt 5 2389.270 ± 1.732 ops/ms RandomBenchmark.L32X64MixRandom 2048 thrpt 5 1210.966 ± 0.645 ops/ms RandomBenchmark.L32X64MixRandom 4096 thrpt 5 609.226 ± 0.026 ops/ms RandomBenchmark.L32X64MixRandom 8192 thrpt 5 305.380 ± 0.147 ops/ms RandomBenchmark.Random 0 thrpt 5 1519068.332 ± 17554.468 ops/ms RandomBenchmark.Random 1 thrpt 5 349320.420 ± 50935.172 ops/ms RandomBenchmark.Random 2 thrpt 5 325239.890 ± 1852.854 ops/ms RandomBenchmark.Random 3 thrpt 5 293215.822 ± 5502.425 ops/ms RandomBenchmark.Random 4 thrpt 5 270030.002 ± 635.288 ops/ms RandomBenchmark.Random 5 thrpt 5 135824.338 ± 1411.090 ops/ms RandomBenchmark.Random 6 thrpt 5 131045.378 ± 131.826 ops/ms RandomBenchmark.Random 7 thrpt 5 123870.748 ± 281.168 ops/ms RandomBenchmark.Random 8 thrpt 5 159068.553 ± 577.367 ops/ms RandomBenchmark.Random 10 thrpt 5 97813.949 ± 133.771 ops/ms RandomBenchmark.Random 12 thrpt 5 104909.089 ± 54.468 ops/ms RandomBenchmark.Random 14 thrpt 5 75004.214 ± 237.386 ops/ms RandomBenchmark.Random 16 thrpt 5 78205.257 ± 91.166 ops/ms RandomBenchmark.Random 32 thrpt 5 39289.218 ± 24.475 ops/ms RandomBenchmark.Random 64 thrpt 5 19676.129 ± 8.671 ops/ms RandomBenchmark.Random 128 thrpt 5 9856.330 ± 1.669 ops/ms RandomBenchmark.Random 256 thrpt 5 4928.997 ± 1.652 ops/ms RandomBenchmark.Random 512 thrpt 5 2429.244 ± 2.227 ops/ms RandomBenchmark.Random 1024 thrpt 5 1239.338 ± 0.306 ops/ms RandomBenchmark.Random 2048 thrpt 5 619.758 ± 0.055 ops/ms RandomBenchmark.Random 4096 thrpt 5 274.033 ± 0.714 ops/ms RandomBenchmark.Random 8192 thrpt 5 151.607 ± 0.013 ops/ms ```The result seems interesting. |
|
The new implementation of Interestingly, |
|
Can you publish your put_Unaligned code and the one with updated ByteArrayLittleEndian in two branches in your fork? I doubt something might be off in your code, and wish to test out on my end. |
Use Use This is my test server: I need to spend the day updating my server and upgrading some accessories tomorrow. If you are unable to replicate my previous JMH results, I will rerun all tests after upgrading the server. |
I updated
I just did a quick search for For example, in jdk/test/jdk/java/util/zip/ZipFile/ZipEntryFreeTest.java Lines 77 to 87 in 0db63ec
It is widely used in unit testing to generate random test data. Optimizing it can help developers reduce the time spent running tests. |
|
I choose to use The latest test results have been updated in the first comment of this PR: #14638 (comment) |
@liach I understand, the reason is that I'm calling Since it is an interface method and there is no way to simply create a private field, I chose to use |
|
I wonder how much the work from #16245 would cover here. If C2 can improve such situations, it might be a better solution. |
|
Is there value in also implementing nextBytes explicitly in ThreadLocalRandom to be based on nextLong (rather than nextInt as inherited from Random)? |
|
@Glavo This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply add a new comment to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration! |
|
@Glavo This pull request has been inactive for more than 8 weeks and will now be automatically closed. If you would like to continue working on this pull request in the future, feel free to reopen it! This can be done using the |
|
/open |
|
@Glavo This pull request is now open |
|
Is there anyone to review this PR? |
|
Will this be obsolete with #16245? If not I can create a JBS issue so that this PR will get sent to the mailing list. |
|
❗ This change is not yet ready to be integrated. |
|
@liach as I read that pr, putting an int or long to a byte[] is still faster with Unsafe and ByteArrayLittleEndian (though much less difference than before). |
|
@Glavo were you going to update ThreadLocalRandom also (as mentioned here #14638 (comment))? |
|
@Glavo This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply add a new comment to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration! |
|
@Glavo This pull request has been inactive for more than 8 weeks and will now be automatically closed. If you would like to continue working on this pull request in the future, feel free to reopen it! This can be done using the |
|
/open |
|
@Glavo This pull request is now open |
|
@Glavo this pull request can not be integrated into git checkout random
git fetch https://git.openjdk.org/jdk.git master
git merge FETCH_HEAD
# resolve conflicts and follow the instructions given by git merge
git commit -m "Merge master"
git push |
|
@Glavo This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply add a new comment to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration! |
|
@Glavo This pull request has been inactive for more than 8 weeks and will now be automatically closed. If you would like to continue working on this pull request in the future, feel free to reopen it! This can be done using the |
Progress
Warning
Reviewing
Using
gitCheckout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/14638/head:pull/14638$ git checkout pull/14638Update a local copy of the PR:
$ git checkout pull/14638$ git pull https://git.openjdk.org/jdk.git pull/14638/headUsing Skara CLI tools
Checkout this PR locally:
$ git pr checkout 14638View PR using the GUI difftool:
$ git pr show -t 14638Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/14638.diff