Conversation

@Glavo
Contributor

@Glavo Glavo commented Jun 24, 2023


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Warning

 ⚠️ Patch contains a binary file (test/jdk/javax/swing/AbstractButton/5049549/SE1.gif)

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/14638/head:pull/14638
$ git checkout pull/14638

Update a local copy of the PR:
$ git checkout pull/14638
$ git pull https://git.openjdk.org/jdk.git pull/14638/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 14638

View PR using the GUI difftool:
$ git pr show -t 14638

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/14638.diff

@bridgekeeper

bridgekeeper bot commented Jun 24, 2023

👋 Welcome back Glavo! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk

openjdk bot commented Jun 24, 2023

@Glavo The following label will be automatically applied to this pull request:

  • core-libs

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the core-libs core-libs-dev@openjdk.org label Jun 24, 2023
@Glavo
Contributor Author

Glavo commented Jun 24, 2023

Here are the results for the RandomGeneratorNextBytes benchmark (this comment will be updated continuously to show the latest results):

                                                                                          (Baseline)                              (This PR)
Benchmark                                           (algo)  (length)   Mode  Cnt       Score       Error   Units            Score       Error   Units
RandomGeneratorNextBytes.testNextBytes              Random         1  thrpt    5  292124.677 ±  6377.255  ops/ms       346221.250 ± 86860.488  ops/ms
RandomGeneratorNextBytes.testNextBytes              Random         2  thrpt    5  261235.014 ± 15707.040  ops/ms       323470.739 ± 16084.063  ops/ms
RandomGeneratorNextBytes.testNextBytes              Random         4  thrpt    5  240194.023 ±  4417.534  ops/ms       286154.793 ±  2162.091  ops/ms
RandomGeneratorNextBytes.testNextBytes              Random         8  thrpt    5  120707.831 ±  5701.440  ops/ms       156008.005 ±   128.043  ops/ms
RandomGeneratorNextBytes.testNextBytes              Random        16  thrpt    5   63594.497 ±   438.139  ops/ms        78236.080 ±    15.013  ops/ms
RandomGeneratorNextBytes.testNextBytes              Random        32  thrpt    5   35420.287 ±   427.508  ops/ms        39262.435 ±    18.943  ops/ms
RandomGeneratorNextBytes.testNextBytes              Random        64  thrpt    5   17651.831 ±    25.639  ops/ms        19688.311 ±    19.507  ops/ms
RandomGeneratorNextBytes.testNextBytes              Random       128  thrpt    5    8554.908 ±    19.695  ops/ms         9887.630 ±     6.683  ops/ms
RandomGeneratorNextBytes.testNextBytes              Random       256  thrpt    5    4560.283 ±    27.455  ops/ms         4874.348 ±     3.856  ops/ms
RandomGeneratorNextBytes.testNextBytes              Random      1024  thrpt    5    1161.771 ±     2.053  ops/ms         1242.620 ±     0.311  ops/ms
RandomGeneratorNextBytes.testNextBytes              Random      4096  thrpt    5     294.610 ±     0.764  ops/ms          309.557 ±     0.131  ops/ms
RandomGeneratorNextBytes.testNextBytes              Random     16384  thrpt    5      73.885 ±     0.055  ops/ms           77.973 ±     0.038  ops/ms
RandomGeneratorNextBytes.testNextBytes     L32X64MixRandom         1  thrpt    5  214239.266 ±  1103.018  ops/ms       215641.075 ±  1901.826  ops/ms
RandomGeneratorNextBytes.testNextBytes     L32X64MixRandom         2  thrpt    5  199700.840 ±   465.203  ops/ms       201313.181 ±  1069.213  ops/ms
RandomGeneratorNextBytes.testNextBytes     L32X64MixRandom         4  thrpt    5  184605.447 ±  1057.641  ops/ms       184081.550 ±  1068.982  ops/ms
RandomGeneratorNextBytes.testNextBytes     L32X64MixRandom         8  thrpt    5  144195.042 ±  2155.839  ops/ms       166970.270 ±    62.509  ops/ms
RandomGeneratorNextBytes.testNextBytes     L32X64MixRandom        16  thrpt    5   92010.333 ±   272.006  ops/ms        90731.669 ±  1179.712  ops/ms
RandomGeneratorNextBytes.testNextBytes     L32X64MixRandom        32  thrpt    5   45378.019 ±   487.964  ops/ms        54470.769 ±   789.986  ops/ms
RandomGeneratorNextBytes.testNextBytes     L32X64MixRandom        64  thrpt    5   24958.803 ±    57.066  ops/ms        29271.323 ±    62.528  ops/ms
RandomGeneratorNextBytes.testNextBytes     L32X64MixRandom       128  thrpt    5   12967.609 ±    30.151  ops/ms        15460.181 ±    50.493  ops/ms
RandomGeneratorNextBytes.testNextBytes     L32X64MixRandom       256  thrpt    5    6620.502 ±     8.294  ops/ms         7974.591 ±    20.440  ops/ms
RandomGeneratorNextBytes.testNextBytes     L32X64MixRandom      1024  thrpt    5    1670.174 ±    14.304  ops/ms         2391.758 ±     1.891  ops/ms
RandomGeneratorNextBytes.testNextBytes     L32X64MixRandom      4096  thrpt    5     415.035 ±     0.771  ops/ms          609.107 ±     0.279  ops/ms
RandomGeneratorNextBytes.testNextBytes     L32X64MixRandom     16384  thrpt    5     103.704 ±     0.013  ops/ms          152.771 ±     0.270  ops/ms
RandomGeneratorNextBytes.testNextBytes  Xoshiro256PlusPlus         1  thrpt    5  378919.462 ± 20733.749  ops/ms       382509.180 ±   418.348  ops/ms
RandomGeneratorNextBytes.testNextBytes  Xoshiro256PlusPlus         2  thrpt    5  352209.019 ±   340.381  ops/ms       346027.427 ±  2979.327  ops/ms
RandomGeneratorNextBytes.testNextBytes  Xoshiro256PlusPlus         4  thrpt    5  327951.428 ±   172.418  ops/ms       327855.763 ±   280.082  ops/ms
RandomGeneratorNextBytes.testNextBytes  Xoshiro256PlusPlus         8  thrpt    5  269875.472 ±    48.783  ops/ms       229580.541 ±    24.469  ops/ms
RandomGeneratorNextBytes.testNextBytes  Xoshiro256PlusPlus        16  thrpt    5  157786.908 ±   363.565  ops/ms       183664.801 ±    19.788  ops/ms
RandomGeneratorNextBytes.testNextBytes  Xoshiro256PlusPlus        32  thrpt    5   85927.731 ±  1988.607  ops/ms       135010.073 ±    12.742  ops/ms
RandomGeneratorNextBytes.testNextBytes  Xoshiro256PlusPlus        64  thrpt    5   45121.367 ±   113.888  ops/ms        90891.031 ±    51.981  ops/ms
RandomGeneratorNextBytes.testNextBytes  Xoshiro256PlusPlus       128  thrpt    5   23266.361 ±    83.143  ops/ms        52998.113 ±   527.246  ops/ms
RandomGeneratorNextBytes.testNextBytes  Xoshiro256PlusPlus       256  thrpt    5   10845.534 ±    23.174  ops/ms        29423.939 ±    10.840  ops/ms
RandomGeneratorNextBytes.testNextBytes  Xoshiro256PlusPlus      1024  thrpt    5    2724.955 ±     1.782  ops/ms         7910.042 ±   175.002  ops/ms
RandomGeneratorNextBytes.testNextBytes  Xoshiro256PlusPlus      4096  thrpt    5     744.280 ±     0.064  ops/ms         2064.625 ±     0.646  ops/ms
RandomGeneratorNextBytes.testNextBytes  Xoshiro256PlusPlus     16384  thrpt    5     186.613 ±     0.012  ops/ms          573.580 ±     5.850  ops/ms

This PR significantly improves performance for the default implementation in RandomGenerator.

For the Xoshiro256PlusPlus algorithm, when the target array is large (16384 bytes), throughput with this PR is about 3.07 times the baseline (573.580 vs. 186.613 ops/ms).

@Glavo
Contributor Author

Glavo commented Jun 24, 2023

The only confirmed performance degradation (<5%) is when the byte array is empty.

For byte arrays with a length greater than 4 (or 8 for RandomGenerator), we often see a performance improvement of 10% to 30%.
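The idea being benchmarked can be sketched outside the JDK as follows. This is only an illustration under stated assumptions: the PR itself uses the JDK-internal Unsafe, which ordinary code cannot call, so a little-endian byte-array VarHandle stands in for it here, and the names NextBytesSketch, fillByteWise, fillWordWise, and counter are hypothetical. The sketch checks that storing each 64-bit word in a single little-endian write produces the same bytes as the low-byte-first loop.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteOrder;
import java.util.Arrays;
import java.util.function.LongSupplier;

final class NextBytesSketch {
    // Views a byte[] as little-endian longs; the index passed to set() is a byte offset.
    // Assumption: this public API stands in for the internal Unsafe used in the PR.
    private static final VarHandle LONG_LE =
            MethodHandles.byteArrayViewVarHandle(long[].class, ByteOrder.LITTLE_ENDIAN);

    // Byte-at-a-time fill: each 64-bit word is emitted low-order byte first.
    static void fillByteWise(byte[] bytes, LongSupplier nextLong) {
        int i = 0;
        int len = bytes.length;
        for (int words = len >> 3; words-- > 0; ) {
            long rnd = nextLong.getAsLong();
            for (int n = 8; n-- > 0; rnd >>>= Byte.SIZE) {
                bytes[i++] = (byte) rnd;
            }
        }
        if (i < len) {
            for (long rnd = nextLong.getAsLong(); i < len; rnd >>>= Byte.SIZE) {
                bytes[i++] = (byte) rnd;
            }
        }
    }

    // Word-at-a-time fill: one 8-byte little-endian store per word, then a byte-wise tail.
    static void fillWordWise(byte[] bytes, LongSupplier nextLong) {
        int i = 0;
        int len = bytes.length;
        for (int end = len & ~7; i < end; i += Long.BYTES) {
            LONG_LE.set(bytes, i, nextLong.getAsLong());
        }
        if (i < len) {
            for (long rnd = nextLong.getAsLong(); i < len; rnd >>>= Byte.SIZE) {
                bytes[i++] = (byte) rnd;
            }
        }
    }

    // Deterministic stand-in for a generator's nextLong(); hypothetical, not a real PRNG.
    static LongSupplier counter() {
        long[] state = {0x0123456789ABCDEFL};
        return () -> state[0] += 0x9E3779B97F4A7C15L;
    }

    public static void main(String[] args) {
        for (int len : new int[] {0, 1, 7, 8, 9, 16, 31}) {
            byte[] a = new byte[len];
            byte[] b = new byte[len];
            fillByteWise(a, counter());
            fillWordWise(b, counter());
            if (!Arrays.equals(a, b)) {
                throw new AssertionError("mismatch at length " + len);
            }
        }
        System.out.println("byte-wise and word-wise fills agree");
    }
}
```

The sketch only demonstrates that the two fills are byte-for-byte equivalent; it says nothing about which one C2 compiles to faster code, which is what the benchmark numbers above address.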

@liach
Member

liach commented Jun 25, 2023

You should probably update the 2 existing benchmarks in https://github.com/openjdk/jdk/blob/master/test/micro/org/openjdk/bench/java/util/RandomNext.java and https://github.com/openjdk/jdk/blob/master/test/micro/org/openjdk/bench/java/util/RandomGeneratorNext.java, or include your benchmark there.

In IntelliJ IDEA, you can add the micro directory as a module and add the JMH Maven library and the JDK modules as compile-time dependencies, so IntelliJ can help you work on the benchmarks.

Comment on lines 489 to 491
if (unsafe.isBigEndian())
rnd = Long.reverseBytes(rnd);
unsafe.putLong(bytes, (long)Unsafe.ARRAY_BYTE_BASE_OFFSET + i, rnd);
Member

Suggested change
- if (unsafe.isBigEndian())
-     rnd = Long.reverseBytes(rnd);
- unsafe.putLong(bytes, (long)Unsafe.ARRAY_BYTE_BASE_OFFSET + i, rnd);
+ unsafe.putLong(bytes, Unsafe.ARRAY_BYTE_BASE_OFFSET + i, nextLong(), false);

Contributor Author

@liach putLong doesn't seem to have such an overload.

Member

You can see if putLongUnaligned, whose java code tries to put aligned long if possible, works.

Contributor Author

@Glavo Glavo Dec 23, 2023

You can see if putLongUnaligned, whose java code tries to put aligned long if possible, works.

I didn't use it because I was concerned that it would generate slower code on platforms that don't support unaligned accesses. I don't know whether C2 understands that this is always an aligned access.
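For readers following the endianness part of this thread, here is a minimal sketch of why the reviewed snippet swaps bytes on big-endian platforms before a native-order store. It uses ByteBuffer from the public API rather than the internal Unsafe, and the names LittleEndianStoreSketch, putLongSwapThenNative, and putLongLittleEndian are hypothetical. The point is that "reverse on big-endian, then store natively" always yields a little-endian layout, which is what keeps the generated bytes identical across platforms.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

final class LittleEndianStoreSketch {
    // Same shape as the reviewed snippet: swap on big-endian hosts, then store in native order.
    // Assumption: ByteBuffer stands in for the internal Unsafe.putLong.
    static void putLongSwapThenNative(byte[] bytes, int offset, long value) {
        if (ByteOrder.nativeOrder() == ByteOrder.BIG_ENDIAN) {
            value = Long.reverseBytes(value);
        }
        ByteBuffer.wrap(bytes).order(ByteOrder.nativeOrder()).putLong(offset, value);
    }

    // Explicit little-endian store, the layout the swap is meant to guarantee.
    static void putLongLittleEndian(byte[] bytes, int offset, long value) {
        ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN).putLong(offset, value);
    }

    public static void main(String[] args) {
        byte[] x = new byte[16];
        byte[] y = new byte[16];
        putLongSwapThenNative(x, 8, 0x0102030405060708L);
        putLongLittleEndian(y, 8, 0x0102030405060708L);
        if (!Arrays.equals(x, y)) {
            throw new AssertionError("layouts differ");
        }
        // In a little-endian layout the low-order byte comes first.
        System.out.println("first stored byte = " + x[8]);
    }
}
```

Whether the unaligned variants discussed above compile to equally good code on strict-alignment platforms is a separate question that this sketch does not answer.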

Comment on lines 473 to 476
int rnd = nextInt();
if (unsafe.isBigEndian())
rnd = Integer.reverseBytes(rnd);
unsafe.putInt(bytes, (long)Unsafe.ARRAY_BYTE_BASE_OFFSET + i, rnd);
Member

Suggested change
- int rnd = nextInt();
- if (unsafe.isBigEndian())
-     rnd = Integer.reverseBytes(rnd);
- unsafe.putInt(bytes, (long)Unsafe.ARRAY_BYTE_BASE_OFFSET + i, rnd);
+ unsafe.putInt(bytes, Unsafe.ARRAY_BYTE_BASE_OFFSET + i, nextInt(), false);

@liach
Member

liach commented Jun 25, 2023

Also, should we use ByteArrayLittleEndian instead of Unsafe, once ByteArrayLittleEndian is no longer dependent on VarHandle?

@Glavo
Contributor Author

Glavo commented Jun 25, 2023

Also, should we use ByteArrayLittleEndian instead of Unsafe, once ByteArrayLittleEndian is no longer dependent on VarHandle?

Due to the overhead of using non aligned reads and checking indexes, ByteArrayLittleEndian is slower than directly calling getLong.

I am running benchmarks based on ByteArrayLittleEndian. The current benchmark result is that Unsafe.getLong is 13.76% faster than ByteArrayLittleEndian for L32X64MixRandom (bytes.length = 8).

@Glavo
Contributor Author

Glavo commented Jun 25, 2023

Due to the overhead of using non aligned reads and checking indexes, ByteArrayLittleEndian is slower than directly calling getLong.

Well, it seems that this is not always correct.

For L32X64MixRandom, when the bytes length is greater than 1024, using ByteArrayLittleEndian is actually 10% faster than getLong. I don't understand why this is happening.

@liach
Member

liach commented Jun 25, 2023

I think putLongUnaligned does an aligned put when it can, but I don't know how C1 or C2 handles it. I think it's a win as long as either Unsafe or VarHandle is faster than the existing manual loop (which may already have been vectorized by C2).

@Glavo
Contributor Author

Glavo commented Jun 25, 2023

Benchmarking results based on current ByteArrayLittleEndian(VarHandle):

Results

```
Benchmark                        (length)   Mode  Cnt        Score       Error   Units
RandomBenchmark.L32X64MixRandom         0  thrpt    5  1519005.337 ± 10166.724  ops/ms
RandomBenchmark.L32X64MixRandom         1  thrpt    5   215438.181 ±  1296.270  ops/ms
RandomBenchmark.L32X64MixRandom         2  thrpt    5   203155.966 ±  1102.743  ops/ms
RandomBenchmark.L32X64MixRandom         3  thrpt    5   190993.049 ±  1583.488  ops/ms
RandomBenchmark.L32X64MixRandom         4  thrpt    5   184699.083 ±  1656.026  ops/ms
RandomBenchmark.L32X64MixRandom         5  thrpt    5   164362.211 ±  1688.353  ops/ms
RandomBenchmark.L32X64MixRandom         6  thrpt    5   156946.704 ±  1188.623  ops/ms
RandomBenchmark.L32X64MixRandom         7  thrpt    5   153627.148 ±  2754.413  ops/ms
RandomBenchmark.L32X64MixRandom         8  thrpt    5   164011.508 ±    87.110  ops/ms
RandomBenchmark.L32X64MixRandom        10  thrpt    5   101824.800 ±   183.479  ops/ms
RandomBenchmark.L32X64MixRandom        12  thrpt    5    98005.608 ±   188.852  ops/ms
RandomBenchmark.L32X64MixRandom        14  thrpt    5    95530.799 ±   109.554  ops/ms
RandomBenchmark.L32X64MixRandom        16  thrpt    5   114617.995 ±    51.252  ops/ms
RandomBenchmark.L32X64MixRandom        32  thrpt    5    54787.870 ±    36.547  ops/ms
RandomBenchmark.L32X64MixRandom        64  thrpt    5    29267.303 ±    17.143  ops/ms
RandomBenchmark.L32X64MixRandom       128  thrpt    5    15590.939 ±     5.373  ops/ms
RandomBenchmark.L32X64MixRandom       256  thrpt    5     8001.160 ±     2.425  ops/ms
RandomBenchmark.L32X64MixRandom       512  thrpt    5     4035.970 ±     1.097  ops/ms
RandomBenchmark.L32X64MixRandom      1024  thrpt    5     2390.227 ±     0.317  ops/ms
RandomBenchmark.L32X64MixRandom      2048  thrpt    5     1210.989 ±     0.190  ops/ms
RandomBenchmark.L32X64MixRandom      4096  thrpt    5      609.188 ±     0.051  ops/ms
RandomBenchmark.L32X64MixRandom      8192  thrpt    5      302.962 ±     1.783  ops/ms
RandomBenchmark.Random                  0  thrpt    5  1511686.595 ± 64669.779  ops/ms
RandomBenchmark.Random                  1  thrpt    5   355958.380 ± 33275.649  ops/ms
RandomBenchmark.Random                  2  thrpt    5   322566.151 ±  2151.769  ops/ms
RandomBenchmark.Random                  3  thrpt    5   291901.421 ±  3873.578  ops/ms
RandomBenchmark.Random                  4  thrpt    5   270129.002 ±    19.117  ops/ms
RandomBenchmark.Random                  5  thrpt    5   135856.891 ±   566.745  ops/ms
RandomBenchmark.Random                  6  thrpt    5   130272.051 ±    61.738  ops/ms
RandomBenchmark.Random                  7  thrpt    5   123843.896 ±   107.200  ops/ms
RandomBenchmark.Random                  8  thrpt    5   159297.447 ±    77.475  ops/ms
RandomBenchmark.Random                 10  thrpt    5    97626.041 ±   420.827  ops/ms
RandomBenchmark.Random                 12  thrpt    5   104838.370 ±    52.721  ops/ms
RandomBenchmark.Random                 14  thrpt    5    75077.145 ±   142.321  ops/ms
RandomBenchmark.Random                 16  thrpt    5    78217.212 ±    17.730  ops/ms
RandomBenchmark.Random                 32  thrpt    5    39289.349 ±     5.522  ops/ms
RandomBenchmark.Random                 64  thrpt    5    19673.761 ±    18.463  ops/ms
RandomBenchmark.Random                128  thrpt    5     9856.985 ±     1.844  ops/ms
RandomBenchmark.Random                256  thrpt    5     4928.253 ±     0.684  ops/ms
RandomBenchmark.Random                512  thrpt    5     2431.380 ±     1.006  ops/ms
RandomBenchmark.Random               1024  thrpt    5     1239.599 ±     0.204  ops/ms
RandomBenchmark.Random               2048  thrpt    5      618.926 ±     0.181  ops/ms
RandomBenchmark.Random               4096  thrpt    5      272.700 ±     1.009  ops/ms
RandomBenchmark.Random               8192  thrpt    5      151.693 ±     0.117  ops/ms
```

@SirYwell
Member

I looked into that a few months ago too, but didn't get around to actually creating a PR, mainly for the following reasons (besides lack of time):

  1. I didn't find any proper tests that ensure that the behavior described in the Javadocs is actually maintained
  2. I searched through usages of the nextBytes method on GitHub and mostly found a) usages of SecureRandom#nextBytes, which aren't affected by this, and b) usages with small arrays, where the effect isn't that huge.

I think 1. should definitely be addressed. There is java/util/Random/NextBytes.java with a very basic test, but it only covers Random and I think a proper test should put the implementation note directly in code.
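A minimal sketch of what such a test could look like for java.util.Random, assuming the reference loop is transcribed from the algorithm given in the Random.nextBytes Javadoc (four bytes per nextInt() call, low-order byte first). The names NextBytesContractSketch, referenceNextBytes, and matchesReference are hypothetical, and this is not the existing java/util/Random/NextBytes.java test. The same pattern could be applied to the RandomGenerator default by driving a reference loop from nextLong().

```java
import java.util.Arrays;
import java.util.Random;

final class NextBytesContractSketch {
    // Reference fill following the algorithm shown in the Random.nextBytes Javadoc.
    static void referenceNextBytes(Random rnd, byte[] bytes) {
        for (int i = 0; i < bytes.length; ) {
            for (int r = rnd.nextInt(), n = Math.min(bytes.length - i, 4); n-- > 0; r >>= 8) {
                bytes[i++] = (byte) r;
            }
        }
    }

    // Compares the real nextBytes against the reference for one seed and array length.
    static boolean matchesReference(long seed, int length) {
        byte[] expected = new byte[length];
        byte[] actual = new byte[length];
        referenceNextBytes(new Random(seed), expected);
        new Random(seed).nextBytes(actual);
        return Arrays.equals(expected, actual);
    }

    public static void main(String[] args) {
        // Cover empty arrays, lengths around the 4- and 8-byte word sizes, and longer arrays.
        for (long seed : new long[] {0L, 42L, -1L}) {
            for (int length = 0; length <= 64; length++) {
                if (!matchesReference(seed, length)) {
                    throw new AssertionError("seed " + seed + ", length " + length);
                }
            }
        }
        System.out.println("nextBytes matches the documented algorithm");
    }
}
```

Sweeping every length from 0 to 64 is a deliberate choice: an optimized implementation is most likely to diverge in the tail handling, where the array length is not a multiple of the word size.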

@Glavo
Contributor Author

Glavo commented Jun 25, 2023

I searched through usages of the nextBytes method on GitHub and mostly found a) usages of SecureRandom#nextBytes, which aren't affected by this, and b) usages with small arrays, where the effect isn't that huge.

Personally, I often use it to generate some data for unit testing, so improving its performance would be helpful to me.

I think 1. should definitely be addressed. There is java/util/Random/NextBytes.java with a very basic test, but it only covers Random and I think a proper test should put the implementation note directly in code.

I agree.

@liach
Member

liach commented Jun 25, 2023

Looking into the baseline results:

RandomBenchmark.L32X64MixRandom        14  thrpt    5    88666.991 ±   247.778  ops/ms
RandomBenchmark.L32X64MixRandom        16  thrpt    5    94277.271 ±   661.097  ops/ms  <-- significantly higher than 14
RandomBenchmark.Random                  6  thrpt    5   121245.951 ±  1767.579  ops/ms
RandomBenchmark.Random                  7  thrpt    5   124512.260 ±  2239.107  ops/ms  <-- higher than 6
RandomBenchmark.Random                  8  thrpt    5   103982.515 ±  2052.329  ops/ms

Auto-vectorization might be already at work for RandomGenerator. We need to prove the optimization offered by Unsafe putLong and VarHandle are reliable instead of some unreliable side effects of JIT, and that's why I am hesitant to create a JBS issue.

@Glavo
Contributor Author

Glavo commented Jun 25, 2023

Auto-vectorization might be already at work for RandomGenerator. We need to prove the optimization offered by Unsafe putLong and VarHandle are reliable instead of some unreliable side effects of JIT, and that's why I am hesitant to create a JBS issue.

Disassembly (baseline)
============================= C2-compiled nmethod ==============================
----------------------------------- Assembly -----------------------------------

Compiled method (c2)     155  693       4       java.util.random.RandomGenerator::nextBytes (100 bytes)
 total in heap  [0x00007f09c84c8c10,0x00007f09c84c9708] = 2808
 relocation     [0x00007f09c84c8d68,0x00007f09c84c8da0] = 56
 main code      [0x00007f09c84c8da0,0x00007f09c84c9570] = 2000
 stub code      [0x00007f09c84c9570,0x00007f09c84c9588] = 24
 oops           [0x00007f09c84c9588,0x00007f09c84c9590] = 8
 metadata       [0x00007f09c84c9590,0x00007f09c84c95a0] = 16
 scopes data    [0x00007f09c84c95a0,0x00007f09c84c9660] = 192
 scopes pcs     [0x00007f09c84c9660,0x00007f09c84c96f0] = 144
 dependencies   [0x00007f09c84c96f0,0x00007f09c84c96f8] = 8
 nul chk table  [0x00007f09c84c96f8,0x00007f09c84c9708] = 16

[Disassembly]
--------------------------------------------------------------------------------
[Constant Pool (empty)]

--------------------------------------------------------------------------------

[Entry Point]
  # {method} {0x000000080017fdf8} 'nextBytes' '([B)V' in 'java/util/random/RandomGenerator'
  # this:     rsi:rsi   = 'java/util/random/RandomGenerator'
  # parm0:    rdx:rdx   = '[B'
  #           [sp+0x40]  (sp of caller)
  0x00007f09c84c8da0:   mov    0x8(%rsi),%r10d
  0x00007f09c84c8da4:   movabs $0x800000000,%r11
  0x00007f09c84c8dae:   add    %r11,%r10
  0x00007f09c84c8db1:   cmp    %r10,%rax
  0x00007f09c84c8db4:   jne    0x00007f09c7da3d80           ;   {runtime_call ic_miss_stub}
  0x00007f09c84c8dba:   xchg   %ax,%ax
  0x00007f09c84c8dbc:   nopl   0x0(%rax)
[Verified Entry Point]
  0x00007f09c84c8dc0:   mov    %eax,-0x14000(%rsp)
  0x00007f09c84c8dc7:   push   %rbp
  0x00007f09c84c8dc8:   sub    $0x30,%rsp
  0x00007f09c84c8dcc:   cmpl   $0x1,0x20(%r15)
  0x00007f09c84c8dd4:   jne    0x00007f09c84c9562
  0x00007f09c84c8dda:   vmovq  %rsi,%xmm0
  0x00007f09c84c8ddf:   mov    0xc(%rdx),%r9d               ; implicit exception: dispatches to 0x00007f09c84c9538
  0x00007f09c84c8de3:   mov    %r9d,%ebp
  0x00007f09c84c8de6:   sar    $0x3,%ebp
  0x00007f09c84c8de9:   xor    %edi,%edi
  0x00007f09c84c8deb:   test   %ebp,%ebp
  0x00007f09c84c8ded:   jle    0x00007f09c84c945a
  0x00007f09c84c8df3:   mov    0x8(%rsi),%r10d
  0x00007f09c84c8df7:   lea    -0x1(%rbp),%r8d
  0x00007f09c84c8dfb:   cmp    $0x1021990,%r10d             ;   {metadata('jdk/random/L32X64MixRandom')}
  0x00007f09c84c8e02:   jne    0x00007f09c84c94ec
  0x00007f09c84c8e08:   mov    %rsi,%rax
  0x00007f09c84c8e0b:   mov    0x10(%rax),%r10d
  0x00007f09c84c8e0f:   mov    0x18(%rax),%esi
  0x00007f09c84c8e12:   mov    0x14(%rax),%r11d
  0x00007f09c84c8e16:   mov    0xc(%rax),%ecx
  0x00007f09c84c8e19:   add    $0xfffffffe,%ebp
  0x00007f09c84c8e1c:   movslq %r9d,%r13
  0x00007f09c84c8e1f:   vmovq  %rdx,%xmm3
  0x00007f09c84c8e24:   mov    %r9d,0xc(%rsp)
  0x00007f09c84c8e29:   vmovd  %r8d,%xmm1
  0x00007f09c84c8e2e:   mov    %ecx,(%rsp)
  0x00007f09c84c8e31:   mov    %r13,0x10(%rsp)
  0x00007f09c84c8e36:   xor    %r11d,%esi
  0x00007f09c84c8e39:   lea    (%r11,%r10,1),%ecx
  0x00007f09c84c8e3d:   lea    0x7(%rdi),%r14d
  0x00007f09c84c8e41:   mov    %ecx,%r9d
  0x00007f09c84c8e44:   shr    $0x10,%r9d
  0x00007f09c84c8e48:   xor    %ecx,%r9d
  0x00007f09c84c8e4b:   movslq %r14d,%rdx
  0x00007f09c84c8e4e:   imul   $0xd36d884b,%r9d,%r9d
  0x00007f09c84c8e55:   add    $0xfffffffffffffff9,%rdx
  0x00007f09c84c8e59:   mov    %r9d,%ecx
  0x00007f09c84c8e5c:   shr    $0x10,%ecx
  0x00007f09c84c8e5f:   xor    %r9d,%ecx
  0x00007f09c84c8e62:   imul   $0xadb4a92d,%r10d,%r9d
  0x00007f09c84c8e69:   add    (%rsp),%r9d
  0x00007f09c84c8e6d:   imul   $0xd36d884b,%ecx,%ebx
  0x00007f09c84c8e73:   imul   $0xadb4a92d,%r9d,%r10d
  0x00007f09c84c8e7a:   add    (%rsp),%r10d
  0x00007f09c84c8e7e:   mov    %r10d,0x10(%rax)
  0x00007f09c84c8e82:   mov    %ebx,%r8d
  0x00007f09c84c8e85:   shr    $0x10,%r8d
  0x00007f09c84c8e89:   xor    %ebx,%r8d
  0x00007f09c84c8e8c:   mov    %esi,%ebx
  0x00007f09c84c8e8e:   shl    $0x9,%ebx
  0x00007f09c84c8e91:   movslq %r8d,%rcx
  0x00007f09c84c8e94:   rorx   $0x6,%r11d,%r8d
  0x00007f09c84c8e9a:   xor    %esi,%r8d
  0x00007f09c84c8e9d:   xor    %ebx,%r8d
  0x00007f09c84c8ea0:   add    %r8d,%r9d
  0x00007f09c84c8ea3:   shl    $0x20,%rcx
  0x00007f09c84c8ea7:   mov    %r9d,%ebx
  0x00007f09c84c8eaa:   shr    $0x10,%ebx
  0x00007f09c84c8ead:   xor    %r9d,%ebx
  0x00007f09c84c8eb0:   rorx   $0x6,%r8d,%r11d
  0x00007f09c84c8eb6:   imul   $0xd36d884b,%ebx,%r9d
  0x00007f09c84c8ebd:   rorx   $0x13,%esi,%ebx
  0x00007f09c84c8ec3:   xor    %r8d,%ebx
  0x00007f09c84c8ec6:   xor    %ebx,%r11d
  0x00007f09c84c8ec9:   mov    %r9d,%r8d
  0x00007f09c84c8ecc:   shr    $0x10,%r8d
  0x00007f09c84c8ed0:   xor    %r9d,%r8d
  0x00007f09c84c8ed3:   rorx   $0x13,%ebx,%esi
  0x00007f09c84c8ed9:   mov    %esi,0x18(%rax)
  0x00007f09c84c8edc:   imul   $0xd36d884b,%r8d,%r9d
  0x00007f09c84c8ee3:   shl    $0x9,%ebx
  0x00007f09c84c8ee6:   xor    %ebx,%r11d
  0x00007f09c84c8ee9:   mov    %r11d,0x14(%rax)
  0x00007f09c84c8eed:   mov    %r9d,%ebx
  0x00007f09c84c8ef0:   shr    $0x10,%ebx
  0x00007f09c84c8ef3:   xor    %r9d,%ebx
  0x00007f09c84c8ef6:   movslq %ebx,%r8
  0x00007f09c84c8ef9:   xor    %rcx,%r8                     ;   {no_reloc}
  0x00007f09c84c8efc:   cmp    0x10(%rsp),%rdx
  0x00007f09c84c8f01:   jae    0x00007f09c84c94d4
  0x00007f09c84c8f07:   cmp    0xc(%rsp),%r14d
  0x00007f09c84c8f0c:   jae    0x00007f09c84c94db
  0x00007f09c84c8f12:   mov    0x458(%r15),%rcx
  0x00007f09c84c8f19:   movslq %edi,%rbx
  0x00007f09c84c8f1c:   mov    %r8d,%r9d
  0x00007f09c84c8f1f:   vmovq  %xmm3,%rdx
  0x00007f09c84c8f24:   mov    %r9b,0x10(%rdx,%rdi,1)
  0x00007f09c84c8f29:   shr    $0x8,%r8
  0x00007f09c84c8f2d:   mov    %r8d,%r9d
  0x00007f09c84c8f30:   mov    %r9b,0x11(%rdx,%rbx,1)
  0x00007f09c84c8f35:   inc    %r14d
  0x00007f09c84c8f38:   shr    $0x8,%r8
  0x00007f09c84c8f3c:   mov    %r8,%rdi
  0x00007f09c84c8f3f:   shr    $0x8,%rdi
  0x00007f09c84c8f43:   mov    %r8d,%r9d
  0x00007f09c84c8f46:   mov    %r9b,0x12(%rdx,%rbx,1)
  0x00007f09c84c8f4b:   mov    %edi,%r9d
  0x00007f09c84c8f4e:   mov    %r9b,0x13(%rdx,%rbx,1)
  0x00007f09c84c8f53:   shr    $0x8,%rdi
  0x00007f09c84c8f57:   mov    %rdi,%rdx
  0x00007f09c84c8f5a:   shr    $0x8,%rdx
  0x00007f09c84c8f5e:   mov    %edi,%r8d
  0x00007f09c84c8f61:   vmovq  %xmm3,%r9
  0x00007f09c84c8f66:   mov    %r8b,0x14(%r9,%rbx,1)
  0x00007f09c84c8f6b:   mov    %edx,%r9d
  0x00007f09c84c8f6e:   vmovq  %xmm3,%r8
  0x00007f09c84c8f73:   mov    %r9b,0x15(%r8,%rbx,1)
  0x00007f09c84c8f78:   shr    $0x8,%rdx
  0x00007f09c84c8f7c:   mov    %rdx,%r9
  0x00007f09c84c8f7f:   shr    $0x8,%r9
  0x00007f09c84c8f83:   mov    %edx,%r8d
  0x00007f09c84c8f86:   vmovq  %xmm3,%rdi
  0x00007f09c84c8f8b:   mov    %r8b,0x16(%rdi,%rbx,1)
  0x00007f09c84c8f90:   mov    %r9d,%r9d
  0x00007f09c84c8f93:   mov    %r9b,0x17(%rdi,%rbx,1)       ; ImmutableOopMap {rdi=Oop rax=Oop xmm0=Oop xmm3=Oop }
                                                            ;*goto {reexecute=1 rethrow=0 return_oop=0}
                                                            ; - (reexecute) java.util.random.RandomGenerator::nextBytes@58 (line 488)
  0x00007f09c84c8f98:   test   %eax,(%rcx)                  ;   {poll}
  0x00007f09c84c8f9a:   vmovd  %xmm1,%r8d
  0x00007f09c84c8f9f:   dec    %r8d
  0x00007f09c84c8fa2:   cmp    %ebp,%r8d
  0x00007f09c84c8fa5:   jle    0x00007f09c84c8fb4
  0x00007f09c84c8fa7:   vmovd  %r8d,%xmm1
  0x00007f09c84c8fac:   mov    %r14d,%edi
  0x00007f09c84c8faf:   jmp    0x00007f09c84c8e36
  0x00007f09c84c8fb4:   test   %r8d,%r8d
  0x00007f09c84c8fb7:   jle    0x00007f09c84c94e2
  0x00007f09c84c8fbd:   vmovd  %xmm1,%r9d
  0x00007f09c84c8fc2:   dec    %r9d
  0x00007f09c84c8fc5:   vmovd  %r9d,%xmm2
  0x00007f09c84c8fca:   jmp    0x00007f09c84c8fd5
  0x00007f09c84c8fcc:   nopl   0x0(%rax)
  0x00007f09c84c8fd0:   vmovq  %xmm1,%rax
  0x00007f09c84c8fd5:   lea    (%r11,%r10,1),%edi
  0x00007f09c84c8fd9:   xor    %r11d,%esi
  0x00007f09c84c8fdc:   lea    0x7(%r14),%ecx
  0x00007f09c84c8fe0:   rorx   $0x13,%esi,%edx
  0x00007f09c84c8fe6:   movslq %ecx,%r9
  0x00007f09c84c8fe9:   mov    %esi,%ebp
  0x00007f09c84c8feb:   shl    $0x9,%ebp
  0x00007f09c84c8fee:   add    $0xfffffffffffffff9,%r9
  0x00007f09c84c8ff2:   mov    %edi,%r8d
  0x00007f09c84c8ff5:   shr    $0x10,%r8d
  0x00007f09c84c8ff9:   xor    %edi,%r8d
  0x00007f09c84c8ffc:   imul   $0xadb4a92d,%r10d,%edi
  0x00007f09c84c9003:   add    (%rsp),%edi
  0x00007f09c84c9006:   imul   $0xd36d884b,%r8d,%r8d
  0x00007f09c84c900d:   imul   $0xadb4a92d,%edi,%ebx
  0x00007f09c84c9013:   add    (%rsp),%ebx
  0x00007f09c84c9016:   mov    %ebx,0x10(%rax)
  0x00007f09c84c9019:   mov    %r8d,%r10d
  0x00007f09c84c901c:   shr    $0x10,%r10d
  0x00007f09c84c9020:   xor    %r8d,%r10d
  0x00007f09c84c9023:   rorx   $0x6,%r11d,%r11d
  0x00007f09c84c9029:   xor    %esi,%r11d
  0x00007f09c84c902c:   xor    %ebp,%r11d
  0x00007f09c84c902f:   add    %r11d,%edi
  0x00007f09c84c9032:   xor    %r11d,%edx
  0x00007f09c84c9035:   imul   $0xd36d884b,%r10d,%ebp
  0x00007f09c84c903c:   rorx   $0x13,%edx,%r8d
  0x00007f09c84c9042:   mov    %r8d,0x18(%rax)
  0x00007f09c84c9046:   mov    %ebp,%r10d
  0x00007f09c84c9049:   shr    $0x10,%r10d
  0x00007f09c84c904d:   xor    %ebp,%r10d
  0x00007f09c84c9050:   mov    %edx,%r13d
  0x00007f09c84c9053:   shl    $0x9,%r13d
  0x00007f09c84c9057:   movslq %r10d,%rsi
  0x00007f09c84c905a:   mov    %edi,%ebp
  0x00007f09c84c905c:   shr    $0x10,%ebp
  0x00007f09c84c905f:   xor    %edi,%ebp
  0x00007f09c84c9061:   shl    $0x20,%rsi
  0x00007f09c84c9065:   imul   $0xd36d884b,%ebp,%r10d
  0x00007f09c84c906c:   rorx   $0x6,%r11d,%ebp
  0x00007f09c84c9072:   xor    %edx,%ebp
  0x00007f09c84c9074:   xor    %r13d,%ebp
  0x00007f09c84c9077:   mov    %ebp,0x14(%rax)
  0x00007f09c84c907a:   mov    %r10d,%r11d
  0x00007f09c84c907d:   shr    $0x10,%r11d
  0x00007f09c84c9081:   xor    %r10d,%r11d
  0x00007f09c84c9084:   imul   $0xd36d884b,%r11d,%r11d
  0x00007f09c84c908b:   mov    %r11d,%r10d
  0x00007f09c84c908e:   shr    $0x10,%r10d
  0x00007f09c84c9092:   xor    %r11d,%r10d
  0x00007f09c84c9095:   movslq %r10d,%rdx                   ;   {no_reloc}
  0x00007f09c84c9098:   xor    %rsi,%rdx
  0x00007f09c84c909b:   cmp    0x10(%rsp),%r9
  0x00007f09c84c90a0:   jae    0x00007f09c84c9476
  0x00007f09c84c90a6:   cmp    0xc(%rsp),%ecx
  0x00007f09c84c90aa:   jae    0x00007f09c84c9490
  0x00007f09c84c90b0:   lea    (%rbx,%rbp,1),%edi
  0x00007f09c84c90b3:   mov    %ebp,%r10d
  0x00007f09c84c90b6:   xor    %r8d,%r10d
  0x00007f09c84c90b9:   vmovd  %xmm2,%r9d
  0x00007f09c84c90be:   dec    %r9d
  0x00007f09c84c90c1:   rorx   $0x13,%r10d,%r8d
  0x00007f09c84c90c7:   mov    %r10d,%esi
  0x00007f09c84c90ca:   shl    $0x9,%esi
  0x00007f09c84c90cd:   rorx   $0x6,%ebp,%ebp
  0x00007f09c84c90d3:   xor    %r10d,%ebp
  0x00007f09c84c90d6:   xor    %esi,%ebp
  0x00007f09c84c90d8:   xor    %ebp,%r8d
  0x00007f09c84c90db:   mov    %edi,%r11d
  0x00007f09c84c90de:   shr    $0x10,%r11d
  0x00007f09c84c90e2:   xor    %edi,%r11d
  0x00007f09c84c90e5:   rorx   $0x13,%r8d,%esi
  0x00007f09c84c90eb:   mov    %esi,0x18(%rax)
  0x00007f09c84c90ee:   imul   $0xd36d884b,%r11d,%r11d
  0x00007f09c84c90f5:   mov    %r8d,%edi
  0x00007f09c84c90f8:   shl    $0x9,%edi
  0x00007f09c84c90fb:   mov    %r11d,%r10d
  0x00007f09c84c90fe:   shr    $0x10,%r10d
  0x00007f09c84c9102:   xor    %r11d,%r10d
  0x00007f09c84c9105:   rorx   $0x6,%ebp,%r11d
  0x00007f09c84c910b:   xor    %r8d,%r11d
  0x00007f09c84c910e:   xor    %edi,%r11d
  0x00007f09c84c9111:   mov    %r11d,0x14(%rax)
  0x00007f09c84c9115:   imul   $0xd36d884b,%r10d,%edi
  0x00007f09c84c911c:   imul   $0xadb4a92d,%ebx,%r10d
  0x00007f09c84c9123:   add    (%rsp),%r10d
  0x00007f09c84c9127:   lea    (%r10,%rbp,1),%ebx
  0x00007f09c84c912b:   mov    %edi,%r8d
  0x00007f09c84c912e:   shr    $0x10,%r8d
  0x00007f09c84c9132:   xor    %edi,%r8d
  0x00007f09c84c9135:   mov    %ebx,%ebp
  0x00007f09c84c9137:   shr    $0x10,%ebp
  0x00007f09c84c913a:   xor    %ebx,%ebp
  0x00007f09c84c913c:   movslq %r8d,%r8
  0x00007f09c84c913f:   imul   $0xd36d884b,%ebp,%edi
  0x00007f09c84c9145:   shl    $0x20,%r8
  0x00007f09c84c9149:   mov    %edi,%ebp
  0x00007f09c84c914b:   shr    $0x10,%ebp
  0x00007f09c84c914e:   xor    %edi,%ebp
  0x00007f09c84c9150:   imul   $0xadb4a92d,%r10d,%r10d
  0x00007f09c84c9157:   add    (%rsp),%r10d
  0x00007f09c84c915b:   mov    %r10d,0x10(%rax)
  0x00007f09c84c915f:   vmovq  %rax,%xmm1
  0x00007f09c84c9164:   imul   $0xd36d884b,%ebp,%eax
  0x00007f09c84c916a:   mov    %edx,%ebx
  0x00007f09c84c916c:   vmovq  %xmm3,%rdi
  0x00007f09c84c9171:   mov    %bl,0x10(%rdi,%r14,1)
  0x00007f09c84c9176:   mov    %eax,%edi
  0x00007f09c84c9178:   shr    $0x10,%edi
  0x00007f09c84c917b:   xor    %eax,%edi
  0x00007f09c84c917d:   shr    $0x8,%rdx
  0x00007f09c84c9181:   movslq %edi,%rbx
  0x00007f09c84c9184:   xor    %rbx,%r8
  0x00007f09c84c9187:   mov    %edx,%edi
  0x00007f09c84c9189:   shr    $0x8,%rdx
  0x00007f09c84c918d:   movslq %r14d,%rax
  0x00007f09c84c9190:   vmovq  %xmm3,%rbx
  0x00007f09c84c9195:   mov    %dil,0x11(%rbx,%rax,1)       ;   {no_reloc}
  0x00007f09c84c919a:   mov    %edx,%ebx
  0x00007f09c84c919c:   vmovq  %xmm3,%rdi
  0x00007f09c84c91a1:   mov    %bl,0x12(%rdi,%rax,1)
  0x00007f09c84c91a5:   shr    $0x8,%rdx
  0x00007f09c84c91a9:   mov    %edx,%edi
  0x00007f09c84c91ab:   vmovq  %xmm3,%rbx
  0x00007f09c84c91b0:   mov    %dil,0x13(%rbx,%rax,1)
  0x00007f09c84c91b5:   lea    0x1(%rcx),%edi
  0x00007f09c84c91b8:   lea    0x8(%rcx),%r14d
  0x00007f09c84c91bc:   shr    $0x8,%rdx
  0x00007f09c84c91c0:   movslq %r14d,%rbx
  0x00007f09c84c91c3:   mov    %edx,%ebp
  0x00007f09c84c91c5:   vmovq  %xmm3,%r13
  0x00007f09c84c91ca:   mov    %bpl,0x14(%r13,%rax,1)
  0x00007f09c84c91cf:   add    $0xfffffffffffffff9,%rbx
  0x00007f09c84c91d3:   shr    $0x8,%rdx
  0x00007f09c84c91d7:   mov    %rdx,%r13
  0x00007f09c84c91da:   shr    $0x8,%r13
  0x00007f09c84c91de:   mov    %edx,%ebp
  0x00007f09c84c91e0:   vmovq  %xmm3,%rdx
  0x00007f09c84c91e5:   mov    %bpl,0x15(%rdx,%rax,1)
  0x00007f09c84c91ea:   mov    %r13d,%ebp
  0x00007f09c84c91ed:   mov    %bpl,0x16(%rdx,%rax,1)
  0x00007f09c84c91f2:   shr    $0x8,%r13
  0x00007f09c84c91f6:   mov    %r13d,%ebp
  0x00007f09c84c91f9:   mov    %bpl,0x17(%rdx,%rax,1)
  0x00007f09c84c91fe:   cmp    0x10(%rsp),%rbx
  0x00007f09c84c9203:   jae    0x00007f09c84c9483
  0x00007f09c84c9209:   cmp    0xc(%rsp),%r14d
  0x00007f09c84c920e:   jae    0x00007f09c84c949d
  0x00007f09c84c9214:   mov    0x458(%r15),%rbx
  0x00007f09c84c921b:   movslq %ecx,%rdx
  0x00007f09c84c921e:   mov    %r8d,%edi
  0x00007f09c84c9221:   vmovq  %xmm3,%rcx
  0x00007f09c84c9226:   mov    %dil,0x11(%rcx,%rdx,1)
  0x00007f09c84c922b:   shr    $0x8,%r8
  0x00007f09c84c922f:   mov    %r8d,%ecx
  0x00007f09c84c9232:   vmovq  %xmm3,%rdi
  0x00007f09c84c9237:   mov    %cl,0x12(%rdi,%rdx,1)
  0x00007f09c84c923b:   inc    %r14d
  0x00007f09c84c923e:   shr    $0x8,%r8
  0x00007f09c84c9242:   mov    %r8,%rdi
  0x00007f09c84c9245:   shr    $0x8,%rdi
  0x00007f09c84c9249:   mov    %r8d,%r8d
  0x00007f09c84c924c:   vmovq  %xmm3,%rcx
  0x00007f09c84c9251:   mov    %r8b,0x13(%rcx,%rdx,1)
  0x00007f09c84c9256:   mov    %edi,%ecx
  0x00007f09c84c9258:   vmovq  %xmm3,%r8
  0x00007f09c84c925d:   mov    %cl,0x14(%r8,%rdx,1)
  0x00007f09c84c9262:   shr    $0x8,%rdi
  0x00007f09c84c9266:   mov    %rdi,%rax
  0x00007f09c84c9269:   shr    $0x8,%rax
  0x00007f09c84c926d:   mov    %edi,%r8d
  0x00007f09c84c9270:   vmovq  %xmm3,%rcx
  0x00007f09c84c9275:   mov    %r8b,0x15(%rcx,%rdx,1)
  0x00007f09c84c927a:   mov    %eax,%ecx
  0x00007f09c84c927c:   vmovq  %xmm3,%r8
  0x00007f09c84c9281:   mov    %cl,0x16(%r8,%rdx,1)
  0x00007f09c84c9286:   shr    $0x8,%rax
  0x00007f09c84c928a:   mov    %rax,%rcx
  0x00007f09c84c928d:   shr    $0x8,%rcx
  0x00007f09c84c9291:   mov    %eax,%r8d
  0x00007f09c84c9294:   vmovq  %xmm3,%rdi                   ;   {no_reloc}
  0x00007f09c84c9299:   mov    %r8b,0x17(%rdi,%rdx,1)
  0x00007f09c84c929e:   mov    %ecx,%ecx
  0x00007f09c84c92a0:   mov    %cl,0x18(%rdi,%rdx,1)        ; ImmutableOopMap {rdi=Oop xmm0=Oop xmm1=Oop xmm3=Oop }
                                                            ;*goto {reexecute=1 rethrow=0 return_oop=0}
                                                            ; - (reexecute) java.util.random.RandomGenerator::nextBytes@58 (line 488)
  0x00007f09c84c92a4:   test   %eax,(%rbx)                  ;   {poll}
  0x00007f09c84c92a6:   vmovd  %xmm2,%r9d
  0x00007f09c84c92ab:   add    $0xfffffffe,%r9d
  0x00007f09c84c92af:   vmovd  %r9d,%xmm2
  0x00007f09c84c92b4:   test   %r9d,%r9d
  0x00007f09c84c92b7:   jg     0x00007f09c84c8fd0
  0x00007f09c84c92bd:   vmovq  %xmm1,%rax
  0x00007f09c84c92c2:   vmovd  %xmm2,%r8d
  0x00007f09c84c92c7:   cmp    $0xffffffff,%r8d
  0x00007f09c84c92cb:   jle    0x00007f09c84c944d
  0x00007f09c84c92d1:   mov    %r8d,%r13d
  0x00007f09c84c92d4:   lea    (%r11,%r10,1),%r9d
  0x00007f09c84c92d8:   xor    %r11d,%esi
  0x00007f09c84c92db:   lea    0x7(%r14),%edi
  0x00007f09c84c92df:   rorx   $0x13,%esi,%ebx
  0x00007f09c84c92e5:   movslq %edi,%rdx
  0x00007f09c84c92e8:   mov    %esi,%r8d
  0x00007f09c84c92eb:   shl    $0x9,%r8d
  0x00007f09c84c92ef:   add    $0xfffffffffffffff9,%rdx
  0x00007f09c84c92f3:   rorx   $0x6,%r11d,%ebp
  0x00007f09c84c92f9:   xor    %esi,%ebp
  0x00007f09c84c92fb:   xor    %r8d,%ebp
  0x00007f09c84c92fe:   xor    %ebp,%ebx
  0x00007f09c84c9300:   imul   $0xadb4a92d,%r10d,%r11d
  0x00007f09c84c9307:   add    (%rsp),%r11d
  0x00007f09c84c930b:   lea    (%r11,%rbp,1),%ecx
  0x00007f09c84c930f:   rorx   $0x13,%ebx,%esi
  0x00007f09c84c9315:   mov    %esi,0x18(%rax)
  0x00007f09c84c9318:   mov    %ecx,%r8d
  0x00007f09c84c931b:   shr    $0x10,%r8d
  0x00007f09c84c931f:   xor    %ecx,%r8d
  0x00007f09c84c9322:   imul   $0xadb4a92d,%r11d,%r10d
  0x00007f09c84c9329:   add    (%rsp),%r10d
  0x00007f09c84c932d:   mov    %r10d,0x10(%rax)
  0x00007f09c84c9331:   imul   $0xd36d884b,%r8d,%r11d
  0x00007f09c84c9338:   mov    %ebx,%r8d
  0x00007f09c84c933b:   shl    $0x9,%r8d
  0x00007f09c84c933f:   mov    %r11d,%ecx
  0x00007f09c84c9342:   shr    $0x10,%ecx
  0x00007f09c84c9345:   xor    %r11d,%ecx
  0x00007f09c84c9348:   rorx   $0x6,%ebp,%r11d
  0x00007f09c84c934e:   xor    %ebx,%r11d
  0x00007f09c84c9351:   xor    %r8d,%r11d
  0x00007f09c84c9354:   mov    %r11d,0x14(%rax)
  0x00007f09c84c9358:   imul   $0xd36d884b,%ecx,%r8d
  0x00007f09c84c935f:   mov    %r9d,%ebx
  0x00007f09c84c9362:   shr    $0x10,%ebx
  0x00007f09c84c9365:   xor    %r9d,%ebx
  0x00007f09c84c9368:   mov    %r8d,%r9d
  0x00007f09c84c936b:   shr    $0x10,%r9d
  0x00007f09c84c936f:   xor    %r8d,%r9d
  0x00007f09c84c9372:   imul   $0xd36d884b,%ebx,%ebx
  0x00007f09c84c9378:   movslq %r9d,%r8
  0x00007f09c84c937b:   mov    %ebx,%r9d
  0x00007f09c84c937e:   shr    $0x10,%r9d
  0x00007f09c84c9382:   xor    %ebx,%r9d
  0x00007f09c84c9385:   imul   $0xd36d884b,%r9d,%ecx
  0x00007f09c84c938c:   mov    %ecx,%r9d
  0x00007f09c84c938f:   shr    $0x10,%r9d
  0x00007f09c84c9393:   xor    %ecx,%r9d
  0x00007f09c84c9396:   movslq %r9d,%r9
  0x00007f09c84c9399:   shl    $0x20,%r9
  0x00007f09c84c939d:   xor    %r9,%r8
  0x00007f09c84c93a0:   cmp    0x10(%rsp),%rdx              ;   {no_reloc}
  0x00007f09c84c93a5:   jae    0x00007f09c84c94a8
  0x00007f09c84c93ab:   cmp    0xc(%rsp),%edi
  0x00007f09c84c93af:   jae    0x00007f09c84c94a8
  0x00007f09c84c93b5:   mov    0x458(%r15),%rcx
  0x00007f09c84c93bc:   mov    %r8d,%r9d
  0x00007f09c84c93bf:   vmovq  %xmm3,%rbx
  0x00007f09c84c93c4:   mov    %r9b,0x10(%rbx,%r14,1)
  0x00007f09c84c93c9:   inc    %edi
  0x00007f09c84c93cb:   shr    $0x8,%r8
  0x00007f09c84c93cf:   movslq %r14d,%rdx
  0x00007f09c84c93d2:   mov    %r8d,%ebx
  0x00007f09c84c93d5:   vmovq  %xmm3,%r9
  0x00007f09c84c93da:   mov    %bl,0x11(%r9,%rdx,1)
  0x00007f09c84c93df:   shr    $0x8,%r8
  0x00007f09c84c93e3:   mov    %r8,%rbx
  0x00007f09c84c93e6:   shr    $0x8,%rbx
  0x00007f09c84c93ea:   mov    %r8d,%r8d
  0x00007f09c84c93ed:   mov    %r8b,0x12(%r9,%rdx,1)
  0x00007f09c84c93f2:   mov    %ebx,%r9d
  0x00007f09c84c93f5:   vmovq  %xmm3,%r8
  0x00007f09c84c93fa:   mov    %r9b,0x13(%r8,%rdx,1)
  0x00007f09c84c93ff:   shr    $0x8,%rbx
  0x00007f09c84c9403:   mov    %rbx,%r9
  0x00007f09c84c9406:   shr    $0x8,%r9
  0x00007f09c84c940a:   mov    %ebx,%r8d
  0x00007f09c84c940d:   vmovq  %xmm3,%rbx
  0x00007f09c84c9412:   mov    %r8b,0x14(%rbx,%rdx,1)
  0x00007f09c84c9417:   mov    %r9d,%r8d
  0x00007f09c84c941a:   mov    %r8b,0x15(%rbx,%rdx,1)
  0x00007f09c84c941f:   shr    $0x8,%r9
  0x00007f09c84c9423:   mov    %r9,%r8
  0x00007f09c84c9426:   shr    $0x8,%r8
  0x00007f09c84c942a:   mov    %r9d,%r9d
  0x00007f09c84c942d:   mov    %r9b,0x16(%rbx,%rdx,1)
  0x00007f09c84c9432:   mov    %r8d,%r8d
  0x00007f09c84c9435:   mov    %r8b,0x17(%rbx,%rdx,1)       ; ImmutableOopMap {rbx=Oop rax=Oop xmm0=Oop xmm3=Oop }
                                                            ;*goto {reexecute=1 rethrow=0 return_oop=0}
                                                            ; - (reexecute) java.util.random.RandomGenerator::nextBytes@58 (line 488)
  0x00007f09c84c943a:   test   %eax,(%rcx)                  ;   {poll}
  0x00007f09c84c943c:   dec    %r13d
  0x00007f09c84c943f:   cmp    $0xffffffff,%r13d
  0x00007f09c84c9443:   jle    0x00007f09c84c9450
  0x00007f09c84c9445:   mov    %edi,%r14d
  0x00007f09c84c9448:   jmp    0x00007f09c84c92d4
  0x00007f09c84c944d:   mov    %r14d,%edi
  0x00007f09c84c9450:   vmovq  %xmm3,%rdx
  0x00007f09c84c9455:   mov    0xc(%rsp),%r9d
  0x00007f09c84c945a:   cmp    %r9d,%edi
  0x00007f09c84c945d:   jl     0x00007f09c84c9514
  0x00007f09c84c9463:   add    $0x30,%rsp
  0x00007f09c84c9467:   pop    %rbp
  0x00007f09c84c9468:   cmp    0x450(%r15),%rsp             ;   {poll_return}
  0x00007f09c84c946f:   ja     0x00007f09c84c954c
  0x00007f09c84c9475:   ret
  0x00007f09c84c9476:   mov    %rdx,%r8
  0x00007f09c84c9479:   vmovd  %xmm2,%r9d
  0x00007f09c84c947e:   mov    %r14d,%edi
  0x00007f09c84c9481:   jmp    0x00007f09c84c9488
  0x00007f09c84c9483:   vmovq  %xmm1,%rax
  0x00007f09c84c9488:   mov    %r9d,%r13d
  0x00007f09c84c948b:   mov    %edi,%r14d
  0x00007f09c84c948e:   jmp    0x00007f09c84c94a8
  0x00007f09c84c9490:   mov    %rdx,%r8
  0x00007f09c84c9493:   vmovd  %xmm2,%r9d
  0x00007f09c84c9498:   mov    %r14d,%edi
  0x00007f09c84c949b:   jmp    0x00007f09c84c94a2
  0x00007f09c84c949d:   vmovq  %xmm1,%rax
  0x00007f09c84c94a2:   mov    %r9d,%r13d
  0x00007f09c84c94a5:   mov    %edi,%r14d
  0x00007f09c84c94a8:   mov    $0xffffff76,%esi
  0x00007f09c84c94ad:   mov    %rax,%rbp
  0x00007f09c84c94b0:   vmovsd %xmm3,(%rsp)
  0x00007f09c84c94b5:   mov    %r14d,0x8(%rsp)
  0x00007f09c84c94ba:   mov    %r13d,0x10(%rsp)
  0x00007f09c84c94bf:   mov    %r8,0x18(%rsp)
  0x00007f09c84c94c4:   data16 xchg %ax,%ax
  0x00007f09c84c94c7:   call   0x00007f09c7da9c00           ; ImmutableOopMap {rbp=Oop [0]=Oop }
                                                            ;*ifle {reexecute=1 rethrow=0 return_oop=0}
                                                            ; - (reexecute) java.util.random.RandomGenerator::nextBytes@35 (line 486)
                                                            ;   {runtime_call UncommonTrapBlob}
  0x00007f09c84c94cc:   nopl   0x30008bc(%rax,%rax,1)       ;   {other}
  0x00007f09c84c94d4:   vmovd  %xmm1,%r9d
  0x00007f09c84c94d9:   jmp    0x00007f09c84c9488
  0x00007f09c84c94db:   vmovd  %xmm1,%r9d
  0x00007f09c84c94e0:   jmp    0x00007f09c84c94a2
  0x00007f09c84c94e2:   vmovd  %r8d,%xmm2
  0x00007f09c84c94e7:   jmp    0x00007f09c84c92c2
  0x00007f09c84c94ec:   mov    $0xffffff76,%esi
  0x00007f09c84c94f1:   mov    %rdx,(%rsp)
  0x00007f09c84c94f5:   mov    %r9d,0x8(%rsp)
  0x00007f09c84c94fa:   mov    %r8d,0xc(%rsp)
  0x00007f09c84c94ff:   vmovsd %xmm0,0x10(%rsp)
  0x00007f09c84c9505:   xchg   %ax,%ax
  0x00007f09c84c9507:   call   0x00007f09c7da9c00           ; ImmutableOopMap {[0]=Oop [16]=Oop }
                                                            ;*ifle {reexecute=1 rethrow=0 return_oop=0}
                                                            ; - (reexecute) java.util.random.RandomGenerator::nextBytes@15 (line 484)
                                                            ;   {runtime_call UncommonTrapBlob}
  0x00007f09c84c950c:   nopl   0x40008fc(%rax,%rax,1)       ;   {other}
  0x00007f09c84c9514:   mov    $0xffffff45,%esi
  0x00007f09c84c9519:   mov    %rdx,%rbp
  0x00007f09c84c951c:   mov    %edi,0x8(%rsp)
  0x00007f09c84c9520:   mov    %r9d,0xc(%rsp)
  0x00007f09c84c9525:   vmovsd %xmm0,0x10(%rsp)
  0x00007f09c84c952b:   call   0x00007f09c7da9c00           ; ImmutableOopMap {rbp=Oop [16]=Oop }
                                                            ;*if_icmpge {reexecute=1 rethrow=0 return_oop=0}
                                                            ; - (reexecute) java.util.random.RandomGenerator::nextBytes@63 (line 489)
                                                            ;   {runtime_call UncommonTrapBlob}
  0x00007f09c84c9530:   nopl   0x5000920(%rax,%rax,1)       ;   {other}
  0x00007f09c84c9538:   mov    $0xfffffff6,%esi
  0x00007f09c84c953d:   xchg   %ax,%ax
  0x00007f09c84c953f:   call   0x00007f09c7da9c00           ; ImmutableOopMap {}
                                                            ;*arraylength {reexecute=0 rethrow=0 return_oop=0}
                                                            ; - java.util.random.RandomGenerator::nextBytes@3 (line 483)
                                                            ;   {runtime_call UncommonTrapBlob}
  0x00007f09c84c9544:   nopl   0x6000934(%rax,%rax,1)       ;   {other}
  0x00007f09c84c954c:   movabs $0x7f09c84c9468,%r10         ;   {internal_word}
  0x00007f09c84c9556:   mov    %r10,0x468(%r15)
  0x00007f09c84c955d:   jmp    0x00007f09c7daad00           ;   {runtime_call SafepointBlob}
  0x00007f09c84c9562:   call   Stub::nmethod_entry_barrier  ;   {runtime_call StubRoutines (final stubs)}
  0x00007f09c84c9567:   jmp    0x00007f09c84c8dda
  0x00007f09c84c956c:   hlt
  0x00007f09c84c956d:   hlt
  0x00007f09c84c956e:   hlt
  0x00007f09c84c956f:   hlt
[Exception Handler]
  0x00007f09c84c9570:   jmp    0x00007f09c7e6b100           ;   {no_reloc}
[Deopt Handler Code]
  0x00007f09c84c9575:   call   0x00007f09c84c957a
  0x00007f09c84c957a:   subq   $0x5,(%rsp)
  0x00007f09c84c957f:   jmp    0x00007f09c7da9fa0           ;   {runtime_call DeoptimizationBlob}
  0x00007f09c84c9584:   hlt
  0x00007f09c84c9585:   hlt
  0x00007f09c84c9586:   hlt
  0x00007f09c84c9587:   hlt
--------------------------------------------------------------------------------
[/Disassembly]

(I'm not familiar with assembly) I guess loop unrolling is working?

@liach
Member

liach commented Jun 25, 2023

I'm not sure either, but there are vmovd and vmovq instructions, which move double or quad words at once, so it appears vectorized. But using Unsafe/ByteArrayLittleEndian explicitly still seems better optimized from your results; I guess it might be because you know the input random number size (int/long sizes). Can you try how putLongUnaligned etc. work, as the VH implementation delegates to the unaligned versions (for plain get/set)?

@Glavo
Contributor Author

Glavo commented Jun 25, 2023

Use Unsafe::putIntUnaligned/Unsafe::putLongUnaligned:

Results
Benchmark                        (length)   Mode  Cnt        Score       Error   Units
RandomBenchmark.L32X64MixRandom         0  thrpt    5  1524860.609 ± 25736.732  ops/ms
RandomBenchmark.L32X64MixRandom         1  thrpt    5   215406.292 ±  2337.313  ops/ms
RandomBenchmark.L32X64MixRandom         2  thrpt    5   201345.579 ±   940.754  ops/ms
RandomBenchmark.L32X64MixRandom         3  thrpt    5   191355.115 ±  1805.205  ops/ms
RandomBenchmark.L32X64MixRandom         4  thrpt    5   184609.621 ±  1255.979  ops/ms
RandomBenchmark.L32X64MixRandom         5  thrpt    5   164468.663 ±  1677.559  ops/ms
RandomBenchmark.L32X64MixRandom         6  thrpt    5   156960.655 ±   663.464  ops/ms
RandomBenchmark.L32X64MixRandom         7  thrpt    5   153595.183 ±  3983.234  ops/ms
RandomBenchmark.L32X64MixRandom         8  thrpt    5   186632.617 ±   425.385  ops/ms
RandomBenchmark.L32X64MixRandom        10  thrpt    5   104736.408 ±   345.176  ops/ms
RandomBenchmark.L32X64MixRandom        12  thrpt    5   105447.874 ±   399.328  ops/ms
RandomBenchmark.L32X64MixRandom        14  thrpt    5    95664.265 ±    80.052  ops/ms
RandomBenchmark.L32X64MixRandom        16  thrpt    5   109343.697 ±    32.207  ops/ms
RandomBenchmark.L32X64MixRandom        32  thrpt    5    62252.931 ±   469.271  ops/ms
RandomBenchmark.L32X64MixRandom        64  thrpt    5    31358.265 ±    89.965  ops/ms
RandomBenchmark.L32X64MixRandom       128  thrpt    5    16607.450 ±    70.292  ops/ms
RandomBenchmark.L32X64MixRandom       256  thrpt    5     8327.905 ±     9.349  ops/ms
RandomBenchmark.L32X64MixRandom       512  thrpt    5     4379.807 ±     9.959  ops/ms
RandomBenchmark.L32X64MixRandom      1024  thrpt    5     2169.190 ±     0.127  ops/ms
RandomBenchmark.L32X64MixRandom      2048  thrpt    5     1081.397 ±    64.131  ops/ms
RandomBenchmark.L32X64MixRandom      4096  thrpt    5      546.185 ±     0.895  ops/ms
RandomBenchmark.L32X64MixRandom      8192  thrpt    5      273.206 ±     0.236  ops/ms
RandomBenchmark.Random                  0  thrpt    5  1523782.776 ± 11592.739  ops/ms
RandomBenchmark.Random                  1  thrpt    5   364587.781 ± 23904.474  ops/ms
RandomBenchmark.Random                  2  thrpt    5   324850.835 ±  1698.265  ops/ms
RandomBenchmark.Random                  3  thrpt    5   290855.010 ±  3524.691  ops/ms
RandomBenchmark.Random                  4  thrpt    5   286867.826 ±    58.331  ops/ms
RandomBenchmark.Random                  5  thrpt    5   151454.671 ±   525.393  ops/ms
RandomBenchmark.Random                  6  thrpt    5   147070.562 ±  1477.003  ops/ms
RandomBenchmark.Random                  7  thrpt    5   138053.754 ±   151.065  ops/ms
RandomBenchmark.Random                  8  thrpt    5   154585.711 ±  1495.177  ops/ms
RandomBenchmark.Random                 10  thrpt    5    92987.135 ±  1284.808  ops/ms
RandomBenchmark.Random                 12  thrpt    5   102440.798 ±   204.633  ops/ms
RandomBenchmark.Random                 14  thrpt    5    76235.547 ±    64.113  ops/ms
RandomBenchmark.Random                 16  thrpt    5    77672.178 ±    28.365  ops/ms
RandomBenchmark.Random                 32  thrpt    5    39193.225 ±    40.209  ops/ms
RandomBenchmark.Random                 64  thrpt    5    19684.798 ±     7.152  ops/ms
RandomBenchmark.Random                128  thrpt    5     9884.926 ±     1.765  ops/ms
RandomBenchmark.Random                256  thrpt    5     4862.050 ±     1.655  ops/ms
RandomBenchmark.Random                512  thrpt    5     2457.171 ±     1.042  ops/ms
RandomBenchmark.Random               1024  thrpt    5     1228.285 ±     0.736  ops/ms
RandomBenchmark.Random               2048  thrpt    5      615.795 ±     0.977  ops/ms
RandomBenchmark.Random               4096  thrpt    5      311.657 ±     0.124  ops/ms
RandomBenchmark.Random               8192  thrpt    5      152.179 ±     0.031  ops/ms

Use ByteArrayLittleEndian (#14636):

Results
Benchmark                        (length)   Mode  Cnt        Score       Error   Units
RandomBenchmark.L32X64MixRandom         0  thrpt    5  1528297.256 ± 11983.204  ops/ms
RandomBenchmark.L32X64MixRandom         1  thrpt    5   215656.684 ±  1794.981  ops/ms
RandomBenchmark.L32X64MixRandom         2  thrpt    5   201420.705 ±  1377.903  ops/ms
RandomBenchmark.L32X64MixRandom         3  thrpt    5   190722.759 ±  3562.388  ops/ms
RandomBenchmark.L32X64MixRandom         4  thrpt    5   184578.897 ±   587.992  ops/ms
RandomBenchmark.L32X64MixRandom         5  thrpt    5   164248.972 ±  1153.358  ops/ms
RandomBenchmark.L32X64MixRandom         6  thrpt    5   145869.045 ±  1342.215  ops/ms
RandomBenchmark.L32X64MixRandom         7  thrpt    5   153291.149 ±  4666.694  ops/ms
RandomBenchmark.L32X64MixRandom         8  thrpt    5   163664.923 ±   559.088  ops/ms
RandomBenchmark.L32X64MixRandom        10  thrpt    5   101878.885 ±   322.857  ops/ms
RandomBenchmark.L32X64MixRandom        12  thrpt    5    98918.245 ±   305.201  ops/ms
RandomBenchmark.L32X64MixRandom        14  thrpt    5    95554.296 ±   253.037  ops/ms
RandomBenchmark.L32X64MixRandom        16  thrpt    5   114686.083 ±    10.662  ops/ms
RandomBenchmark.L32X64MixRandom        32  thrpt    5    54694.191 ±    77.666  ops/ms
RandomBenchmark.L32X64MixRandom        64  thrpt    5    29272.233 ±    13.130  ops/ms
RandomBenchmark.L32X64MixRandom       128  thrpt    5    15423.642 ±    13.856  ops/ms
RandomBenchmark.L32X64MixRandom       256  thrpt    5     8007.269 ±     6.237  ops/ms
RandomBenchmark.L32X64MixRandom       512  thrpt    5     4035.672 ±     1.192  ops/ms
RandomBenchmark.L32X64MixRandom      1024  thrpt    5     2389.270 ±     1.732  ops/ms
RandomBenchmark.L32X64MixRandom      2048  thrpt    5     1210.966 ±     0.645  ops/ms
RandomBenchmark.L32X64MixRandom      4096  thrpt    5      609.226 ±     0.026  ops/ms
RandomBenchmark.L32X64MixRandom      8192  thrpt    5      305.380 ±     0.147  ops/ms
RandomBenchmark.Random                  0  thrpt    5  1519068.332 ± 17554.468  ops/ms
RandomBenchmark.Random                  1  thrpt    5   349320.420 ± 50935.172  ops/ms
RandomBenchmark.Random                  2  thrpt    5   325239.890 ±  1852.854  ops/ms
RandomBenchmark.Random                  3  thrpt    5   293215.822 ±  5502.425  ops/ms
RandomBenchmark.Random                  4  thrpt    5   270030.002 ±   635.288  ops/ms
RandomBenchmark.Random                  5  thrpt    5   135824.338 ±  1411.090  ops/ms
RandomBenchmark.Random                  6  thrpt    5   131045.378 ±   131.826  ops/ms
RandomBenchmark.Random                  7  thrpt    5   123870.748 ±   281.168  ops/ms
RandomBenchmark.Random                  8  thrpt    5   159068.553 ±   577.367  ops/ms
RandomBenchmark.Random                 10  thrpt    5    97813.949 ±   133.771  ops/ms
RandomBenchmark.Random                 12  thrpt    5   104909.089 ±    54.468  ops/ms
RandomBenchmark.Random                 14  thrpt    5    75004.214 ±   237.386  ops/ms
RandomBenchmark.Random                 16  thrpt    5    78205.257 ±    91.166  ops/ms
RandomBenchmark.Random                 32  thrpt    5    39289.218 ±    24.475  ops/ms
RandomBenchmark.Random                 64  thrpt    5    19676.129 ±     8.671  ops/ms
RandomBenchmark.Random                128  thrpt    5     9856.330 ±     1.669  ops/ms
RandomBenchmark.Random                256  thrpt    5     4928.997 ±     1.652  ops/ms
RandomBenchmark.Random                512  thrpt    5     2429.244 ±     2.227  ops/ms
RandomBenchmark.Random               1024  thrpt    5     1239.338 ±     0.306  ops/ms
RandomBenchmark.Random               2048  thrpt    5      619.758 ±     0.055  ops/ms
RandomBenchmark.Random               4096  thrpt    5      274.033 ±     0.714  ops/ms
RandomBenchmark.Random               8192  thrpt    5      151.607 ±     0.013  ops/ms

The result seems interesting.

@Glavo
Contributor Author

Glavo commented Jun 25, 2023

The new implementation of ByteArrayLittleEndian in #14636 performs consistently with the old implementation using VarHandle. (This conclusion gives me more confidence in #14636)

Interestingly, Unsafe::putIntUnaligned/Unsafe::putLongUnaligned is not always faster than the new implementation of ByteArrayLittleEndian, even though it does not have additional bounds checking.
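
For readers following along, the two kinds of store being compared can be sketched with public API only. This is an illustration under stated assumptions, not the code from either branch: the class name and `LONG_LE` are made up here, and `MethodHandles.byteArrayViewVarHandle` stands in for the JDK-internal `ByteArrayLittleEndian` and `Unsafe::putLongUnaligned`, which are not reachable from outside `java.base`.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteOrder;
import java.util.Arrays;

public class LittleEndianStoreSketch {
    // Hypothetical name: a view VarHandle that writes a long at a byte offset,
    // least significant byte first.
    private static final VarHandle LONG_LE =
            MethodHandles.byteArrayViewVarHandle(long[].class, ByteOrder.LITTLE_ENDIAN);

    // Byte-at-a-time store using shifts, similar in shape to the loop
    // visible in the disassembly above (one byte store per shift).
    static void putLongByShifts(byte[] dst, int offset, long value) {
        for (int n = 0; n < Long.BYTES; n++, value >>>= Byte.SIZE) {
            dst[offset + n] = (byte) value;
        }
    }

    // One bounds-checked multi-byte store; plain set() permits unaligned offsets.
    static void putLongByVarHandle(byte[] dst, int offset, long value) {
        LONG_LE.set(dst, offset, value);
    }

    public static void main(String[] args) {
        byte[] a = new byte[16];
        byte[] b = new byte[16];
        long value = 0x0102030405060708L;
        putLongByShifts(a, 3, value);
        putLongByVarHandle(b, 3, value);
        System.out.println(Arrays.equals(a, b)); // prints "true"
    }
}
```

Both methods leave the same bytes in the array; only the number and width of the stores differ, which is what the benchmarks above are measuring.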

@liach
Member

liach commented Jun 25, 2023

Can you publish your put_Unaligned code and the one with the updated ByteArrayLittleEndian in two branches in your fork? I suspect something might be off in your code, and would like to test it on my end.

@Glavo
Contributor Author

Glavo commented Jun 25, 2023

Can you publish your put_Unaligned code and the one with the updated ByteArrayLittleEndian in two branches in your fork? I suspect something might be off in your code, and would like to test it on my end.

Use ByteArrayLittleEndian: https://github.com/Glavo/jdk/tree/random-byte-array

Use putXxxUnaligned: https://github.com/Glavo/jdk/tree/random-unaligned

This is my test server:

glavo@minecraft-server
----------------------
OS: Ubuntu 20.04.6 LTS x86_64
Kernel: 5.15.0-71-generic
Uptime: 10 days, 2 hours, 42 mins
Packages: 2165 (dpkg), 13 (snap)
Shell: bash 5.0.17
Terminal: /dev/pts/2
CPU: AMD Ryzen 7 5800X (16) @ 4.600GHz
GPU: NVIDIA GeForce GT 710
Memory: 12570MiB / 32011MiB

I need to spend the day updating my server and upgrading some accessories tomorrow. If you are unable to replicate my previous JMH results, I will rerun all tests after upgrading the server.

@Glavo
Contributor Author

Glavo commented Jun 25, 2023

@SirYwell

  1. I didn't find any proper tests that ensure that the behavior described in the Javadocs is actually maintained

I updated test/jdk/java/util/Random/NextBytes.java to also test RandomGenerator::nextBytes.

  1. I searched through usages of the nextBytes method on GitHub and mostly found a) usages of SecureRandom#nextBytes, which aren't affected by this, and b) usages with small arrays, where the effect isn't that huge.

I just did a quick search for nextBytes inside the JDK. In fact, there are many use cases for Random::nextBytes.

For example, in ZipEntryFreeTest, it is used to fill a 2,000,000-byte array ten times:

private static void createZipFile() throws Exception {
    Random rnd = new Random(1000L);
    byte[] contents = new byte[2_000_000];
    ZipEntry ze = null;
    try (ZipOutputStream zos =
            new ZipOutputStream(new FileOutputStream(ZIPFILE_NAME))) {
        // uncompressed mode seemed to tickle the crash
        zos.setMethod(ZipOutputStream.STORED);
        for (int ze_count = 0; ze_count < 10; ze_count++) {
            rnd.nextBytes(contents);

It is widely used in unit testing to generate random test data. Optimizing it can help developers reduce the time spent running tests.

@Glavo
Contributor Author

Glavo commented Dec 25, 2023

I chose to use ByteArrayLittleEndian in RandomGenerator and to continue using Unsafe in Random.

The latest test results have been updated in the first comment of this PR: #14638 (comment)

@Glavo
Contributor Author

Glavo commented Dec 25, 2023

The benchmark results are somewhat weird: L32X64MixRandom has drastically different results even for small sizes that don't involve multi-byte writes.

@liach I understand now: the reason is that I was calling Unsafe.getUnsafe() on every use instead of storing the Unsafe instance in a static final field. After storing it in a static final field, there is no performance difference from ByteArrayLittleEndian.

Since it is an interface method and there is no way to simply create a private field, I chose to use ByteArrayLittleEndian.
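
The constraint can be illustrated with a small sketch. A field declared in an interface is implicitly `public static final`, so a cached handle cannot be hidden there; one workaround is to keep it in a separate non-public holder class. Everything below (`ByteSource`, `Holder`, `fill`) is hypothetical and uses the public `VarHandle` API rather than the JDK-internal classes discussed in this PR:

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteOrder;
import java.util.Arrays;

public class HolderSketch {
    // Hypothetical interface standing in for RandomGenerator.
    interface ByteSource {
        long nextLong();

        // A field declared directly in this interface would be implicitly
        // public static final, so the handle is kept in Holder instead.
        default void fill(byte[] bytes) {
            int i = 0;
            for (; i + Long.BYTES <= bytes.length; i += Long.BYTES) {
                Holder.LONG_LE.set(bytes, i, nextLong());
            }
            if (i < bytes.length) {
                // Tail: fewer than 8 bytes remain, so write them one at a time.
                for (long rnd = nextLong(); i < bytes.length; rnd >>>= Byte.SIZE) {
                    bytes[i++] = (byte) rnd;
                }
            }
        }
    }

    // The handle is looked up once and stored in a static final field,
    // rather than being fetched again on every call.
    private static final class Holder {
        static final VarHandle LONG_LE =
                MethodHandles.byteArrayViewVarHandle(long[].class, ByteOrder.LITTLE_ENDIAN);

        private Holder() {}
    }

    public static void main(String[] args) {
        long[] counter = {0};
        ByteSource src = () -> ++counter[0]; // deterministic stand-in for a generator
        byte[] out = new byte[11];
        src.fill(out);
        System.out.println(Arrays.toString(out));
    }
}
```

The design point is only where the `static final` field lives; the store itself is the same little-endian write either way.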

@SirYwell
Member

I wonder how much the work from #16245 would cover here. If C2 can improve such situations, it might be a better solution.

@bokken
Contributor

bokken commented Dec 30, 2023

Is there value in also implementing nextBytes explicitly in ThreadLocalRandom to be based on nextLong (rather than nextInt as inherited from Random)?
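
A rough sketch of what the question is getting at: drawing 64-bit words needs half as many generator calls as drawing 32-bit words for the same array. The helper names below are hypothetical, and the loops only mirror the general shape of an int-based and a long-based `nextBytes`; they are not copied from the JDK sources.

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.function.IntSupplier;
import java.util.function.LongSupplier;

public class WordSizeSketch {
    // Fills the array 4 bytes per draw and returns the number of draws.
    static int fillFromInts(IntSupplier ints, byte[] bytes) {
        int draws = 0;
        for (int i = 0; i < bytes.length; ) {
            int rnd = ints.getAsInt();
            draws++;
            for (int n = Math.min(bytes.length - i, Integer.BYTES); n-- > 0; rnd >>= Byte.SIZE) {
                bytes[i++] = (byte) rnd;
            }
        }
        return draws;
    }

    // Fills the array 8 bytes per draw and returns the number of draws.
    static int fillFromLongs(LongSupplier longs, byte[] bytes) {
        int draws = 0;
        for (int i = 0; i < bytes.length; ) {
            long rnd = longs.getAsLong();
            draws++;
            for (int n = Math.min(bytes.length - i, Long.BYTES); n-- > 0; rnd >>= Byte.SIZE) {
                bytes[i++] = (byte) rnd;
            }
        }
        return draws;
    }

    public static void main(String[] args) {
        ThreadLocalRandom rnd = ThreadLocalRandom.current();
        byte[] buf = new byte[64];
        System.out.println(fillFromInts(rnd::nextInt, buf));   // 16 draws for 64 bytes
        System.out.println(fillFromLongs(rnd::nextLong, buf)); // 8 draws for 64 bytes
    }
}
```

Whether fewer draws translates into a measurable speedup for ThreadLocalRandom would still need a benchmark; the sketch only shows the difference in call counts.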

@bridgekeeper

bridgekeeper bot commented Jan 27, 2024

@Glavo This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply add a new comment to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration!

@bridgekeeper

bridgekeeper bot commented Feb 24, 2024

@Glavo This pull request has been inactive for more than 8 weeks and will now be automatically closed. If you would like to continue working on this pull request in the future, feel free to reopen it! This can be done using the /open pull request command.

@bridgekeeper bridgekeeper bot closed this Feb 24, 2024
@Glavo
Contributor Author

Glavo commented Feb 25, 2024

/open

@openjdk openjdk bot reopened this Feb 25, 2024
@openjdk

openjdk bot commented Feb 25, 2024

@Glavo This pull request is now open

@Glavo
Contributor Author

Glavo commented Mar 6, 2024

Is there anyone to review this PR?

@liach
Member

liach commented Mar 6, 2024

Will this be obsolete with #16245?

If not I can create a JBS issue so that this PR will get sent to the mailing list.

@openjdk

openjdk bot commented Mar 13, 2024

❗ This change is not yet ready to be integrated.
See the Progress checklist in the description for automated requirements.

@bokken
Contributor

bokken commented Mar 14, 2024

@liach as I read that PR, putting an int or long into a byte[] is still faster with Unsafe and ByteArrayLittleEndian (though by a much smaller margin than before).
It is also not clear to me that that PR would identify the loop as sequential stores. I /think/ the loop would be unrolled first, and so it would be identified.

@bokken
Contributor

bokken commented Mar 14, 2024

@Glavo were you going to update ThreadLocalRandom also (as mentioned here #14638 (comment))?

@bridgekeeper

bridgekeeper bot commented Apr 11, 2024

@Glavo This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply add a new comment to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration!

@bridgekeeper

bridgekeeper bot commented May 10, 2024

@Glavo This pull request has been inactive for more than 8 weeks and will now be automatically closed. If you would like to continue working on this pull request in the future, feel free to reopen it! This can be done using the /open pull request command.

@bridgekeeper bridgekeeper bot closed this May 10, 2024
@Glavo
Contributor Author

Glavo commented May 10, 2024

/open

@openjdk openjdk bot reopened this May 10, 2024
@openjdk

openjdk bot commented May 10, 2024

@Glavo This pull request is now open

@openjdk

openjdk bot commented Jun 6, 2024

@Glavo this pull request can not be integrated into master due to one or more merge conflicts. To resolve these merge conflicts and update this pull request you can run the following commands in the local repository for your personal fork:

git checkout random
git fetch https://git.openjdk.org/jdk.git master
git merge FETCH_HEAD
# resolve conflicts and follow the instructions given by git merge
git commit -m "Merge master"
git push

@openjdk openjdk bot added the merge-conflict Pull request has merge conflict with target branch label Jun 6, 2024
@bridgekeeper

bridgekeeper bot commented Jul 4, 2024

@Glavo This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply add a new comment to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration!

@bridgekeeper

bridgekeeper bot commented Aug 1, 2024

@Glavo This pull request has been inactive for more than 8 weeks and will now be automatically closed. If you would like to continue working on this pull request in the future, feel free to reopen it! This can be done using the /open pull request command.

@bridgekeeper bridgekeeper bot closed this Aug 1, 2024
