Parity with Hotspot compiler in a loop unrolling #1647

Open
plokhotnyuk opened this issue Aug 31, 2019 · 3 comments
@plokhotnyuk

I have a couple of cases where the HotSpot compiler on JDK 8 unrolls loops better than GraalVM 19.1.1 does.

Currently, I'm trying to mitigate this by manually unrolling some of the hottest cases:

  1. https://github.com/plokhotnyuk/jsoniter-scala/blob/db689583dc35cd3dd457e9edb0b7fbcc10a5d508/jsoniter-scala-core/src/main/scala/com/github/plokhotnyuk/jsoniter_scala/core/JsonReader.scala#L2365-L2504
  2. https://github.com/plokhotnyuk/jsoniter-scala/pull/364/files

It would be better to have parity with HotSpot in automatic loop unrolling, to avoid cluttering the code and adding a maintenance burden.

@thomaswue
Member

Yes, this should certainly be automatic. I assume you mean "full unroll" here, where a loop body is unrolled completely because of a constant iteration bound, as opposed to "partial unroll", where multiple subsequent iterations of a loop are merged into one but the total iteration bound of the loop is unknown?

How much of a performance difference are you experiencing for unrolled vs. non-unrolled? Those policies are sometimes sensitive, as too much unrolling can negatively impact instruction cache performance.
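
To illustrate the "full unroll" case described above, here is a minimal Java sketch, not taken from jsoniter-scala or from GraalVM. The class and method names are hypothetical, and the example assumes the input holds four valid hexadecimal digits. The loop has a constant trip count of 4, so the whole loop can be replaced by four copies of its body:

```java
final class FullUnrollSketch {
    // Loop with a constant iteration bound (4): a candidate for full unrolling.
    static int hex4Loop(byte[] buf, int pos) {
        int x = 0;
        for (int i = 0; i < 4; i++) {
            x = (x << 4) | Character.digit(buf[pos + i], 16);
        }
        return x;
    }

    // The same computation with the loop fully unrolled by hand:
    // no loop counter and no back-edge remain.
    static int hex4Unrolled(byte[] buf, int pos) {
        return (Character.digit(buf[pos], 16) << 12)
             | (Character.digit(buf[pos + 1], 16) << 8)
             | (Character.digit(buf[pos + 2], 16) << 4)
             | Character.digit(buf[pos + 3], 16);
    }
}
```

Both methods return the same value for valid hexadecimal input; the sketch does not handle invalid digits.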

@thomaswue self-assigned this Sep 2, 2019

@plokhotnyuk
Author

plokhotnyuk commented Sep 2, 2019

I mean "partial unroll", which is usually done by duplicating the loop body 4x, with an additional copy in a mini-loop that handles the remaining elements.
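
The shape of that transformation can be sketched in Java as follows. This is only an illustration under stated assumptions, not the code from JsonReader.scala: the class name, the method names, and the stop condition (a quote, a backslash, or a control or non-ASCII byte) are all hypothetical.

```java
import java.nio.charset.StandardCharsets;

final class PartialUnrollSketch {
    // True for bytes that end the fast path: control or non-ASCII bytes
    // (negative as a signed byte), a double quote, or a backslash.
    static boolean isSpecial(byte b) {
        return b < ' ' || b == '"' || b == '\\';
    }

    // Plain loop: one element per iteration.
    static int copySimple(byte[] src, char[] dst, int len) {
        int i = 0;
        while (i < len && !isSpecial(src[i])) {
            dst[i] = (char) src[i];
            i++;
        }
        return i;
    }

    // Partially unrolled by hand: four copies of the body per iteration,
    // followed by a mini-loop for the remaining 0 to 3 elements.
    static int copyUnrolled4(byte[] src, char[] dst, int len) {
        int i = 0;
        while (i + 3 < len) {
            byte b0 = src[i], b1 = src[i + 1], b2 = src[i + 2], b3 = src[i + 3];
            if (isSpecial(b0)) return i;
            dst[i] = (char) b0;
            if (isSpecial(b1)) return i + 1;
            dst[i + 1] = (char) b1;
            if (isSpecial(b2)) return i + 2;
            dst[i + 2] = (char) b2;
            if (isSpecial(b3)) return i + 3;
            dst[i + 3] = (char) b3;
            i += 4;
        }
        while (i < len && !isSpecial(src[i])) {
            dst[i] = (char) src[i];
            i++;
        }
        return i;
    }

    public static void main(String[] args) {
        byte[] src = "hello, world\"tail".getBytes(StandardCharsets.US_ASCII);
        int simple = copySimple(src, new char[src.length], src.length);
        int unrolled = copyUnrolled4(src, new char[src.length], src.length);
        System.out.println(simple + " " + unrolled);
    }
}
```

Both methods return the number of characters copied before the first special byte, so they can be compared directly on the same input.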

Below are the results for GraalVM CE/EE 19.2.0 for the first case (ASCII string parsing), with and without manual unrolling, which is switched on/off by the isGraalVM flag.

GraalVM CE 19.2.0 with manual unrolling (isGraalVM = true)

sbt -java-home /usr/lib/jvm/graalvm-ce-19 -no-colors 'jsoniter-scala-benchmark/jmh:run StringOfAsciiCharsReading.jsoniterScala'
...
[info] REMEMBER: The numbers below are just data. To gain reusable insights, you need to follow up on
[info] why the numbers are the way they are. Use profilers (see -prof, -lprof), design factorial
[info] experiments, perform baseline and negative tests that provide experimental control, make sure
[info] the benchmarking environment is safe on JVM/OS/HW level, ask for reviews from the domain experts.
[info] Do not assume the numbers tell you what you want them to tell.
[info] Benchmark                                 (size)   Mode  Cnt         Score        Error  Units
[info] StringOfAsciiCharsReading.jsoniterScala        1  thrpt    5  52607141.605 ±  86287.948  ops/s
[info] StringOfAsciiCharsReading.jsoniterScala       10  thrpt    5  31081447.078 ± 481894.323  ops/s
[info] StringOfAsciiCharsReading.jsoniterScala      100  thrpt    5   8335755.034 ±  33083.995  ops/s
[info] StringOfAsciiCharsReading.jsoniterScala     1000  thrpt    5    934160.199 ±   3848.913  ops/s
[info] StringOfAsciiCharsReading.jsoniterScala    10000  thrpt    5    100635.218 ±    422.381  ops/s
[info] StringOfAsciiCharsReading.jsoniterScala   100000  thrpt    5      9824.494 ±     39.988  ops/s
[info] StringOfAsciiCharsReading.jsoniterScala  1000000  thrpt    5       886.779 ±      5.214  ops/s

GraalVM EE 19.2.0 with manual unrolling (isGraalVM = true)

sbt -java-home /usr/lib/jvm/graalvm-ee-19 -no-colors 'jsoniter-scala-benchmark/jmh:run StringOfAsciiCharsReading.jsoniterScala'
...
[info] Benchmark                                 (size)   Mode  Cnt         Score         Error  Units
[info] StringOfAsciiCharsReading.jsoniterScala        1  thrpt    5  64737968.694 ± 1052304.912  ops/s
[info] StringOfAsciiCharsReading.jsoniterScala       10  thrpt    5  38560441.600 ± 1932236.984  ops/s
[info] StringOfAsciiCharsReading.jsoniterScala      100  thrpt    5  10736905.622 ±  541268.399  ops/s
[info] StringOfAsciiCharsReading.jsoniterScala     1000  thrpt    5   1262840.467 ±    2130.942  ops/s
[info] StringOfAsciiCharsReading.jsoniterScala    10000  thrpt    5    118401.576 ±    4178.023  ops/s
[info] StringOfAsciiCharsReading.jsoniterScala   100000  thrpt    5     11760.347 ±     524.339  ops/s
[info] StringOfAsciiCharsReading.jsoniterScala  1000000  thrpt    5       979.913 ±      43.457  ops/s

GraalVM CE 19.2.0 without manual unrolling (isGraalVM = false)

sbt -java-home /usr/lib/jvm/graalvm-ce-19 -no-colors 'jsoniter-scala-benchmark/jmh:run StringOfAsciiCharsReading.jsoniterScala'
...
[info] Benchmark                                 (size)   Mode  Cnt         Score         Error  Units
[info] StringOfAsciiCharsReading.jsoniterScala        1  thrpt    5  52810124.159 ± 2332877.475  ops/s
[info] StringOfAsciiCharsReading.jsoniterScala       10  thrpt    5  28600277.399 ± 1483439.788  ops/s
[info] StringOfAsciiCharsReading.jsoniterScala      100  thrpt    5   6812044.543 ±  374698.138  ops/s
[info] StringOfAsciiCharsReading.jsoniterScala     1000  thrpt    5    930820.669 ±   37439.626  ops/s
[info] StringOfAsciiCharsReading.jsoniterScala    10000  thrpt    5     96474.200 ±    4195.969  ops/s
[info] StringOfAsciiCharsReading.jsoniterScala   100000  thrpt    5      6520.250 ±     279.794  ops/s
[info] StringOfAsciiCharsReading.jsoniterScala  1000000  thrpt    5       652.525 ±       5.579  ops/s

GraalVM EE 19.2.0 without manual unrolling (isGraalVM = false)

sbt -java-home /usr/lib/jvm/graalvm-ee-19 -no-colors 'jsoniter-scala-benchmark/jmh:run StringOfAsciiCharsReading.jsoniterScala'
...
[info] Benchmark                                 (size)   Mode  Cnt         Score         Error  Units
[info] StringOfAsciiCharsReading.jsoniterScala        1  thrpt    5  65911350.665 ± 2582704.919  ops/s
[info] StringOfAsciiCharsReading.jsoniterScala       10  thrpt    5  38507732.430 ± 2084657.605  ops/s
[info] StringOfAsciiCharsReading.jsoniterScala      100  thrpt    5   6859580.394 ±  352288.886  ops/s
[info] StringOfAsciiCharsReading.jsoniterScala     1000  thrpt    5    768506.384 ±    1803.553  ops/s
[info] StringOfAsciiCharsReading.jsoniterScala    10000  thrpt    5     75424.605 ±     339.976  ops/s
[info] StringOfAsciiCharsReading.jsoniterScala   100000  thrpt    5      6637.629 ±     726.107  ops/s
[info] StringOfAsciiCharsReading.jsoniterScala  1000000  thrpt    5       603.647 ±      84.124  ops/s

@thomaswue
Member

OK, thank you so much for these interesting measurements. We will investigate asap. @tkrodriguez @gilles-duboscq
