Skip to content

Range.foreach is not specialised #19759

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
limansky opened this issue Feb 22, 2024 · 10 comments
Open

Range.foreach is not specialised #19759

limansky opened this issue Feb 22, 2024 · 10 comments

Comments

@limansky
Copy link

Compiler version

I've compared a simple peace of code performance running with Scala 2 and Scala 3 and faced with performance regression. In this case Scala 2 generated program is more than 7 times faster on my computer.
The code is just calculate sum of big Array[Long] several times.

Minimized code

    (0 until M).foreach { _ =>
      var i = 0
      while (i < N) {
        sum += arr(i)
        i += 1
      }
    }

The whole project code is available here: https://github.com/limansky/perf-issue

Output

sbt:perf> +run
[info] Setting Scala version to 2.13.12 on 1 projects.
[info] Reapplying settings...
[info] set current project to perf (in build file:/home/limansky/projects/langperf/scala/)
[info] running Main 
time 4268 ms
10000000000
[success] Total time: 5 s, completed Feb 22, 2024, 3:10:32 PM
[info] Setting Scala version to 3.3.1 on 1 projects.
[info] Reapplying settings...
[info] set current project to perf (in build file:/home/limansky/projects/langperf/scala/)
[info] running Main 
time 31699 ms
10000000000

Expectation

I suppose Scala 3 should be at least as fast as Scala 2.

@limansky limansky added itype:bug stat:needs triage Every issue needs to have an "area" and "itype" label labels Feb 22, 2024
@sjrd
Copy link
Member

sjrd commented Feb 22, 2024

Simple programs like that are not remotely good enough to reliably test performance on the JVM. Please use JMH when attempting to compare run-time performance on the JVM.

@sjrd sjrd closed this as not planned Won't fix, can't repro, duplicate, stale Feb 22, 2024
@bishabosha bishabosha reopened this Feb 22, 2024
@bishabosha
Copy link
Member

actually the answer is very clear, Scala 2.13 specialised foreach to foreach$mVc$sp, but Scala 3 does not

@lrytz
Copy link
Member

lrytz commented Feb 22, 2024

looking at cfr-decompiler output difference

         long s2 = System.nanoTime();
-        RichInt$.MODULE$.until$extension(Predef$.MODULE$.intWrapper(0), M).foreach$mVc$sp((Function1)(JFunction1.mcVI.sp & Serializable)x$1 -> {
+        RichInt$.MODULE$.until$extension(Predef$.MODULE$.intWrapper(0), M).foreach((Function1)(JFunction1.mcVI.sp & Serializable)_$1 -> {
             for (int i = 0; i < N; ++i) {
                 sum$1.elem += arr[i];
             }

Range.foreach is specialized for Int => Unit, but it seems Scala 3 doesn't invoke foreach$mVc$sp, so every int is boxed when calling the argument function.

@lrytz
Copy link
Member

lrytz commented Feb 22, 2024

(jinx..)

@bishabosha bishabosha changed the title Performance regression comparing to Scala 2.13 Performance regression comparing to Scala 2.13 (Range.foreach not specialised) Feb 22, 2024
@bishabosha bishabosha added itype:performance regression:scala2 stat:needs triage Every issue needs to have an "area" and "itype" label and removed stat:needs triage Every issue needs to have an "area" and "itype" label labels Feb 22, 2024
@limansky
Copy link
Author

Thanks, @bishabosha
Replacing foreach with while makes performance the same. This is kind of counter-intuitive for me, because foreach is used for the outer loop, which has small number of iterations. Could you please explain why boxing of the outer loop variable has such significant performance impact in this case?

@som-snytt
Copy link
Contributor

actually the clearer answer is that Scala 3 doesn't do DelayedInit anymore, so the example is compiled to an ordinary module which is not optimized i.e. jitted on the JVM because it is static initialization. (This is the old REPL bugaboo.)

Showing original in-module (the b version) vs factored out to instance method:

➜  snips scala i19759.scala
time 4148 ms
10000000000
➜  snips ~/projects/dotty/bin/scala i19759.scala
time 4118 ms
10000000000
➜  snips scala i19759b.scala
time 4184 ms
10000000000
➜  snips ~/projects/dotty/bin/scala i19759b.scala
time 70905 ms
10000000000

The original is the b version because when pasting I automatically moved it from App to regular class without thinking. That is what coding trauma does to the brain.

@lrytz
Copy link
Member

lrytz commented Feb 23, 2024

I must have mixed things up yesterday, i thought I could reproduce it with a main method. But either way

  • @sjrd is right that this is not a reliable benchmark
  • if the performance of using foreach is the same as with foreach$mVc$sp, that's thanks to the JVM inlining and eliminating the boxing. The int argument is unused here, so it's conceivable that the optimization might not always succeed in more complicated cases.

So it still seems worth fixing.

@bishabosha bishabosha changed the title Performance regression comparing to Scala 2.13 (Range.foreach not specialised) Performance regression comparing to Scala 2.13 Feb 23, 2024
@bishabosha bishabosha changed the title Performance regression comparing to Scala 2.13 Performance regression comparing to Scala 2.13 (Also Range.foreach is not specialised) Feb 23, 2024
@bishabosha bishabosha added area:transform stat:needs triage Every issue needs to have an "area" and "itype" label and removed itype:performance regression:scala2 stat:needs triage Every issue needs to have an "area" and "itype" label labels Feb 23, 2024
@bishabosha bishabosha changed the title Performance regression comparing to Scala 2.13 (Also Range.foreach is not specialised) Range.foreach is not specialised Feb 23, 2024
@bishabosha
Copy link
Member

bishabosha commented Feb 23, 2024

I removed the regression label because DelayedInit is explicitly a removed feature, and renamed to reflect that we should focus on fixing the specialisation

@OndrejSpanel
Copy link
Member

I have asked a question in https://contributors.scala-lang.org/t/status-of-specialization-in-scala-3/4628 and my impression from the response was there is no support at all for specialization in Scala 3.

I know Scala 3 certainly cannot produce specialized code. Is it able to consume specialized functions created in Scala 2, or how can this issue be fixed?

@odersky
Copy link
Contributor

odersky commented Feb 24, 2024

@OndrejSpanel Yes, it should be able to consume specialized functions created in Scala 2. So it seems there's an issue with that specialization that needs fixing.

@dwijnand dwijnand removed the stat:needs triage Every issue needs to have an "area" and "itype" label label Feb 24, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

8 participants