compiler performance regression with long tuples #28890

Closed
ExpandingMan opened this issue Aug 25, 2018 · 3 comments
Labels: compiler:latency (Compiler latency), regression (Regression in behavior compared to a previous version)

Comments

@ExpandingMan
Contributor

There seem to be some compile-time performance regressions involving NTuple in 1.0. Compare, for example,
1.0:

using StaticArrays
julia> @time @SMatrix rand(16,16)
 11.440824 seconds (1.41 M allocations: 85.987 MiB, 0.24% gc time)

0.6.4:

using StaticArrays
julia> @time @SMatrix rand(16,16);
  1.330479 seconds (356.50 k allocations: 13.710 MiB)

We were speculating that this is due to #27398. While it doesn't seem unreasonable that the compiler really has to think about 256-type-long signatures, we believe that the compiler may not know that all the types are guaranteed to be the same, since (as far as I know) NTuple is nothing more than an alias.
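For readers following along, the alias relationship can be checked directly at the REPL. This is only a sketch of what "NTuple is nothing more than an alias" means at the type level; it says nothing about where the compile time actually goes. The name `T256` is an illustrative binding (not something from StaticArrays) for the kind of tuple type a 16x16 Float64 static matrix would wrap.

```julia
# NTuple{N,T} is an alias for Tuple{Vararg{T,N}}, so these are the same type.
same_as_explicit = NTuple{3,Int} === Tuple{Int,Int,Int}
same_as_vararg   = NTuple{3,Int} === Tuple{Vararg{Int,3}}

# T256 is an illustrative name for a 256-element homogeneous tuple type.
T256 = NTuple{256,Float64}
nfields_256 = fieldcount(T256)

# Every field type is Float64, but the type still has 256 separate fields.
all_float64 = all(fieldtype(T256, i) === Float64 for i in 1:nfields_256)

println((same_as_explicit, same_as_vararg, nfields_256, all_float64))
```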

Do I have the story right here? Would it be possible to let the compiler know that NTuple is promised to be homogeneous, but otherwise retain the current behavior?

I apologize if this is somehow duplicating or echoing existing issues; I wasn't able to find anything that addressed this directly.

(Thanks to @ChrisRackauckas for useful discussion on this.)

@ChrisRackauckas
Member

StaticArrays is one case where this shows up, and #27488 (comment) shows that it comes up in AD implementations as well. I think I talked with @Keno before about this case and he mentioned that in theory the compiler could have a better representation for it?

@JeffBezanson
Member

NTuple is not always homogeneous, for example NTuple{3,Any}.

I would classify this as a compiler performance regression, and without more investigation we don't know whether NTuple has anything to do with it.
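The point about `NTuple{3,Any}` can be illustrated with a small sketch. The variable `t` is just an example value, not taken from the thread; the checks rely only on `isa` and `isconcretetype` from Base.

```julia
# t is an illustrative heterogeneous tuple.
t = (1, "two", 3.0)

# Tuple types are covariant, so a mixed tuple still matches NTuple{3,Any}.
matches_any = t isa NTuple{3,Any}
matches_int = t isa NTuple{3,Int}

# NTuple{3,Any} is an abstract type covering many concrete tuple types,
# whereas NTuple{3,Float64} is itself concrete.
any_is_concrete     = isconcretetype(NTuple{3,Any})
float64_is_concrete = isconcretetype(NTuple{3,Float64})

println((matches_any, matches_int, any_is_concrete, float64_is_concrete))
```

So the `NTuple` alias alone does not promise that the elements share one concrete type; that only holds when the element type parameter is itself concrete.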

@nalimilan added the performance (Must go faster) and regression (Regression in behavior compared to a previous version) labels on Sep 3, 2018
@JeffBezanson changed the title from "compiler should recognize NTuple types as homogeneous" to "compiler performance regression with long tuples" on Sep 5, 2018
@JeffBezanson added the compiler:latency (Compiler latency) label on Sep 5, 2018
@KristofferC removed the performance (Must go faster) label on Oct 24, 2018
@vtjnash
Member

vtjnash commented Oct 23, 2020

This has steadily gotten slightly better:

  | | |_| | | | (_| |  |  Version 1.0.5-pre.1 (2019-06-03)
 29.690679 seconds (3.78 M allocations: 261.054 MiB, 0.75% gc time)
  | | |_| | | | (_| |  |  Version 1.2.0-rc2.0 (2019-07-08)
  5.528890 seconds (3.09 M allocations: 111.745 MiB, 1.20% gc time)
  | | |_| | | | (_| |  |  Version 1.3.2-pre.0 (2019-12-31)
  5.571917 seconds (2.55 M allocations: 93.918 MiB, 1.43% gc time)
  | | |_| | | | (_| |  |  Version 1.4.3-pre.0 (2020-05-25)
  0.001262 seconds (21 allocations: 20.203 KiB)
  | | |_| | | | (_| |  |  Version 1.5.2 (2020-09-23)
  0.000564 seconds (19 allocations: 20.188 KiB)
  | | |_| | | | (_| |  |  Version 1.6.0-DEV.1316 (2020-10-22)
  0.000323 seconds (23 allocations: 20.297 KiB)

More helpfully though:

julia> @time @eval @SMatrix rand(16,16);
  | | |_| | | | (_| |  |  Version 1.0.5-pre.1 (2019-06-03)
 30.167509 seconds (3.78 M allocations: 261.239 MiB, 1.22% gc time)
  | | |_| | | | (_| |  |  Version 1.2.0-rc2.0 (2019-07-08)
  5.836282 seconds (3.09 M allocations: 111.879 MiB, 1.21% gc time)
  | | |_| | | | (_| |  |  Version 1.3.2-pre.0 (2019-12-31)
  5.845669 seconds (2.55 M allocations: 94.053 MiB, 1.37% gc time)
  | | |_| | | | (_| |  |  Version 1.4.3-pre.0 (2020-05-25)
  2.115126 seconds (2.78 M allocations: 117.953 MiB, 5.30% gc time)
  | | |_| | | | (_| |  |  Version 1.5.2 (2020-09-23)
  1.559476 seconds (2.30 M allocations: 87.064 MiB, 2.27% gc time)
  | | |_| | | | (_| |  |  Version 1.6.0-DEV.1316 (2020-10-22)
  2.177394 seconds (2.44 M allocations: 125.257 MiB, 1.54% gc time, 99.95% compilation time)

So we're back to v0.6- timings.
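One plausible reading of why the `@eval` form is the more helpful measurement: with a plain top-level `@time`, newer versions may do much of the compilation before the timer starts, whereas `@eval` defers it until the timed region is already running. That explanation is an assumption, not something stated in this thread. The sketch below avoids the StaticArrays dependency by using a hypothetical stand-in, `make_long_tuple`, which builds a 256-element homogeneous tuple with `ntuple`; absolute timings will differ by machine and Julia version.

```julia
# make_long_tuple is a hypothetical stand-in for the StaticArrays call in this thread.
make_long_tuple() = ntuple(i -> Float64(i), Val(256))

# @eval evaluates the call inside the timed region, so the first measurement
# includes whatever compilation the call needs.
t_first  = @elapsed @eval make_long_tuple()

# The second measurement reuses the already compiled code.
t_second = @elapsed @eval make_long_tuple()

println("first call:  ", t_first, " s")
println("second call: ", t_second, " s")
```

Comparing the two numbers gives a rough estimate of the compile latency for the long-tuple code path, which is the quantity this issue is about.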

@vtjnash closed this as completed on Oct 23, 2020

6 participants