
Bump tuple inference length cutoff from 16 to 32 #27398

Merged: 2 commits merged into master from ajf/infer-ntuple-32 on Jun 27, 2018
Conversation

@andyferris (Member) commented Jun 3, 2018

  • ~~Bumps tupletype_len to 31~~ Removes tupletype_len entirely.
  • Bumps tuple_splat to 32
  • Similarly for the types Any16 -> Any32 and All16 -> All32.

For context, I feel this would be useful for working with heterogeneously typed data as tuples and named tuples. In particular, for v0.7/v1.0, simple containers such as Vector{NamedTuple{...}} could be versatile, performant containers for tables and data (similarly for named tuples of arrays, etc.). At times the existing limits (where, practically speaking, 14 elements is the largest size that gives "full run-time speed") felt a bit constraining; e.g. a table with 15-30 columns doesn't seem particularly unreasonable, though for very large column counts I admit that switching to a more dynamic data structure might be preferable.
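
(To make this concrete, a minimal sketch of the kind of container I mean; the field names are purely illustrative:)

# A "table" as a Vector of rows, each row a NamedTuple:
rows = [(name = "a", x = 1, y = 2.0),
        (name = "b", x = 3, y = 4.0)]

# The same data as a NamedTuple of column arrays:
cols = (name = ["a", "b"], x = [1, 3], y = [2.0, 4.0])

# With more than ~14 fields per row, operations like this used to fall
# off the fully-inferred fast path:
rowsum(row) = row.x + row.y
sum(rowsum, rows)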

Incidentally, this might help with things like arrays with 15+ dimensions and so on (#20163) (cc @Jutho). Having 30 dimensions as the maximum with fully inferred code seems a somewhat reasonable cutoff to me, giving ~1 billion elements for a 2x2x2x... sized array, as in #20163.

Of course, I'm more than a bit ignorant of what other impacts this may have internally on the compiler (compile times will obviously be longer in some situations), but I thought this might be worth floating for inclusion in v0.7.

@andyferris (Member, Author)

AppVeyor timed out, Travis is complaining about Sockets, and the other CIs passed.

We could also run Nanosoldier, except it doesn't really measure the negative impacts (compile time), and I'm guessing the benchmark suite is likely to be insensitive to any positive impacts...

@ararslan (Member) commented Jun 3, 2018

Nanosoldier is broken right now anyway.

@vtjnash (Member) commented Jun 7, 2018

Similarly for the types Any16 -> Any32 and All16 -> All32.

This isn't necessarily similar – it's the cutoff for when we want to switch from the O(n^2) algorithm to the O(n) algorithm (although we have some compiler optimizations that – very slowly – can turn this particular O(n^2) algorithm back into the O(n) algorithm). If anything, it should probably be much smaller.
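
(For readers: a simplified sketch of the pattern in question, not the exact Base definitions. Tuples shorter than the Any16-style cutoff get a recursive, fully inferred method; longer ones fall back to a loop over an untyped array:)

# Simplified sketch, not the actual Base source:
mymap(f, t::Tuple{}) = ()
mymap(f, t::Tuple) = (f(t[1]), mymap(f, Base.tail(t))...)   # inferred, but O(n^2)-ish to compile

# Any16-style fallback for long tuples: an O(n) loop, giving up on inference
function mymap_fallback(f, t::Tuple)
    a = Vector{Any}(undef, length(t))
    for i in 1:length(t)
        a[i] = f(t[i])
    end
    (a...,)
end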

@andyferris (Member, Author)

OK, I tend to agree with Jameson; such changes should at the very least be considered separately (I had originally figured the fewer independent "magic" parameters, the better).

This PR now merely tweaks the default inference parameters and leaves Any16 alone.

@andyferris (Member, Author)

Bump

@martinholters (Member)

@nanosoldier runbenchmarks(ALL, vs=":master")

@vtjnash (Member) commented Jun 14, 2018

Fixes #22370

In that issue, I recommended removing limit_tuple_type, so this seems to be a cautious start in that direction.

@ChrisRackauckas (Member)

I'm not sure it's a full fix for #22370, which asks for a local override. While raising the global max helps, one could still run into a case where you know you have an array of 40 dimensions and just want a macro to locally change the max.
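
(Something like this hypothetical API is what I mean; @inference_limits is made up for illustration and does not exist:)

# Hypothetical, not an actual API: locally raise the inference cutoff
@inference_limits tupletype_len = 64 begin
    f(huge_40dim_array)
end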

@nanosoldier (Collaborator)

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @ararslan

@andyferris (Member, Author) commented Jun 14, 2018

Is that a typical amount of noise for Nanosoldier? It seems unlikely to me that this change should affect many of the benchmarks... (I assume we mostly benchmark already-inferable code.)

@andyferris (Member, Author)

In that issue, I recommended removing limit_tuple_type, so this seems to be a cautious start in that direction.

I'd also be happy if we simply removed the tuple limitations from inference, so long as this won't cause problems (but that shouldn't mean we can't merge this for now).

@ChrisRackauckas (Member)

I think @KristofferC mentioned to me on Slack some adverse timing effects in unrelated benchmarks due to high tuple type limits. I didn't understand why it would occur, though (IIRC it was runtime and not compile time?), so maybe he has some details.

@ChrisRackauckas (Member)

Bumping this. For reference, I received a PR which uses length-46 SVectors, and it's not a particularly bad idea in this case either.

https://github.com/JuliaDiffEq/OrdinaryDiffEq.jl/pull/374/files/581bf0e28fac0f5490cfc5bdc484ad531546de2c..9833eb62f4c2b4b6eda6cca62182428611bd33d6#diff-6c62ad55534b8f93c40614616894d756R2

@andyferris (Member, Author)

@ChrisRackauckas Yes, I believe (hope) we will be getting something like this in soon. After brief discussions with Jameson, my understanding is that we might try to eliminate the tupletype_len parameter altogether, which I plan to attempt on this branch when I get a moment.

Out of curiosity, did you experience performance problems working with SVector{46, Int}?

@ChrisRackauckas (Member)

Yes, I believe (hope) we will be getting something like this in soon. After brief discussions with Jameson, my understanding is that we might try to eliminate the tupletype_len parameter altogether, which I plan to attempt on this branch when I get a moment.

Thanks for the update!

Out of curiosity, did you experience performance problems working with SVector{46, Int}?

I haven't performance-tested that yet, since right now we're just getting it working. It's an algorithm with 46 hardcoded constants, though, so I hope we don't need an allocation to get around this issue.
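
(For reference, a minimal sketch of the object in question, assuming StaticArrays.jl; the values are illustrative:)

using StaticArrays

# A length-46 static vector of hardcoded constants; the backing NTuple{46}
# is past the old inference cutoff.
coeffs = SVector{46}(ntuple(i -> i / 46.0, 46))
sum(coeffs)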

@KristofferC (Member)

Isn't this a cache that isn't supposed to be created over and over in, e.g., a hot loop? Creating a length-46 array takes on the order of 50 ns, so why are you worried about the allocation? Did you profile to see that using an Array slows things down? It might be faster.

@ChrisRackauckas (Member)

Isn't this a cache that isn't supposed to be created over and over in, e.g., a hot loop? Creating a length-46 array takes on the order of 50 ns, so why are you worried about the allocation? Did you profile to see that using an Array slows things down? It might be faster.

It has nothing to do with that. Performance on the CPU with standard types etc. is probably fine with an array. But there are many reasons to want to write an algorithm with a loop-able list of more than 16 constants. In this particular application it has to do with attempts to compile for alternative targets like asm.js and .ptx kernels. Getting that working is much easier when no arrays are involved (and even if we can get it to work, not having to create 4000 small GPU-based arrays to solve each small ODE in parallel is probably a good idea anyway).
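
(A minimal sketch of the style I mean; everything stays in tuples, so nothing is heap-allocated:)

# Weighted sum over a tuple of coefficients: tuple broadcasting keeps this
# allocation-free, which matters for targets like .ptx where heap
# allocation is unavailable or expensive.
weighted_sum(coeffs, vals) = sum(coeffs .* vals)

c = ntuple(i -> 1.0 / i, 20)
v = ntuple(float, 20)
weighted_sum(c, v)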

@andyferris (Member, Author)

@vtjnash I've made a first attempt at erasing tupletype_len from existence in the second commit. Is this something like what you had in mind?

If not, we can try to sneak just the first commit into v0.7 ASAP.

@andyferris (Member, Author)

@nanosoldier runbenchmarks(ALL, vs=":master")

@nanosoldier (Collaborator)

Something went wrong when running your job:

ProcessExitedException()

cc @ararslan

@ararslan (Member)

Sorry, had to kill the job because the server needed to be restarted.

@nanosoldier runbenchmarks(ALL, vs=":master")

@andyferris (Member, Author) commented Jun 26, 2018

CircleCI seems unique in complaining that

    From worker 3:	Internal error: encountered unexpected error in runtime:
    From worker 3:	ErrorException("type Params has no field MAX_TUPLETYPE_LEN")

Not sure why I'm not seeing this locally or on other CI? I've restarted one of them...

@vtjnash (Member) commented Jun 26, 2018

Yes, this looks like what I had in mind

@andyferris (Member, Author) commented Jun 27, 2018

OK, CircleCI was correctly picking up that @martinholters very recently merged some changes that depend on MAX_TUPLETYPE_LEN in #27434. @martinholters, can you please check whether the logic remains sound now that I've removed that bound, or whether we need to use the splat size parameter (MAX_TUPLE_SPLAT) to limit the loop here? Thanks. (EDIT: as I understand our plan to remove MAX_TUPLETYPE_LEN, we don't want to remove any limits on recursion, but we do want to treat tuples a bit more like structs, where inference doesn't care specifically how many fields there are.)

@andyferris added the compiler:inference (Type inference) label on Jun 27, 2018
@andyferris added this to the 0.7 milestone on Jun 27, 2018
@andyferris (Member, Author)

@nanosoldier runbenchmarks(ALL, vs=":master")

@KristofferC (Member)

@nanosoldier runbenchmarks(ALL, vs=":master")

@martinholters (Member)

Yes, using MAX_TUPLE_SPLAT there sounds correct 👍

@nanosoldier (Collaborator)

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @ararslan

@@ -388,7 +387,7 @@ function abstract_iteration(@nospecialize(itertype), vtypes::VarTable, sv::Infer
     valtype = statetype = Bottom
     ret = Any[]
     stateordonet = widenconst(stateordonet)
-    while !(Nothing <: stateordonet) && length(ret) < sv.params.MAX_TUPLETYPE_LEN
+    while !(Nothing <: stateordonet) || length(ret) < sv.params.MAX_TUPLE_SPLAT
A reviewer (Member) commented on the diff:

Why the change from && to ||? Doesn't that lead to an infinite loop for something like the following?

struct Foo end
Base.iterate(::Foo) = (0, nothing)
Base.iterate(::Foo, state) = (0, (state,))
foo(x) = tuple(x...)
@code_warntype foo(Foo())

@andyferris (Member, Author) replied:

Oops!!! 😧 I'll fix that
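
(i.e., presumably restoring && while keeping the new parameter name:)

while !(Nothing <: stateordonet) && length(ret) < sv.params.MAX_TUPLE_SPLAT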

@nanosoldier (Collaborator)

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @ararslan

@JeffBezanson (Member)

This is not release-blocking.

@JeffBezanson removed this from the 0.7 milestone on Jun 27, 2018
@andyferris (Member, Author)

Sorry, Jeff.

@andyferris (Member, Author)

CI and Nanosoldier seem good (the Nanosoldier noise looks like last time?). Going to merge.

@andyferris merged commit fed0d5d into master on Jun 27, 2018
@StefanKarpinski deleted the ajf/infer-ntuple-32 branch on June 27, 2018 21:30
@ChrisRackauckas (Member)

Why does the splat limit need to stay?

@andyferris (Member, Author)

I believe it protects us from infinite recursion on badly written code, where the compiler would never terminate.

I think this is because deciding whether a given piece of code is badly written is equivalent to the halting problem, which we can't solve. So we need some kind of simple, strong heuristic that guarantees compilation terminates.

@martinholters (Member)

MAX_TUPLE_SPLAT is used for two things. One is when splatting non-tuples (so renaming it might be a good idea), the other is for elision/inlining of the _apply call for tuple arguments. The former needs a hard limit to ensure termination (see #27398 (comment)), the latter ... I don't know.

@GiggleLiu (Contributor) commented Dec 15, 2020

For tuple sizes > 16, splatting still allocates. According to this PR, that should have been fixed. Is this a regression?
My Julia version is 1.5.2.

FYI: it causes problems for CUDA array manipulations; see issue
JuliaGPU/GPUArrays.jl#340 (comment)
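
(One way to probe the claim, assuming BenchmarkTools.jl; exact results will depend on the Julia version:)

using BenchmarkTools

splat_collect(t) = (t...,)   # rebuild a tuple by splatting

@btime splat_collect($(ntuple(identity, 16)))   # within the limit: expected 0 allocations
@btime splat_collect($(ntuple(identity, 17)))   # beyond it: reported to allocate on 1.5.2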

Labels: compiler:inference (Type inference)
9 participants