Towards typed lambdas #10269
Oh, by the way, it is quite buggy right now, since we incorrectly generate a C call when there are captured variables. The fast way to do this is probably some kind of trampoline (I believe LLVM has support for this), so that callsites can ignore the difference between a closure and a simple function pointer. |
Add Arg and Ret type parameters to the Function type. Those are determined by annotation at the lambda site, e.g. (a::Arg -> (a+1)::Ret) :: Function{(Arg,),Ret}. Untyped functions are ::Function{Tuple,Any}. Inference should type the declarations and call sites correctly.
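The idea of a function value that carries its argument and return types can be approximated in current Julia without compiler support. The following is a minimal sketch, not the implementation in this PR: `TypedFn` is a hypothetical wrapper type, and it writes the argument types as `Tuple{Arg}` where the PR writes `(Arg,)`.

```julia
# Hypothetical wrapper type; NOT the Function{(Arg,),Ret} type from this PR.
# A is the argument tuple type, R the declared return type, F the wrapped callable.
struct TypedFn{A<:Tuple,R,F}
    f::F
end

# Convenience constructor: A and R are given explicitly, F comes from the argument.
TypedFn{A,R}(f::F) where {A<:Tuple,R,F} = TypedFn{A,R,F}(f)

# Calling checks the arguments against A and asserts that the result is an R.
function (t::TypedFn{A,R})(args...) where {A<:Tuple,R}
    args isa A || throw(ArgumentError("expected arguments of type $A"))
    return t.f(args...)::R
end

inc = TypedFn{Tuple{Int},Int}(a -> a + 1)
```

Here the checks happen at run time on every call; the PR instead aims to let inference and codegen use the declared types at call sites.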
Fortunately I've already written down most of my thoughts on this topic here: https://github.com/JeffBezanson/phdthesis/blob/master/chap4.tex#L604 |
On inferred return types: the body of a function is an Expr with head |
Given the existence of things like From there, we will probably have a hierarchy of function types, using the I'll sketch 3 nominal function types that we probably want. (1) C functions
(2) Nominal arrows (similar to this PR):
(3) Closures
Then we transform this:
into this:
Of course, that is just one possible formulation. Here generic functions are considered inherently top-level, as closures are implemented on top of them not the other way around. I'm starting to feel this is the right approach, as we have already been doing quite well with mostly top-level generic functions. The problem is that inner functions are slow, and this formulation would solve that. The tradeoff is that this design sacrifices generic functions with methods from multiple scopes. For example the rather useful idiom of wrapping a method in a
which I would be just as happy to disallow. We probably want generic functions to be mutated only at well-defined points, for example for #265. In any case I think the crucial design decision is, what are the things inside generic functions that get dispatched to? Typed lambdas as in this PR are one candidate. Currently we use the same nearly-useless I'd like to get a simpler internal representation of functions out of this. Currently we have this overly elaborate chain |
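To make the idea of closures built on top of top-level generic functions concrete, here is a minimal sketch in current Julia of a closure written by hand as a type holding the captured variable plus a top-level call method. The names `Adder`, `make_adder_closure`, and `make_adder_struct` are illustrative assumptions; this is not the exact transformation proposed in the comment above.

```julia
# Ordinary closure: `n` is captured from the enclosing scope.
make_adder_closure(n) = x -> x + n

# Hand-written equivalent: the captured variable becomes a field,
# and the body becomes a top-level method on the new type.
struct Adder{T}
    n::T
end
(a::Adder)(x) = x + a.n

make_adder_struct(n) = Adder(n)

add2_closure = make_adder_closure(2)
add2_struct = make_adder_struct(2)
```

Both values can be called the same way, and the struct version dispatches like any other method on a concrete type.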
Interesting. Let me see if I understand this correctly : |
Yes, you have that right. Since we already generate specialized signatures for functions internally, it's possible we could expose those as It would be ideal to be able to decide whether to inline a function argument. For example |
Maybe there is value in keeping several function types as the same julia type however. Having multiple types will lead to dynamic dispatch on "call" calls when we are not sure whether we have, e.g., a closure or not. |
I'm not sure. It might not make sense to try to optimize the case where we don't know what kind of function is being called. There will be user-defined types that define |
After reading the content in your thesis, @JeffBezanson, I'm a little confused about your plan for a higher-order function like |
I think it would be totally ok to have some explicit way to ask for specialization on function arguments – you generally know when you need it and when you don't. Of course, doing the specialization completely automatically would be much slicker, but being able to make |
I'm confused, @StefanKarpinski: don't you basically always need specialization if we're going to use |
Yes, but the implementation of |
@johnmyleswhite higher-order functions vary. You can also push the complexity into the data structure (the "storage strategies" approach), and have an array that changes its representation as values are stored to it. However I doubt this can be made as fast as the hand-crafted Of course one has the option of writing higher-order functions that only accept |
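For reference, current Julia does offer an explicit way to ask for specialization on a function argument: giving that argument its own type parameter. A minimal sketch follows; `apply_each` is a made-up name, and fixing the output element type to `Float64` is an assumption chosen to keep the example short.

```julia
# `F` is bound to the concrete type of `f`, which asks the compiler
# to specialize this method for each function that is passed in.
function apply_each(f::F, xs::AbstractVector) where {F}
    out = similar(xs, Float64)   # assumption: results are stored as Float64
    for i in eachindex(xs)
        out[i] = f(xs[i])
    end
    return out
end

squares = apply_each(x -> x^2, [1.0, 2.0, 3.0])
```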
@JeffBezanson For example -
So in user code we would have
leading to
is identical to
higher order functions such as map would fall out as something like the following
where inlining should be possible for all but 'unconstrained closures' - see below. A general question about closures: since they follow the mutable Scheme model, how does the compiler determine the return type if the type of a closed-over variable can be changed from other scopes? I assume there would have to be some bookkeeping around references to the closed-over variable. Is this done today? |
Putting Arrow at the top of the function hierarchy forces everything to be
classified by argument and return type, but that isn't useful for all kinds
of functions. Of course there should be an AbstractArrow that you can
subtype if you want.
As you can see, if every function must be an Arrow, you are often forced to
give up and pick Any as the return type, leading `map` to return an Any
array.
|
To check: is your argument that, since you can (in many cases) compute the return type of the function being called, it is better to leave typing to the last instant? I'm not sure that I get the 'all kinds of functions' reference. Perhaps it is the parametric types that lead me to think this way. Once I have parametric types available, I dislike having to throw away the type information for the return type of the function. It feels more natural to carry the typing through to the very end. Perhaps I am misguided in this? For a higher-order function (e.g. map), having the types nailed down would appear to let me avoid a fairly large class of performance issues. My mental model has every function classified by argument and return type, where in most cases the compiler fills in the blanks. If I pass a function from Float64->Float64 to map, I'd assume that the map expansion would be as I indicated above, with no further type inference required. If I don't pass an array of Float64 to map, I would expect it to fail. If I don't specify types for my function, then there are two possible outcomes: I get an array of Any returned (not ideal but not unreasonable), or the compiler is in a position to figure out the types A and R and returns an array of type R, where R is a function of the supplied array and the supplied function. |
You're pretty much right, but you're only paying attention to the easiest case. With my implementation of The problem is that in general, (1) we care about generic functions most of all, and (2) you currently never need to specify the return type of a function manually. It actually is unreasonable for us to require a type declaration for For example,
If the compiler gets better, everything will be the same except this will get faster. That's what we want. |
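One way to avoid requiring a declared return type is to choose the result's element type from the values the function actually produces. The following is a minimal sketch under that assumption; `widening_map` is a hypothetical helper, not `Base.map` and not the implementation referred to in this comment.

```julia
# Hypothetical helper, not Base.map; takes a plain (1-based) Vector.
function widening_map(f, xs::Vector)
    isempty(xs) && return Any[]
    first_val = f(xs[1])
    out = Vector{typeof(first_val)}(undef, length(xs))
    out[1] = first_val
    for i in 2:length(xs)
        v = f(xs[i])
        if !(v isa eltype(out))
            # The result does not fit: widen the element type and copy what we have.
            wider = Vector{typejoin(eltype(out), typeof(v))}(undef, length(xs))
            copyto!(wider, 1, out, 1, i - 1)
            out = wider
        end
        out[i] = v
    end
    return out
end

ints = widening_map(x -> x + 1, [1, 2, 3])
mixed = widening_map(x -> x > 1 ? x : 0.5, [1, 2])
```

With this approach the caller never annotates anything; a function that always returns the same type yields a tightly typed array, and a function that does not yields a wider one.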
Indeed, it's difficult and we often can't get sharp type information in those cases. However with lexical scope you can see all assignments to each variable, so if all those assignments assign the same type things should be ok. |
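The situation being discussed can be illustrated with a small closure that reassigns a captured variable. This is only a sketch of the case in question (the names are made up); it shows that every assignment to `count` is visible within the lexical scope of `make_counter` and that each one stores an `Int`.

```julia
function make_counter()
    count = 0                    # captured by the closure below
    increment() = (count += 1)   # the only other assignment; also stores an Int
    return increment
end

counter = make_counter()
```

Each call to `counter()` updates the same captured variable, so the returned values increase by one per call.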
A couple other points: As a matter of syntax, we could decide to make You can also choose to write a method |
obsoleted by #13412 (and the disappearance of anonymous lambdas from the system) |
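For comparison, in current Julia (1.x) every function has its own type, and anonymous functions get a concrete compiler-generated type. A short sketch of what can be observed today (the variable names are illustrative):

```julia
# Each generic function has its own type, which is a subtype of Function.
sin_type = typeof(sin)
cos_type = typeof(cos)

# Anonymous functions also get a concrete type of their own.
add_one = x -> x + 1
add_one_type = typeof(add_one)
```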
As promised in #1864, here is something somewhat working on top of current master.
There are ~5 parts to this:
Was easier than expected. I did not take care of the serializer so any A->B going through the system image should get out as (Any...,)->Any for now.
Also the function types are standard parametric DataTypes so there is no co/contra variance on ret/arg types.
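The lack of variance follows from how Julia treats type parameters in general. A minimal sketch, using a hypothetical stand-in type `FnSig` in place of the PR's function type and current `Tuple{...}` syntax:

```julia
# Hypothetical stand-in for a function type with argument and return parameters.
struct FnSig{A,R} end

int_to_int = FnSig{Tuple{Int},Int}
int_to_real = FnSig{Tuple{Int},Real}

# Parameters are invariant: Int <: Real does not make this a subtype.
invariant_check = int_to_int <: int_to_real                 # false

# Covariance has to be requested explicitly with a bounded parameter.
bounded_check = int_to_int <: FnSig{Tuple{Int},<:Real}      # true
```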
Currently this is done in a dirty way. For the return type I'm just looking at the last instruction of the function to see if it is of the form return blah::X. This is not even correct.
To integrate this properly syntax should probably be discussed and the frontend modified to insert the necessary information in a proper field of the lambda ast node. (see jl_lam_(arg|ret)type in ast.c). We should also decide whether to let the user access inferred types here, or force explicit annotations.
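The "look at the last instruction" heuristic can be sketched at the level of surface syntax. This is an illustration only: `annotated_return` is a hypothetical helper written in Julia, whereas the PR does this in C (see ast.c), and it shares the limitation noted above, since it ignores early returns.

```julia
# Hypothetical helper: returns the trailing `::T` annotation of a lambda body, if any.
function annotated_return(ex::Expr)
    ex.head === Symbol("->") || return nothing
    body = ex.args[2]
    # The parser wraps the lambda body in a block; take its last statement.
    last_stmt = (body isa Expr && body.head === :block) ? body.args[end] : body
    if last_stmt isa Expr && last_stmt.head === Symbol("::") && length(last_stmt.args) == 2
        return last_stmt.args[2]   # the annotation, e.g. :Int
    end
    return nothing                 # no trailing annotation found
end

ret = annotated_return(:(a -> (a + 1)::Int))
```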
Straightforward, see changes in inference.jl. I'm not doing anything when the Function type is not a leaf type, so it could be smarter and look at TypeVar upper/lower bounds.
For this part I'm not very confident that I didn't break anything. In particular, are there cases where a specialized function requires boxing even though we can't tell by looking at the Julia signature? Here we need to generate a specialized C call with the Julia signature as the only information, whereas the current code relies on the known LLVM signature. (Not sure if I'm being clear, but looking at the changes in codegen.cpp should explain my point better...)
Examples:
About generic functions, there are some big issues I can see, namely: the mutability of the "generic signature", the difficulty of avoiding a separate type for each function, and the difficulty of dispatching on them.
As I said before, this is very early POC, probably broken in several ways, and I won't be able to spend much time on it right now. Might be a good basis for discussion however.