Regression: Can't make a generated type in v0.5, but I can in v0.4 #16806

Closed
andyferris opened this issue Jun 7, 2016 · 19 comments
Labels
regression (Regression in behavior compared to a previous version); won't change (Indicates that work won't continue on an issue or pull request)

Comments

@andyferris
Member

Hi,

I've been playing with GeneratedTypes.jl and there seems to be a strange difference in behaviour between Julia 0.4 and 0.5.

The trick to create a generated type is to use a generated function to eval a new concrete type in its (abstract) constructor. You can see the details in the package, but I'm shocked how nicely this works with no specific language support. It's kinda cool, and I'd like to use it for TypedTables.jl. It's potentially also great for static vectors and many other applications (talked about in #8472).

However, this restriction about eval() in @generated functions in 0.5 seems to kill the whole idea. 😢 Is this a temporary thing, or a permanent result of the new function-functor definition, or what?

If it is permanent, can anyone think of a way of performing a similar trick in 0.5? Will a @pure function allow eval (and be done only once)? Are there any thoughts on language support? I'd like to help advance the ecosystem, and any help would be appreciated.

@andyferris
Member Author

Potentially related issue to generated types: #15791

@vtjnash
Member

vtjnash commented Jun 7, 2016

You shouldn't be trying to build types out of data; Julia isn't really designed to handle that. It's undefined how your program will behave if you call eval from an @generated or @pure function. It's generally a straightforward way to corrupt the runtime system state, so in v0.5 it went from being a very dangerous operation that sort-of manages to pass simple tests to safely throwing an error always.

@vtjnash vtjnash added the won't change label Jun 7, 2016
@andyferris
Member Author

andyferris commented Jun 7, 2016

Thank you Jameson for your reply. I certainly wouldn't want to do anything that is unstable with respect to Julia's underlying representation, but on the other hand I would also like to continue investigating whether generated types are feasible in Julia. I see that you are the author of #16040 - can I bother you with some further questions? (sorry for such a lengthy post!!!)

I had assumed that generated functions were a safer place for type definitions (compared to non-generated functions), since (as I understand it) the code-gen part of the generated function is executed all-at-once. In my package, these functions generate types which are only called by their constructor once the code-gen is complete. It seems side-effect-free to call the generator twice or more (the evals do nothing, but I could also skip them if the definitions are pre-existing). I can't see a way that the system (e.g. the work done in inference.jl) attempts to use a yet-to-be- or half-constructed type, as might occur when inference runs over a standard generic function with an arbitrary eval inside it, followed by a later call to a constructor of the newly evaled type. Does that make sense? Compare these:

function f(x)
    expr = quote; type A; x::Int; end; end
    eval(Main, expr)
    A(x)
end

@generated function g(x)
    expr = quote; type B; x::Int; end; end
    eval(Main, expr)
    :(B(x))
end

I'm probably missing something, but the second seems to make more sense with respect to the way inference.jl jumps around, etc, but only the first is allowed in 0.5?

You shouldn't be trying to build types out of data

This is probably a silly response, but I'm being very careful to build types out of types and type parameters, not data. I can safely create new parametric types out of types and type parameters, which effectively describes a new type. (e.g. @generated addval{N,M}(::Type{Val{N}}, ::Type{Val{M}}) = :(Val{$(N+M)}) effectively creates a new, never-instantiated type Val{N+M} and is perfectly safe and inferable).
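The addval one-liner above can be written out as a small runnable sketch. This assumes Julia 1.x syntax (`where {N, M}` in place of the 0.4/0.5-era `addval{N,M}`), so it is a translation rather than the code as originally posted:

```julia
# Sketch in Julia 1.x syntax; the 0.4/0.5 spelling was addval{N,M}(...).
@generated function addval(::Type{Val{N}}, ::Type{Val{M}}) where {N, M}
    # N and M are plain values while the generator runs, so their sum is
    # spliced into the returned expression as a constant type parameter.
    return :(Val{$(N + M)})
end

result = addval(Val{1}, Val{2})
```

No type is defined here: `Val{3}` is only a new instantiation of an existing parametric type, which is why no eval is required.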

But perhaps there's some "type-cache" that isn't stable when it is pushed to from outside of global scope? Or type definition is not thread-safe, and you're trying to implement a multithreaded compiler? Or maybe I'm completely missing the point? I didn't really see where such a "type-cache" is in inference.jl - types seem to just be objects that exist in the global scope of modules.

In any case, could you provide a fuller explanation of how my package might crash on 0.4? I'm pretty keen on investigating what might be possible in terms of implementing #8472 (generated types). Do you feel that much deeper language support is necessary? Do you think it is feasible at all?

@andyferris andyferris changed the title from "Regression: eval from generated" to "Regression: Can't make a generated type in v0.5, but I can in v0.4" Jun 9, 2016
@andyferris
Member Author

I'm going to continue piling on this thread since I'm still trying to determine if creating a type in a pure context will truly destroy the runtime state of Julia.

I've been digging down and down and I still can't see any obvious problem. AFAICT, within inference.jl, it should work out OK since the @generated code runs before the inference even sees the symbol for the new type.

Within the C-code, it seemed there are a few implementation-specific things to be careful of, but AFAICT there are no problems there. TypeName.uid seems to be incremented always by jl_atomic_fetch_add in jl_assign_type_uid, so it should be thread-safe and not care whether the type definition is called from a pure context or not. TypeName.cache and TypeName.linearcache seem to be internal implementation details for determining if parameterized types have already been constructed. Since my constructed types are non-parametric, this is a non-issue.

I admit I could be missing other pitfalls, but I'm beginning to think generated types could be a real thing.

(@vtjnash I'm not saying we should necessarily remove the restriction on eval within @generated, just looking for a solution to implementing generated types that is stable)

@JeffBezanson JeffBezanson added the regression label Jun 18, 2016
@vtjnash
Member

vtjnash commented Aug 10, 2016

Why is it a regression to add an error for invalid code?

@andyferris
Member Author

@vtjnash From my recollection of the discussions at JuliaCon (which admittedly is probably biased!!) it was conceded that GeneratedTypes.jl probably works the way it's meant to on v0.4, and probably would on 0.5 if there wasn't an error thrown by eval...

Is it really true that every possible eval() in a pure context is invalid code (except of course by de-facto definition of the language standard, which is just its implementation, including the code which throws the error)? If so, then it isn't a regression - it's an improvement, and we should definitely close this issue. But to me it wasn't clear.

Of course, I would much prefer not having to resort to "hacks" like in GeneratedTypes at all. It is obvious that there are great features in the Julia pipeline, like anonymous structs. The problem is, I'm greedy now :) And it's great that Julia is so hackable - it lets you push the boundaries to explore what might be useful/good design in the future, even if I wouldn't suggest using GeneratedTypes in production code to anyone...

@davidanthoff
Contributor

Bump. I'm running into the same problem for some stuff that I'm trying to do in NamedTuples that I need for Query.

Can we resolve what the status here is? @vtjnash seems to think this can't be changed, but @JeffBezanson applied the "regression" label later, which makes me hopeful that maybe there is a way to enable this again, like it worked in Julia 0.4? Also, am I reading @andyferris's comment correctly, that the core team might have indicated at JuliaCon that this could be enabled again?

It would really enable some fantastic things for Query...

@vtjnash
Member

vtjnash commented Sep 13, 2016

Is it really true that every possible eval() in a pure context is invalid code

Currently, (except for rare circumstances), we run generated functions exactly once, so it usually happens to work. But in the future, I think we will probably not run them at all (when we can avoid it). And thus, yes, there is no possible operation, for which eval is needed, that can be memoized over zero executions.
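The distinction between generator time and call time can be sketched as follows. This is an illustration only, assuming Julia 1.x syntax; `generator_runs` and `double` are hypothetical names, and the counter deliberately breaks the rule that generators stay free of side effects so that the generator's execution becomes observable:

```julia
# Illustration only: a generator with an observable side effect.
# `generator_runs` and `double` are hypothetical names.
const generator_runs = Ref(0)

@generated function double(x)
    # This line runs when code is generated for an argument type, not on
    # every call. How many times that happens is an implementation detail.
    generator_runs[] += 1
    return :(2 * x)
end

a = double(3)    # code is generated for Int
b = double(4)    # same argument type, so the generated code can be reused
c = double(1.5)  # a different argument type needs its own generated body
```

The return values are well defined, but the final value of `generator_runs[]` is not something a program can rely on, which is the reason side effects such as eval inside the generator are considered unsafe.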

@andyferris
Member Author

Currently, (except for rare circumstances), we run generated functions exactly once, so it usually happens to work. But in the future...

Okay, I understand. I guess I was hoping (and perhaps even pushing for) that undefined behaviour that exists now (like how many times a generator is executed) could become defined behaviour (e.g. it always runs exactly once) so that end-users could take advantage of it. Maybe we could also relax the requirement that they are pure to a requirement that the return value is independent of global state. There are several examples in the past where this proved to be a very powerful (if somewhat hacky) programming technique in Julia. But I'm getting the feeling that this would get in the way of some other changes you guys might be planning. (Perhaps multithreading, addressing #265, and some other things?). Which is fair enough for sure, but it still makes me (and some others) sad in the meantime where the goal of these hacky tricks becomes impossible and there is no equally powerful replacement.

I think we will probably not run them at all (when we can avoid it).

Okay, I'll bite... how do you not run the code? This sounds wonderfully magical :)

@davidanthoff
Contributor

I think both @andyferris and I really don't need the ability to eval arbitrary code in a generated function, all we really want is the ability to generate a new type in a generated function (correct me if I'm wrong for your case, @andyferris). Is there maybe some way to just enable that, rather than arbitrary eval calls?

@andyferris
Member Author

Right, that's true!

(Though it is also fun to be able to mutate and maybe even create globals (as a hack for static, mutable variables), that is completely off-topic.)

Of course if the way function generators are executed becomes less reliable, then even this limited set of functionality might not be sufficient. It also might not imply that it's easy to implement, or compatible with all the other changes Jameson and the other devs are planning for the core architecture.

@StefanKarpinski
Member

StefanKarpinski commented Sep 13, 2016

I really feel like generated types may just be a case of needing a stronger form of parameterization, like parametric fixed-size, inline arrays. I have a hard time imagining any other case where a type needs to be generated. It would make more sense to just add support for that in the language.

@JeffBezanson
Member

JeffBezanson commented Sep 13, 2016

Yes, I think most of the use cases would be handled by built-in fixed size arrays (or very efficient NTuples), something like NamedTuple, and computed field types (#18466).

@eschnett
Contributor

In FastArrays.jl, I don't know ahead of time which fields should exist. This is a rather complex function of the type parameters. Thus fixed-sized inline arrays or computed field types would not suffice.

Of course, one can take things to the extreme, and use computed field types to either return Int or Void, which allows "switching off" unwanted elements. If you then have a tuple of such types, you can generate everything. In fact, with named tuples, basically every type can be represented as named tuple -- all that is needed is re-writing the accessors.

To make things work with Julia 0.5, I currently pass one additional type parameter to the type. This is usually a tuple type (calculated by a pure function). There is one field in the type, of that tuple type: This allows me to represent everything I want. I then use generated functions for the accessors to extract the respective tuple elements.
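A minimal sketch of this workaround, assuming Julia 1.x syntax. All names (`Record`, `fieldtuple`, `getelem`) are hypothetical and are not FastArrays.jl's actual API, and the layout rule is a stand-in for the more complex function of the type parameters described above:

```julia
# Hypothetical names throughout; this is not FastArrays.jl's actual API.

# Compute the storage tuple type from the "real" parameter N.
# Stand-in rule: odd positions hold Int, even positions hold Float64.
fieldtuple(::Val{N}) where {N} =
    Tuple{ntuple(i -> isodd(i) ? Int : Float64, N)...}

# The extra parameter T carries the computed tuple type, and the struct
# has exactly one field of that type.
struct Record{N, T <: Tuple}
    data::T
end

# Outer constructor: the caller supplies N, the tuple type is filled in.
function Record{N}(xs...) where {N}
    T = fieldtuple(Val(N))
    return Record{N, T}(xs)
end

# Generated accessor: the index I is known at generation time, so the
# returned expression indexes the tuple with a constant.
@generated function getelem(r::Record{N, T}, ::Val{I}) where {N, T, I}
    if !(I isa Int && 1 <= I <= N)
        return :(throw(ArgumentError("field index out of range")))
    end
    return :(r.data[$I])
end

r = Record{3}(1, 2.0, 3)
second = getelem(r, Val(2))
```

The generator here only inspects type parameters and returns an expression, so no eval and no new type definition are involved; the computed tuple type does the work a generated type would have done.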

Given that this currently is possible, and will likely become more user friendly, I wonder: What step exactly makes it impossible to lay out a new type in a pure setting, if I can access arbitrary tuple types that have exactly the same functionality? The only difference between tuples and other immutables that I currently see is that the latter support a "dot" syntax to access elements. Or is it inserting the type into the type system?

If so, and hypothetically speaking: If I could make certain promises about the generated types, such as e.g. that they are all subtypes of one existing abstract type, and that their type parameters can otherwise not be used for dispatch (or a similar condition), would that make it possible to generate types?

@davidanthoff
Contributor

I don't think my case would be covered by a stronger parameterization story (if I understand that idea correctly). My (current) use case would be this: I want to write a function that takes two immutables as parameters and returns a new immutable whose fields are the union of the fields of the two parameters, and that does so in a type-stable way. Essentially a type-stable version of the merge function in NamedTuples.

That is the simplest need I have right now for Query. I know about more complicated scenarios as well that I probably want to enable down the road, but they all have the same structure that I want to write functions that return types where the fields (their number, their names and types) depend in some way on the input types of the function.
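For comparison, the simple two-record case can be sketched with the `NamedTuple` type that is built into later Julia releases (0.7 and 1.x), which postdate this discussion. This uses `Base.merge` on named tuples, not the NamedTuples.jl package mentioned above, and the field names are made up for illustration:

```julia
# Uses the NamedTuple type built into Julia 0.7+ (an assumption about the
# Julia version; it did not exist in 0.4/0.5).
a = (x = 1, y = 2.0)
b = (y = "two", z = :three)

# Fields of `a` come first, then the fields only `b` has; where both
# have a field, the value (and field type) from `b` wins.
c = merge(a, b)
```

The field names and types of `c` are part of its type, so the merged record is described by a parameter of an existing type rather than by a newly defined one.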

@andyferris
Member Author

Yes, I think most of the use cases would be handled by built-in fixed size arrays (or very efficient NTuples), something like NamedTuple, and computed field types (#18466).

+100 for these features. They will certainly be extremely handy.

However, "something like NamedTuple" is a statement I'm a bit worried about. At the moment, Tuple is special in that it is the only type with a variable number of fields. I can't do that with any other type. There isn't even a mutable tuple (which, if inlined/stack allocated, might be a perfect "built-in fixed size array"). Adding a NamedTuple will be very useful, but I still won't be able to do that with any other type in some other part of the type tree (as in, I might want Table <: AbstractTable not NamedTuple <: Any). An obvious workaround to that is to allow overloading on getfield, but that has other side-effects.

So, to me it is the right combination of features that allow simultaneously for

  • Arbitrary number of fields, in mutables and immutables
  • Arbitrary naming of fields or more flexible use of getfield.
  • More flexible field type calculations
  • Arbitrary position in the type tree
  • and, as a stretch goal, something (anything!) I can mutate and yet be fast (inlined / on the stack).

that I would like to aim for (the last being somewhat tangential to this issue). We've seen some great packages lately out there pushing the boundaries that use/want these things and want them in combination.

@davidanthoff
Contributor

Bump, is there any chance that some progress could be made on this? This would help sooo much with Query.jl...

@vtjnash
Member

vtjnash commented Dec 10, 2016

No. That's why this has the "won't fix" label.

@StefanKarpinski
Member

StefanKarpinski commented Dec 12, 2016

Side-effecting inside of a generated function definitely cannot be supported, and generating an entire type inside of a generated function is one hell of a side effect.


6 participants