Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: Add promote_strict mechanism and use it instead of typejoin() #25423

Closed
wants to merge 7 commits into from

Conversation

nalimilan
Copy link
Member

Introduce a mechanism similar to promote, but used to return types which can hold exactly all values of the input types. Use it instead of typejoin to choose an appropriate element type with collect, map, broadcast, and Dict.

Add appropriate methods for Number types, as well as for Nothing/Missing so that Union{T, Nothing/Missing} is used instead of Any. Add promotion rules for Tuple and NamedTuple so that promotion is performed element by element, to allow for more precise typing of fields, potentially helping the compiler down the road.


See summary in this Discourse post. This replaces #24332 for the goal of improving typing and performance when working with missing values. This is very preliminary (no docs in the manual, methods missing for some types, missing tests...), but I think this illustrates the approach and will probably expose many mistakes I'm making. Here's in particular what this achieves (on current master the type is NamedTuple{(:a, :b),T} where T<:Tuple):

julia> f(x) = (a = x.a, b = x.a + x.b)
f (generic function with 1 method)

julia> map(f, [(a=1, b=missing), (a=1, b=2)])
2-element Array{NamedTuple{(:a, :b),Tuple{Int64,Union{Missing, Int64}}},1}:
NamedTuple{(:a, :b),Tuple{Int64,Union{Missing, Int64}}}((1, missing))
NamedTuple{(:a, :b),Tuple{Int64,Union{Missing, Int64}}}((1, 3))

I don't particularly like the promote_strict term; we could also reuse widen, though keeping promote in the name makes the relationship with promote clearer (which makes sense in particular because of the fallback from promote to promote_strict).

@nalimilan nalimilan added missing data Base.missing and related functionality types and dispatch Types, subtyping and method dispatch labels Jan 5, 2018
…ppropriate

Introduce a mechanism similar to promote(), but used to return types which can hold
exactly all values of the input types. Use it instead of typejoin() to choose an
appropriate element type with collect(), map(), broadcast(), and Dict().

Add appropriate methods for Number types, as well as for Nothing/Missing so that
Union{T, Nothing/Missing} is used instead of Any. Add promotion rules for Tuple and
NamedTuple so that promotion is performed element by element, to allow for more
precise typing of fields, potentially helping the compiler down the road.
@alyst
Copy link
Contributor

alyst commented Jan 6, 2018

I'm not a specialist in promotion rules, just a thought: what if it's implemented as promote_type(::Type{<:PromotionPolicy}, T1, T2) etc, with GenericPromotion and StrictPromotion (Exact/Widening/Lossless?) subtypes of PromotionPolicy?
The idea is to keep the single promotion mechanism, which would make sense if most of the rules are not policy-specific.

@StefanKarpinski
Copy link
Member

StefanKarpinski commented Jan 6, 2018

I had a similar thought, @alyst. I guess I would call it PromotionStyle or something to match our IterationStyle and BroadcastStyle types and whatnot.

@nalimilan
Copy link
Member Author

Good idea. That also reduces code duplication. I've retained PromotionStyle with DefaultPromotion and ExactPromotion, but feel free to propose better names.

I've made a lot of changes, the PR should be ready for a serious review now.

At first I was tempted to apply automatically to all types the new logic which transforms unions of parametric types into types with Union parameters, but I realized it wasn't correct as types do not necessarily implement the corresponding conversion methods (which may not make sense at all).

So now this behavior is opt-in, via promote_rule methods which compute the values of each type parameter via recursive calls to promote_type (i.e. as it already worked). It is used for Tuple (similar to what typejoin did, actually that's almost the same code) and for NamedTuple. I haven't extended it to other types yet, but it would make sense to adapt all existing promote_rule methods to also support the new ExactPromotion. When I say "opt-in", I mean that parametric types which do not implement it will be combined using typejoin when they are used as eltype in map/generate and similar functions, so basically the current behavior will be preserved.

Do this only for NamedTuple (and Tuple since typejoin() already handles it). Else
types may not implement necessary conversions, which triggers errors.
@JeffBezanson
Copy link
Member

I don't think we want a separate set of numeric promotion rules. Judging from past experience, people will not like it if combining e.g. Int8 and UInt8 gives different types in different contexts.

I think this should only be for containers; basically it should only change type parameters and not data representations. In fact NamedTuple is the only use case for this I'm aware of so far. But we might want to expand it to containers generally, and use existing promotions only for arithmetic. For instance we should maybe stop doing this:

julia> [[1], [1.0]]
2-element Array{Array{Float64,1},1}:
 [1.0]
 [1.0]

given that with different types this happens:

julia> [[1], [""]]
2-element Array{Array{T,1} where T,1}:
 [1] 
 [""]

@JeffBezanson
Copy link
Member

Actually, it's possible we really don't need arithmetic-style promotion rules for arrays anymore, since broadcast seems to automatically handle it (through a process that bottoms out at scalar arithmetic).

@StefanKarpinski
Copy link
Member

The range hashing code added recently needed exactly such a strict numeric promotion mechanism. We settled for calling widen on the arguments and computing steps in that type, but what was actually needed was a promotion mechanism that was guaranteed to be able to represent all its inputs exactly, which is what this defines.

@JeffBezanson
Copy link
Member

I don't think that's true; that code needed a non-overflowing subtraction (widen(x2) - widen(x1)) so a type that can represent both arguments is not sufficient.

In this context I want an even stricter promotion mechanism that preserves type information as well as values. And we don't want to go back to the world of issues like #3274, #14252, and all the unsigned stuff.

@nalimilan
Copy link
Member Author

I don't think we want a separate set of numeric promotion rules. Judging from past experience, people will not like it if combining e.g. Int8 and UInt8 gives different types in different contexts.

I think this should only be for containers; basically it should only change type parameters and not data representations. In fact NamedTuple is the only use case for this I'm aware of so far.

@JeffBezanson I don't see what you mean. We only use this in the context of containers (via map, collect, etc.), but it has to be defined for scalars which are entries of these containers.

Do you suggest we use exact promotion everywhere, i.e. that we (more or less) mandate that promotion methods should not lose precision? This could be OK for array concatenation, but how would scalar arithmetic choose the return type of operations?

The range hashing code indeed needs something else, though the new ExactPromotion system can be part of the solution: when the input types are different, you need to choose a common type which can represent both inputs exactly before finding a wider type.

@JeffBezanson
Copy link
Member

We only use this in the context of containers (via map, collect, etc.), but it has to be defined for scalars which are entries of these containers.

I'm saying it should only differ from typejoin on container types. I don't want map(identity, Any[0x1, Int8(2)]) to return an Int16 array.

Do you suggest we use exact promotion everywhere

No, I think scalar arithmetic should stay as-is. But we might want to remove the promote_rule definitions for array. With that, [1, 1.0] could still give a Float64 array, but [[1], [1.0]] would give e.g. Vector{Vector}.

@nalimilan
Copy link
Member Author

I'm saying it should only differ from typejoin on container types. I don't want map(identity, Any[0x1, Int8(2)]) to return an Int16 array.

Sorry, I still don't get it. Isn't the whole point that map and collect use that mechanism to choose the narrowest element (i.e. often scalar) type of the returned array (and therefore in the example choose Int16?). If we don't want that, we have to special case Nothing and Missing (what #24332 does), not special-case containers AFAICT.

No, I think scalar arithmetic should stay as-is. But we might want to remove the promote_rule definitions for array. With that, [1, 1.0] could still give a Float64 array, but [[1], [1.0]] would give e.g. Vector{Vector}.

Fine with me. I don't have use cases in mind for either choice.

@JeffBezanson
Copy link
Member

If we don't want that, we have to special case Nothing and Missing (what #24332 does), not special-case containers AFAICT.

Right, we would need definitions for Missing, Nothing, and NamedTuple.

@nalimilan
Copy link
Member Author

So, concretely, you suggest removing all methods on Number types? Then it wouldn't really be a promotion mechanism I'd say. So go back to making this an internal function called promote_typejoin or something like that?

@nalimilan
Copy link
Member Author

See #25553 for a more limited alternative which doesn't add any new public API, and uses the internal promote_join function instead.

@nalimilan
Copy link
Member Author

nalimilan commented Jan 19, 2018

Bump!

EDIT: I wanted to bump #25553 instead, but that's really the same issue.

@nalimilan
Copy link
Member Author

Closing since #25553 has been merged.

@nalimilan nalimilan closed this Jan 28, 2018
@nalimilan nalimilan deleted the nl/promote_strict branch January 28, 2018 09:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
missing data Base.missing and related functionality types and dispatch Types, subtyping and method dispatch
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants