[RFC] Map for Nullable #9446

Closed

vchuravy
Member

For me Nullable is very similar to the Option type used in functional languages. Most of the functional languages I have encountered treat Option as a container with at most one element and define operations like map over it.

One possible way would be to define the iterator interface for Nullable, thus allowing for value in Nullable(value) and all other methods that depend on it. I personally see little to no value in that and thus only implemented the one operation that I find most useful: map.

map over Nullables allows Nullable-unaware functions to be used with Nullable values and codifies a common pattern.

This PR only implements map over a singular Nullable and for me the question is open whether a map for multiple Nullable has a well defined meaning and is useful.

@nalimilan
Member

In what context did you feel the need for map over Nullable?

I'd say Nullable should implement everything common types implement, not more, not less. But the current situation is not very clear, as you can iterate over a number, but not over e.g. a Date or a Symbol.

@@ -56,3 +56,9 @@ function hash(x::Nullable, h::UInt)
return hash(x.value, h + nullablehash_seed)
end
end

# Specialised map over Nullable
function map(f::Callable, x::Nullable)
Member


I'd make this x::Nullable{T}

@johnmyleswhite
Member

I think this captures a useful pattern in which you might want to apply f if there's a value or else propagate out null. It is one of those things that's really awkward without explicit return types.


# test map
@test map(x->x, Nullable()) == Nullable()
@test map(x->x) Nullable(1.0)) == Nullable(1.0)
Member


Same here, Nullable() does not construct a Nullable, also, parens are not balanced in the last line.

@vchuravy
Member Author

Thanks for the feedback, one does not simply submit a PR before heading straight to bed.

As the return type I decided on Nullable{Union()}(), since that should be more widely applicable than Nullable{T}().

@nalimilan
Member

(My secret wish is that f(x::Nullable) would give the same result as map(f, x) with this PR.)

@vchuravy
Member Author

@nalimilan But the return type will always depend on the return type of f. The idea of using Nullable{Union()}() is based on a comment from Jeff #9364 (comment)

@vchuravy
Member Author

I updated the PR with the helpful comments from @nalimilan.

The only weird thing is that the value stored in a Nullable{Int64}() sometimes differs, which causes === to behave unexpectedly. See the code below for an example.

Possibly related to that is #9147.

julia> Nullable{Int64}() ===  Nullable{Int64}()
true

julia> map(x -> x, a)
Nullable{Int64}()

julia> map(x -> x,  Nullable{Int64}()) === Nullable{Int64}()
false

julia> isequal(map(x -> x,  Nullable{Int64}()), Nullable{Int64}())
true

julia> map(x -> x,  Nullable{Union()}()) === Nullable{Union()}()
true

@nalimilan
Member

@nalimilan But the return type will always depend on the return type of f. The idea of using Nullable{Union()}() is based on a comment from Jeff #9364 (comment)

I think the situation is different here, because there are no return type declarations, and thus no conversion of Nullable{Union()} to Nullable{Int}.

@johnmyleswhite
Member

Yeah, you're right about @JeffBezanson's point: Union() won't propagate deeper into the type system. I'm still getting used to that lesson, but it's relevant here. In the absence of return types, what you wrote is the best way to write this.

@eschnett
Contributor

Instead of implementing map directly, wouldn't it be better to treat it as a container and provide the usual iterator interface? This would also give fold and friends.

@nalimilan You need this when you write code that acts on containers, and that is independent of the type of container used. In this case, it is convenient to have a common API for all container-like types. Nullable fits that description. RemoteRef almost does, and I think it should as well.

@vchuravy
Member Author

So I changed things back to Union()

@eschnett I was wondering that too, but I always felt that container functions other than map and flatMap are less useful for Option-like types.

@nalimilan
Member

@nalimilan You need this when you write code that acts on containers, and that is independent of the type of container used. In this case, it is convenient to have a common API for all container-like types. Nullable fits that description. RemoteRef almost does, and I think it should as well.

That's why I don't really like this approach: I don't see Nullable as a container, rather as a value which can be missing. This comes from my background in statistical languages like R, SAS, Stata, etc. I don't see how considering Nullable as a container is really useful, since it can only contain one value. You never need to iterate over it. The argument about functions expecting containers does not hold more than that about functions expecting a scalar IMHO: I'd also like the latter to be supported transparently, but currently that's not how people seem to envisage Nullable.

@lindahua
Contributor

This implementation is not type stable. Also, I agree with @nalimilan that it is strange to consider Nullable as a container to which one can apply things like map.

@eschnett
Contributor

It may seem strange at first to consider Nullable as a container, but there's nothing wrong with it. It introduces no ambiguity, unlike treating Nullable as a scalar.

@johnmyleswhite
Member

Let's keep separate the issues of whether Nullable is a container and whether it's iterable.

With that proviso, Nullable is a container in almost exactly the way that a boxed value is stored in a "container". The Nullable wrapper adds indirection, which allows us to provide extra functionality. Nullable{Int} provides us with some of the power of the Integer/int distinction in Java, which Java achieves by boxing int values, but doesn't provide the unfortunate identity semantics that Integer brings along with it.

@vchuravy
Member Author

@lindahua Regarding this implementation not being type stable: without proper return types I don't think one could implement map in a way that is always type stable. Consider these two examples:

a = Nullable{Int64}(1)
b = Nullable{ASCIIString}("1")

map(cos, a)
map(parseint, b)

In the functional programming languages I know, map has the type (a -> b) -> [a] -> [b]: given a function from type a to type b, it maps a collection [a] to a collection [b].

For Nullable in Julia the question now is what to return if a = Nullable{Int64}(). If we choose to return Nullable{Int64}(), we guess the right answer in all cases where f :: a -> a, but introduce a type instability in all cases where f :: a -> b (see the example above for two such cases).

Based on a #9364 (comment) by Jeff I used Nullable{Union()}() as return type for map(cos, Nullable{Int64}()), because Nullable{Union()}() should not propagate any further into the type system.
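
The trade-off described in this comment can be sketched as follows. This is only an illustration in current Julia syntax, not the PR's code: Opt, isnull, map_same and map_bottom are hypothetical names, Opt stands in for Nullable (which is no longer part of Base), and Union{} is the current spelling of Union().

```julia
# Hypothetical stand-in for Nullable, which is no longer part of Base.
struct Opt{T}
    hasvalue::Bool
    value::T
    Opt{T}() where {T} = new{T}(false)      # empty: `value` is left uninitialized
    Opt{T}(v) where {T} = new{T}(true, v)
end
Opt(v::T) where {T} = Opt{T}(v)
isnull(x::Opt) = !x.hasvalue

# Choice 1: when there is no value, guess that f maps T to T.
map_same(f, x::Opt{T}) where {T} = isnull(x) ? Opt{T}() : Opt(f(x.value))

# Choice 2 (the one taken in the PR): return the bottom-typed empty value.
map_bottom(f, x::Opt) = isnull(x) ? Opt{Union{}}() : Opt(f(x.value))

present = Opt(1)
absent = Opt{Int}()

# cos maps Int to Float64, so the guess in choice 1 is wrong for the empty case.
@assert typeof(map_same(cos, present)) == Opt{Float64}
@assert typeof(map_same(cos, absent)) == Opt{Int}

# Choice 2 makes no guess, but the result type still depends on the runtime state.
@assert typeof(map_bottom(cos, present)) == Opt{Float64}
@assert typeof(map_bottom(cos, absent)) == Opt{Union{}}
```

In both variants the concrete return type differs between the empty and the non-empty case whenever f changes the element type, which is the instability under discussion.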

@vchuravy
Member Author

For me Nullable is certainly a container and hence a certain subset of operations over containers makes sense. I am indifferent to it being iterable, because its semantics differ slightly from most other containers.

@nalimilan
Member

?map says:

Transform collection "c" by applying "f" to each element.

I think we're abusing the definition of map here. We'd better find an alternative syntax for isnull(x) ? Nullable{Union()}() : Nullable(f(get(x))).

As an example of the gaps that this PR does not fill, how are we going to apply f to all elements of a NullableArray or of an Array{Nullable}? map(x->map(f, x), a)? That's ugly.

@vchuravy
Member Author

@nalimilan
But the same problem holds true for every collection that contains collections.

Maybe this is abusing the definition of map as it currently is. But the concept of map comes from functional programming and from that perspective of programming this feels quite natural.

But this divide whether or not Nullable should be seen as a container type is already visible in the original PR #8152

@nalimilan
Member

@nalimilan
But the same problem holds true for every collection that contains collections.

Sure, but the difference is that collections of collections are relatively rare (and when they are commonly needed they are wrapped in more usable types, like DataFrame). OTC arrays of Nullable are going to be the most common array type for a whole class of users.

But I get your point that in functional languages nullable/option types are considered as collections. I'm not completely sold on that idea, though, as it seems like an unnecessarily verbose way of operating on nullable values. R's semantics are quite efficient on that front -- though with the drawback that all values are nullable, which is a nightmare when writing complex systems. A middle-term solution sounds more reasonable for all use cases.

@vchuravy
Member Author

travis failed on osx due to a timeout.

Sure, but the difference is that collections of collections are relatively rare (and when they are commonly needed they are wrapped in more usable types, like DataFrame). OTC arrays of Nullable are going to be the most common array type for a whole class of users.

True and the ways functional programming languages handle that problem are slightly verbose. You have to use either a double map or a map and pattern matching.

R's semantics are quite efficient on that front -- though with the drawback that all values are nullable, which is a nightmare when writing complex systems.

How does R handle that problem? I have not coded enough in it to encounter this, or maybe I did but did not recognize it as such.

A middle-term solution sounds more reasonable for all use cases.

I think that especially in the context of DataFrames most will not deal with Array{Nullable{T}, N} but with NullableArray{T, N} and that in that context we could more easily provide a method that takes a function f: a -> b and applies it to [[a]] resulting in [[b]] ([[T]] being NullableArray[T]). Such a method could be called map even though it actually is map(x-> map(f, x), xs), because one would expect map to result in the same container type, but for me that is a different debate.

@nalimilan
Member

R's semantics generalize what you suggest for map: f(x::Nullable) = isnull(x) ? Nullable{Union()}() : Nullable(f(get(x))). Also, since in R scalars are vectors (something quite scary), you also get f(a::Array{Nullable}) = [isnull(x) ? Nullable{Union()}() : Nullable(f(get(x))) for x in a] for all element-wise operations. So you can see that even calling map(f, x) is quite annoying compared to just f(x).

@johnmyleswhite
Member

That is not an accurate description of R's semantics.


On Dec 25, 2014, at 1:54 PM, Milan Bouchet-Valat notifications@github.com wrote:

R's semantics generalize what you suggest for map: f(x::Nullable) = isnull(x) ? Nullable{Union()}() : Nullable(f(get(x))). Also, since in R scalars are vectors (something quite scary), you also get f(a::Array{Nullable}) = [isnull(x) ? Nullable{Union()}() : Nullable(f(get(x))) for x in a] for all element-wise operations. So you can see that even calling map(f, x) is quite annoying compared to just f(x).



@vchuravy
Member Author

vchuravy commented Jan 1, 2015

I wanted to write a long and passionate post on the mailing list about the semantics and possible usages of Nullable in Julia, and then stumbled upon a blog post discussing the design of Java's Optional type, http://blog.codefx.org/jdk/dev/design-optional/, which made me reconsider my opinion of using Nullable as a full container type.

I don't like the call overloading since it also opens the way for f(xs :: Iterable) = map(f, xs), which I find increases the mental load when working with multiple dispatch, because it adds another layer of complexity when I try to understand which method is actually called from a piece of code.

I would rather see a null coalescing operator like ?? that would be an alias for get(x :: Nullable, default), which would reduce the need for a map operation, because instead of writing

f(x :: Int64) = x < 0 ? Nullable{Int64}() : Nullable(x)

a = f(-1)
b = map(sin, a)
c = map(cos, b)
result = get(c, reasonable_default)

you can write.

a = f(-1) ?? 0
b = sin(a)
result = cos(b)

Of course the two styles solve different problems and are orthogonal to each other, and I would prefer having both map and ??. With map you retain the information that there was a null value at some point in the computation and you select a default value at the last possible moment, whereas with the null coalescing operator you select a default value at the earliest possible moment.
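
The two styles compared in this comment can be put side by side in a small sketch. It is written in current Julia syntax under stated assumptions: Opt, isnull, mapopt and getor are hypothetical names, Opt stands in for Nullable, and getor plays the role of get(x, default), i.e. of the proposed ?? operator, which is not valid Julia syntax.

```julia
# Hypothetical stand-in for Nullable, which is no longer part of Base.
struct Opt{T}
    hasvalue::Bool
    value::T
    Opt{T}() where {T} = new{T}(false)      # empty: `value` is left uninitialized
    Opt{T}(v) where {T} = new{T}(true, v)
end
Opt(v::T) where {T} = Opt{T}(v)
isnull(x::Opt) = !x.hasvalue

# Propagating style: apply g if a value is present, otherwise stay empty.
mapopt(g, x::Opt) = isnull(x) ? Opt{Union{}}() : Opt(g(x.value))

# Coalescing style: stand-in for get(x, default), i.e. the proposed `x ?? default`.
getor(x::Opt, default) = isnull(x) ? default : x.value

f(x::Int) = x < 0 ? Opt{Int}() : Opt(x)

# Late default: the empty value survives sin and cos and is replaced at the end.
late = getor(mapopt(cos, mapopt(sin, f(-1))), 0.0)

# Early default: the empty value is replaced at once, then plain functions apply.
early = cos(sin(getor(f(-1), 0)))

@assert late == 0.0     # the default itself comes out
@assert early == 1.0    # cos(sin(0)): the default went through the computation
```

For an input that yields a value, such as f(2), both styles return cos(sin(2)); they only differ in what happens to the missing case.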

@johnmyleswhite
Member

With map you retain the information that there was a null value at some point in the computation and you select a default value at the last possible moment, whereas with the null coalescing operator you select a default value at the earliest possible moment.

I think this is the crux of the problem. My goal with the current design was to push people towards handling null values as soon as they occur, but there is obviously a lot of code in the world based on propagating null values from the start of a computation all the way to the end.

@nalimilan
Member

That's indeed a use of Optional which is advocated by the post you linked to:

Following this principle will lead to Optionals being mostly created as return values. But I see no reason to reflexively and immediately extract the actual value (or use the default value). Especially not if the absence of a value might change the logical flow at some point in the future. The Optional box should then be handed over as is (again: if no other way exists).

http://blog.codefx.org/jdk/dev/design-optional/

Indeed, when working with statistical data, in many cases "no other way exists", i.e. you don't have any default value to provide; you just want to keep missing values as they are. When you are working with a DataFrame, you don't want to exclude all observations that have one missing value on one of the variables, or you will lose too much information: you want to be able to exclude (or impute) missing values for the restricted set of variables that a specific analysis requires, just at the moment when you perform this analysis.

But I completely agree that in other situations (more frequent in "normal" coding), propagating the missingness is not the best solution. The question is, can we make these two usages cohabit happily? Doesn't sound impossible. At least, contrary to what happens when NULL pointers are conveyed invisibly in an object reference, in Julia code you immediately see that a variable is a Nullable instead of a scalar, and thus that the person who wrote the code intended to handle missing values.

@eschnett
Contributor

eschnett commented Jan 6, 2015

Yes, one needs both cases: Substituting a "default" value for a missing value, and propagating "missingness".

Nullable types are probably just a special case of a more general pattern. My earlier generalization to "containers" didn't fly so well (since containers usually can contain many values, while nullable types hold at most one). So I'm trying to look for other cases in Julia where types "hold" another value but are not containers. Maybe "wrappers" would be a better name for them:

  • RemoteRef: values living on another process
  • Task: similar to futures in other languages
  • serialize/deserialize

Both RemoteRef and Task not only "contain" a value, but their main purpose is something else: a RemoteRef knows on which process the value lives, and a Task holds the administrative data to manage a task. But in both cases, there are ways to:
(1) obtain the wrapped values: fetch(remoteref), wait(task)
(2) daisy-chain operations, i.e. perform a calculation remotely with the RemoteRef as input, producing another RemoteRef, or creating a new task that runs as soon as the current task has finished.

These correspond to the two cases for nullable types, i.e. substituting a default, or propagating missingness. Maybe "map" isn't a good name for this (although it's called "map" in other languages), but there seems to be a generic pattern here, and a simple syntax would be convenient not just for nullable types. Concurrent programming (with Julia's efficient tasks) or distributed programming (simplifying handling RemoteRefs) would also benefit from that.

@johnmyleswhite
Member

What about the following proposal?

  • Replace map with propagate.
  • Add a @propagate macro that changes all function calls in a block of code to be calls to propagate.

This means that propagation continues to be opt-in (which we wouldn't get from call overloading), but is much less verbose in long-nested expressions.
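
As a rough illustration of the first bullet only, a propagate function could look like the sketch below, written in current Julia syntax. Opt, isnull and this particular propagate are hypothetical (Opt stands in for Nullable), and the @propagate macro is not sketched here. Accepting several arguments is an assumption that goes beyond the PR, which only covers a single Nullable.

```julia
# Hypothetical stand-in for Nullable, which is no longer part of Base.
struct Opt{T}
    hasvalue::Bool
    value::T
    Opt{T}() where {T} = new{T}(false)      # empty: `value` is left uninitialized
    Opt{T}(v) where {T} = new{T}(true, v)
end
Opt(v::T) where {T} = Opt{T}(v)
isnull(x::Opt) = !x.hasvalue

# Apply f to the unwrapped values only if every argument holds a value;
# otherwise return the bottom-typed empty value, as the PR's map does.
function propagate(f, xs::Opt...)
    any(isnull, xs) && return Opt{Union{}}()
    return Opt(f(map(x -> x.value, xs)...))
end

r = propagate(+, Opt(1), Opt(2))
@assert !isnull(r) && r.value == 3

# A single missing argument makes the whole result missing.
@assert isnull(propagate(+, Opt(1), Opt{Int}()))
```

Because the caller has to write propagate explicitly, propagation stays opt-in: plain f(x) on a wrapped value still fails unless f has a method for it.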

@nalimilan
Member

@johnmyleswhite That would work, but overloading call has the advantage (for me) that it wouldn't require adding @propagate everywhere. What would be the interest of forcing this behavior to be opt-in?

@eschnett The parallel with RemoteRef is interesting (not sure about tasks, they are more complex). One could imagine allowing to write:

julia> r = @spawn rand(2,2)
RemoteRef(1,1,0)

julia> s = @spawn exp(r) # instead of exp(fetch(r))
RemoteRef(1,1,1)

fetch is indeed very similar to get in that it retrieves a value stored in a wrapper.

@johnmyleswhite
Member

@johnmyleswhite That would work, but overloading call has the advantage (for me) that it wouldn't require adding @propagate everywhere. What would be the interest of forcing this behavior to be opt-in?

The advantage is that the current definition of Nullable allows you to have functions that return null sometimes, but which can't quietly propagate this information forward through a long stream of computations. Instead, you get what I'll call a "fail on first point of contact" reaction, where the mere fact that the object's type is Nullable only allows you to pass that object to functions that have consciously chosen to handle null in a certain way. This encourages you to handle the occasional null case immediately after a null value occurs, rather than let it poison downstream computations where null is completely unexpected. If you've ever had to track down the origin of an unexpected null and discovered that the origin occurred 5-10 function calls earlier than you would have guessed, you'll understand the debugging gains to be had from discouraging automatic propagation of what is arguably an error code.

That said, I'm starting to think that the software engineering use cases for Nullable and the stats use cases for NA are different enough that we may ultimately need two distinct types. A null database connection is different in important respects from a null integer in a database: it's essentially always going to cause fatal errors downstream and should be handled immediately.

For now, I would really prefer that we try using Nullable everywhere and see whether we can't survive while conflating the concepts (just as SQL does), but it does seem that the SWE arguments for Nullable are mildly anti-propagation, while the stats arguments are mildly pro-propagation.

@johnmyleswhite
Member

The more I think about this, the more convinced I'm becoming that we should separate out a concept of Nullable and a concept of Observable. Nullable is basically like an error code wrapper around a value and asserts the complete non-existence of a value (e.g. a NULL value returned from a parse operation); whereas Observable is an acknowledgement that some values aren't observed, but do in fact exist (e.g. someone's birthday). Observable could propagate by default, while Nullable would fail by default.

@stevengj
Member

stevengj commented Jan 8, 2015

@johnmyleswhite, I thought our Nullable type was mainly intended for your "Observable" use-case (e.g. in DataFrames), and so the semantics should be optimized for that case.

Generally, why do we need an "error code wrapper around a value" if we have exceptions? I guess there are a few cases like parseint where we don't want to use exceptions for performance reasons, but these seem like the exception (no pun intended) rather than the rule, and in these cases you typically won't worry about propagation because callers will check the result immediately.

@johnmyleswhite
Member

I think our Nullable type is currently a slight conflation of the "Nullable" and "Observable" use cases. (I've definitely been trying to satisfy competing objectives with it, which is why I've tried to prevent automatic propagation until this thread started.)

We've started talking about using Nullable in cases like parseint and I'd like to see it used much more often, but that pushes us away from the DataFrames case. We can say those use cases are exceptional for Nullable types, but I feel they're worth formalizing with a type (even if that type isn't ultimately called Nullable).

in these cases you typically won't worry about propagation because callers will check the result immediately.

Using the type system to force callers to check results for nullness is a big part of the argument for the Maybe type that Nullable is patterned after. Sadly, the best writeup of this perspective is only available from a Google cache right now: http://webcache.googleusercontent.com/search?q=cache:B1AP4ZRvp9oJ:nickknowlson.com/blog/2013/04/16/why-maybe-is-better-than-null/+&cd=1&hl=en&ct=clnk&gl=us

@nalimilan
Member

If you've ever had to track down the origin of an unexpected null and discovered that the origin occurred 5-10 function calls earlier than you would have guessed, you'll understand the debugging gains to be had from discouraging automatic propagation of what is arguably an error code.

@johnmyleswhite Actually I have hit this problem in R when developing software, which made me consider it's not a reasonable language for anything beyond stats. But in Julia at least when you have an Int or a Bool, you know it's not null. So one could simply strongly advise developers to call get on a Nullable as soon as they can, i.e. never work with Nullable when they don't strictly need it. Though I understand that you may want to make this stronger than a simple advice by pushing this requirement into the type system itself.

I can see at least two ways forward here:

  1. As you said, introduce two types, say Nullable and Option, with propagate vs. fail semantics.
  2. Keep only Nullable, failing by default, but offer a convenient syntax to propagate missingness. Something like f?(x), f(?x) or f(x?) à la Kotlin or C# 6 could be enough to reconcile both approaches to missingness. (But agreeing on syntax extensions is always hard.)

The advantage of 2) is that it does not require the API developer to decide what kind of missingness should apply to a returned value (in some cases it might be ambiguous). But more fundamentally, it also seems that even when not dealing with statistical data, the pattern y = isnull(x) ? null : Nullable(f(get(x))) (and more generally the need for propagation in some local places) is quite common: see for example http://www.interact-sw.co.uk/iangblog/2008/04/13/member-lifting ("What I’d Like" section), where the author would appreciate a dedicated syntax for this in C#. (Note that C# sometimes has a propagation ["lifting"] behavior, but not consistently, which should clearly be avoided.)

@davidagold
Contributor

What about an immutable NullableFunction{T} type that can be invoked to propagate nullability? This suggestion is based on the notion that, if one intends to propagate null through a chain of functions, then a Nullable object can be thought of as an "argument" component of the propagation that requires a complementary "function" component for success. The parameter T indicates which Nullable() ought to be returned in the case of a null argument. NullableFunction{T} could be just a simple wrapper for a given function, as below. There I've also tried to produce a syntax similar to what @nalimilan has suggested in his approach (2) above. For the most part, those syntax suggestions are made difficult by the precedence of ? as the ternary operator. Here's where I got to:

immutable NullableFunction{T}
    func::Function
end

Base.get(η::NullableFunction) = η.func

? = NullableFunction{Union()}    # using 'typealias' runs into ternary operator troubles

# Concatenation as a means of instance construction
function Base.(:*){T}(::Type{NullableFunction{T}}, f::Function)
    return NullableFunction{T}(f)
end

# Overload 'call' for 'NullableFunction's with the moral equivalent of 
# @vchuravy's original 'map' function
function Base.call{T}(η::NullableFunction{T}, x::Nullable)
    isnull(x) && return Nullable{T}()
    return Nullable(get(η)(get(x)))
end

which allows for the following behavior:

julia> x = Nullable(5)
Nullable(5)

julia> y = Nullable{Int}()
Nullable{Int64}()

julia> f(x) = x + 5
f (generic function with 1 method)

julia> (?f)(x)
Nullable(10)

julia> (?f)(y)
Nullable{Union()}()

Furthermore, by instead setting

? = NullableFunction

one allows for user-specified values of T for any returned Nullable{T}()s:

julia> (?{Int64}f)(y)
Nullable{Int64}()

@nalimilan
Member

An interesting read about Nullable -- posting it here as people dealing with this are subscribed. The author's recommendations are very close to what Julia currently provides (except for the ? syntax), which is reassuring. http://joeduffyblog.com/2016/02/07/the-error-model/#non-null-types

@johnmyleswhite
Member

That was a good read.

@hayd
Member

hayd commented Apr 28, 2016

If we are able to use return_types then we can get that type stability (which IIUC was some of the problems above / the reason this was closed?) with something like:

julia> function _map{T}(f, n::Nullable{T})
         isnull(n) && return Nullable{Union{Base.return_types(f, (T,))...}}()  # maybe the Union... is not needed (can I just take [1] ?)
         Nullable(f(n.value))
       end

julia> (n::Nullable)(f) = _map(f, n)

Syntax-wise this isn't as satisfying as Haskell-style do-block/monadic expressions.

@davidagold
Contributor

IIRC return_types reflects the type inference algorithm, whose behavior is difficult to reason about and shouldn't be depended on for functionality like this.

@nalimilan
Member

Now that we have a .( syntax for broadcast (#16285), I think we should define broadcast (and maybe map too) on Nullable to implement the lifting semantics discussed here. That way, we would have e.g. sin.(x) == Nullable(sin(get(x))). Return-type inference issues with null values are exactly the same problem as calling broadcast on an empty array (first bullet point at #16285, see #11034), so both can be fixed at the same time.

@yuyichao
Contributor

The solution for broadcasting an empty array is likely going to be unsuitable for Nullables

@nalimilan
Member

The solution for broadcasting an empty array is likely going to be unsuitable for Nullables

Why? For example, if the final choice is to return Union{}[], then for Nullable this would give Nullable{Union{}}(), which IIUC, could give efficient code at some point when the compiler gets smart enough. I'd think it's even less an issue than for arrays, given that you may want to add elements to an array (which wouldn't work with Union{}[]), while an empty Nullable will remain empty forever.
