Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add skipnothing function which returns an iterator without nothing #30549

Closed
wants to merge 12 commits into from

Conversation

galenlynch
Copy link
Contributor

@galenlynch galenlynch commented Jan 1, 2019

This adds a new function skipnothing, which mirrors the utility of skipmissing except for
nothing values instead of missing ones.

To achieve this, I have converted the logic behind skipmissing into a more general function, skipoftype(T, itr), which skips over elements of type T in an iterator itr.

skipmissing(itr) is now simply defined as:

skipmissing(itr) = skipoftype(Missing, itr)

Similarly, the new function skipnothing(itr) is simply skipoftype(Nothing, itr).

The benefits of skipnothing versus filtering or array comprehensions are described below.

@galenlynch
Copy link
Contributor Author

It looks like the build failures are unrelated to the code in the PR, but I'm not 100% sure how to interpret them.

@galenlynch
Copy link
Contributor Author

Just to motivate the need for skipnothing, if you have an iterable, itr that contains nothing values that you want to remove, there are two existing and straight-forward ways to do this, each with its own problem:

  1. filter the iterable: filter(!isequal(nothing), itr), which produces an array with element type Union{Nothing, T}. It would be nice to drop the Union element type.
  2. Array comprehension: [x for x in itr if x !== nothing], which is straight forward, and no longer has the Union element type, but Julia cannot guess the element type making it type-unstable.

Whereas skipnothing drops the union element type and is type-stable.

It seems to me that if skipmissing is useful (which I think it is), then skipnothing would be equally useful.

@iamed2
Copy link
Contributor

iamed2 commented Jan 2, 2019

If https://github.com/JuliaLang/julia rejects this (without a simple alternative), it's welcome at https://github.com/JuliaCollections/IterTools.jl

@tkf
Copy link
Member

tkf commented Jan 2, 2019

It may be worth considering adding skipoftype(T, itr) (say) that is equivalent to (x for x in itr if !(x isa T)) instead. It would be beneficial for any union types. Also, some portion of skipmissing could be re-written in terms of skipoftype(T, itr), to reduce duplicated code.

@galenlynch
Copy link
Contributor Author

Good idea @tkf, I just did that.

@nalimilan
Copy link
Member

Makes sense. Could those who downvoted the proposal give their reasons?

@tkoolen
Copy link
Contributor

tkoolen commented Jan 5, 2019

From https://discourse.julialang.org/t/how-to-calculate-a-weighted-mean-with-missing-observations/19281/12, how about

skip(somevalue, iterator)

where x in iterator is skipped if x !== somevalue? That way you could also have skip(NaN, iterator). It might (unfortunately) need a different name because of the existing skip function.

I think using x !== missing may also currently be better for performance than !ismissing(x) (see https://discourse.julialang.org/t/ann-transducers-jl-efficient-and-composable-algorithms-for-map-and-reduce-like-operations/19159/11), although maybe that shouldn't impact this design decision.

@tpapp
Copy link
Contributor

tpapp commented Jan 5, 2019

Could those who downvoted the proposal give their reasons?

I consider various implementations of filter more general. One thing it does not do is narrow type, but that could be done with specializing filter(!isa(T), ...::AbstractVector{Union{T, S}}) or something like that, with one-argument isa similar to isequal, or some other function.

@galenlynch
Copy link
Contributor Author

galenlynch commented Jan 6, 2019

@tpapp So you're opposed to the introduction of a new function name dedicated to handling the presence of nothing values in an iterator?

If so, I disagree with that sentiment: as is shown by #29679 nothing seems to be special in some way, and julia has functions that are explicitly dedicated to working with nothing values. The fact that many functions in base return Union{Nothing, ...} makes it clear to me that nothing is somehow a special type in the practice of using Julia. Futhermore, this PR does the same as #29679, except where that PR extended ismissing to isnothing, this one extends skipmissing to skipnothing, and also where isnothing can be easily substituted with x === nothing, there is no existing alternative for the behavior for skipnothing.

Your suggested specialization of filter(!isa(T), ::AbstractArray) is very similar to skipoftype in this PR, except skipoftype works for any iterator, instead of just abstract arrays. Also it would seem like surprising behavior, in a bad way, to have filter change the return type only when used with one or two functions, as well as not narrowing the type with filter! for that function.

@tpapp
Copy link
Contributor

tpapp commented Jan 6, 2019

I am just cautious about filling in every combination of somethingT, where something = skip, is, ... and T is a singleton type. I would try to see if a better solution can be found.

I see nothing intrinsically wrong with filter narrowing type if it can, as long as it is done in a type stable way (which applies here).

I also think the functionality of this particular PR could live in a package while people experiement with this.

@galenlynch
Copy link
Contributor Author

galenlynch commented Jan 6, 2019

I see your point, but this is just filling out a missing slot with something = skip to the small set of two important singleton types Missing and Nothing that you commonly get from base Julia. It is not creating a new group of somethingT functions.

Even if skipnothing is not accepted, converting the logic behind skipmissing(itr) to skipoftype(T, itr) and redefining skipmissing as skipmissing(itr) = skipoftype(Missing, itr) seems strictly better, at least to me. If skipoftype is also exported, then it will be easy to deal with arrays containing nothing, as you can get from broadcasting with the many functions in base that can return nothing, until a "better" solution can be found.

Maybe someone else can integrate the type narrowing into filter, but wouldn't that also change the behaviour of filter? I don't know what the threshold for a breaking change is, but adding type narrowing would definitely change the outward facing behaviour of filter.

@galenlynch
Copy link
Contributor Author

Also, skipT is different than filter(!isa(T), ...) because no intermediate array is formed if you use the result with reduce, mapreduce, etc...

@iamed2
Copy link
Contributor

iamed2 commented Jan 6, 2019

Also, skipT is different than filter(!isa(T), ...) because no intermediate array is formed if you use the result with reduce, mapreduce, etc...

There's also Base.Iterators.filter

@galenlynch
Copy link
Contributor Author

@iamed2 I wasn't aware of that, thanks for pointing it out.

@galenlynch
Copy link
Contributor Author

I don't know how things work around here: is this PR still under consideration?

@tpapp
Copy link
Contributor

tpapp commented Jan 13, 2019

Sometimes PRs that suggest changes to the API are revisited later, especially if there is other stuff going on (new releases).

base/skipoftype.jl Outdated Show resolved Hide resolved
@nalimilan nalimilan added missing data Base.missing and related functionality triage This should be discussed on a triage call labels Jan 20, 2019

IteratorSize(::Type{<:SkipOfType}) = SizeUnknown()
IteratorEltype(::Type{SkipOfType{T, A}}) where {T, A} = IteratorEltype(A)
eltype(::Type{SkipOfType{T, A}}) where {T, A} = union_poptype(T, eltype(A))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typesubtract?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed

Copy link
Contributor

@iamed2 iamed2 Mar 14, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@galenlynch typesubtract is already a function that exists, at Core.Compiler.typesubtract

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I had no idea! I changed the PR to use the existing function.

This adds `skipnothing`, which mirrors the utility of `skipmissing` except for
`nothing` values. The code underlying `skipnothing` is largely copy and pasted
from `skipmissing`, with necessary modifications.
Introduces `skipoftype(T, itr)` which skips elements in an iterator of type `T`.
This can be used to replace `skipmissing` as well as `skipnothing`.
Previously, `union_poptype` would fail if the arguments only partially
overlapped: i.e. `union_poptype(Union{A, B}, Union{A, C})`. First taking the
union of all the types allows `union_poptype` even in this edge-case.
@galenlynch
Copy link
Contributor Author

I attempted to rebase this PR onto master, but I'm not sure which exception to throw if getindex encounters an element that is supposed to be skipped. For SkipMissing, a MissingException was thrown. For a lack of a clear alternative, I kept that exception type, but changed the message.

@galenlynch
Copy link
Contributor Author

@nalimilan I see that many of the rebase conflicts I encountered came from #31008. I've tried to port over your PR to SkipOfType.

@StefanKarpinski
Copy link
Member

Triage wants a more general API where you give a type to exclude, e.g. skip(nothing, ...) and skip(missing, ...) etc. There's also the possibility of specializing filter and/or Iterators.filter on predicates of the form !==(singleton) to get a better result type.

@StefanKarpinski
Copy link
Member

Another approach would be to have skip(Nothing, ...) and skip(Missing, ...) taking a type.

@galenlynch
Copy link
Contributor Author

As part of this PR I converted the core of skipmissing to skipoftype(::Type, ...). skipmissing is now just skipoftype(Missing, ...), and similarly skipnothing(...) is skipoftype(Nothing, ...). Would that fit the bill?

@nalimilan
Copy link
Member

AFAICT the difference between skip(singleton, itr)/skipoftype(T, itr) and Iterators.filter(!==(singleton), itr) is that the former supports indexing when itr is an array (like skipmissing currently). Maybe we could simply extend Iterators.filter to also support indexing when wrapping an array (that wouldn't be breaking)? Indeed, it would be weird to have skip and filter do something quite similar, and Iterators.filter(!isnothing, itr) is quite compact and generic.

@galenlynch
Copy link
Contributor Author

I think another difference is that Iterators.filter(!==(singleton), itr) does not drop the singleton type from the type union (maybe? I can't check at the moment), while skipoftype(T, itr) does.

@galenlynch
Copy link
Contributor Author

If we could make Iterators.filter also support type subtraction, then that would meet the use case that initially motivated this PR.

@tkf
Copy link
Member

tkf commented Mar 28, 2019

I think you can specialize Iterators.filter(::typeof(!isnothing), itr) to do that. Or maybe we can have an optional "type arithmetic" API that is used by Iterators.filter(f, itr) when f supports it.

@galenlynch
Copy link
Contributor Author

galenlynch commented Mar 28, 2019

So it seems like the most general API would allow
a) type subtraction for singleton values or types
b) indexing into the iterator
c) accept types or singleton values to skip over

@tkoolen also suggested allowing for skipping over other values, such as NaN

@nalimilan
Copy link
Member

Indeed, eltype(::Iterators.Filter) always returns the element type of the wrapped iterable. I'm not sure whether it would be OK or not to special-case isnothing/ismissing. Ideally we would subtract the type whenever the filtering function depends only on the type of its inputs (which isn't that common I'd say), but AFAICT determining this relies on inference, which we cannot rely on.

@StefanKarpinski
Copy link
Member

The other feature that skip(T, collection) can support is type subtraction: you know that the eltype of the result is typesubtract(eltype, T). That can sometimes be figured out for a filter call but usually not. That doesn't mean that we shouldn't also make Iterators.filter support indexing, but maybe also have skip(T, collection) since it is closer to what the user wants to write and easier for type inference to reason about the result type of.

Also change filter(f, itr::SkipMissing) to work with Skip instead of
SkipMissing. In addition, this method now pre-allocates its output instead of
appending elements to an empty array.
@nalimilan
Copy link
Member

I'm still a bit worried about adding a new skip function, as nothing in its name indicates it should skip values according to their type, rather than just skipping values according to a predicate like filter and Iterators.filter do. One could easily confuse these functions, which makes the language harder to grasp. I'm also not sure there's a lot of use for a general skip(T, itr) function apart from skipmissing and skipnothing, so it sounds kind of radical to claim such a common term in Base just for this.

Overall I'd be more inclined to add a special case to Iterators.filter (and maybe filter), e.g. for !==(x) when x is a singleton type or for !isnothing/!ismissing. The former would be consistent with what we do with replace(itr, nothing=>0).

@galenlynch
Copy link
Contributor Author

galenlynch commented Mar 30, 2019 via email

@galenlynch
Copy link
Contributor Author

@nalimilan I don't really understand what you're proposing, move all the code that used to belong to skipmissing to Iterators.filter?

It doesn't seem like there's much consensus on which name to use. I don't have a strong opinion.

@tpapp
Copy link
Contributor

tpapp commented Mar 31, 2019

Generally, using up commonly used short verbs for function names in Base should be done only when the benefit is really great (eg collect), so IMO skip should be something else.

It is generally preferred to built up functionality from existing vocabulary, especially if that can be done with zero overhead. My preferred solution would be specializing

filter(!isequal(nothing), [1, nothing, 2])

so that it returns a Vector{Int}. But some people think that preserving element type is somehow in the contract of filter, while some people are happy with it being just type stable.

Finally, I realize that it is frustrating to make a PR and then receive conflicting suggestions and/or not getting a clear direction for finalizing it. But with a seemingly simple addition to the language, you ran into a problem which hinges on many questions which are not yet clearly answered, and since things in Base come with a promise of not breaking things, people are cautious about committing to a solution they may find suboptimal in the future. In the meantime, I would suggest that you put this in a package.

@galenlynch
Copy link
Contributor Author

galenlynch commented Apr 7, 2019

How about just exporting skipoftype, which generalizes skipmissing to any type, and has a fairly explicit name that is not a short verb. skipmissing(collection) would just be implemented as skipoftype(Missing, collection), and all of these other language questions (e.g. should Itereators.filter be allowed to drop types, is nothing on par with missing and deserve similar language support, etc...) can be decided later.

@galenlynch
Copy link
Contributor Author

galenlynch commented Apr 7, 2019

Or not even exporting skipoftype. If it was unexported, people could still use it to make their own iterators that drop nothing elements (or other types) from collections. Copying all this code that just mirrors the skipmissing code into a package, instead of making skipmissing more general, seems like kind of a waste.

@nalimilan
Copy link
Member

skipoftype sounds OK. Though I'd still like triage to discuss whether special-casing !== with singletons in Iterators.filter would be acceptable (as I proposed above), as that would be more generic and consistent with replace.

@tkf
Copy link
Member

tkf commented Apr 8, 2019

If skipoftype were chosen, skipoftype(Union{Nothing, Missing}, collection) would be supported since typesubtract is used, right? Also, it can be used for user-defined types. Of course, Iterators.filter can be equally powerful if it is specialized for !isa(T) (and unary isa is added).

@vtjnash vtjnash removed the triage This should be discussed on a triage call label Apr 22, 2021
@galenlynch galenlynch deleted the add_skipnothing branch April 26, 2021 12:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
missing data Base.missing and related functionality
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants