-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Rework replace and replace! #26206
Rework replace and replace! #26206
Conversation
Introduce a new replace!(new::Callable, res::T, A::T, count::Union{Nothing,Int}) method which custom types can implement to support all replace and replace! methods automatically, instead of the current replace!(new::Callable, A::T, count::Int). This offers several advantages: - For arrays, instead of copying the input and then replace elements, we can do the copy and replace operations at the same time, which is quite faster for arrays when count=nothing. - For dicts and sets, copying up-front is still faster as long as most original elements are preserved, but for replace(), we can apply replacements directly instead of storing a them in a temporary vector. - When the LHS of a pair contains a singleton type, we can subtract it from the element type of the result, e.g. Union{T,Missing} becomes T. Also simplify the dispatch logic by removing the internal _replace! method in favor of replace!.
ae4b229
to
03c073c
Compare
Anybody willing to review this? |
I will try to review within few days |
@rfourquet Any chance you could find the time to review this? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very nice work, thanks! My main questions in comments concern 1) why use nothing
for count
and 2) whether to name internal methods _replace!
or replace!
.
base/set.jl
Outdated
|
||
function _replace!(A, count::Integer, old_new::Tuple{Vararg{Pair}}) | ||
function replace!(res, A, count::Union{Integer,Nothing}, old_new::Tuple{Vararg{Pair}}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function is not part of the API, so (as in the the other comment) I prefer it to not be a method of replace!
(example it also clobbers the output of mehods
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the problem is that it's not great to tell people to override _replace!
to implement the replace
/replace!
methods. I actually wonder whether we should make this replace!
method officially public, or at least document it under an # Implementation
part in the docstring.
An alternative is to find a better name than _replace!
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My preference would be to make the method public. But the one above (replace!(res, A, count::Union{Integer,Nothing}, old_new::Tuple{Vararg{Pair}})
) is really an internal method which doesn't have to be overriden, so _replace!
is a better name for it I think.
base/set.jl
Outdated
@@ -600,16 +606,17 @@ julia> replace!(Set([1, 2, 3]), 1=>0) | |||
Set([0, 2, 3]) | |||
``` | |||
""" | |||
replace!(A, old_new::Pair...; count::Integer=typemax(Int)) = _replace!(A, count, old_new) | |||
replace!(A, old_new::Pair...; count::Union{Integer,Nothing}=nothing) = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not clear on why you need to introduce nothing
here: it seems to make the code slightly more complex, and it's not part of the API (at least you didn't document it)... why not keep typemax(Int)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The advantage of nothing
is that it has a different type, which allows the compiler to optimize out instructions which are not needed (like the n < count
check, which introduces a branch an prevents SIMD). Also, for replace
, it allows subtracting singleton types when count=nothing
while keeping the function type-stable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, thanks. I'm a little sad to see this added complexity, but I guess losing type stabity in particular would be a deal breaker.
base/set.jl
Outdated
end | ||
|
||
promote_valuetype(x::Pair{K, V}) where {K, V} = V | ||
promote_valuetype(x::Pair{K, V}, y::Pair...) where {K, V} = | ||
promote_type(V, promote_valuetype(y...)) | ||
|
||
# Subtract singleton types which are going to be replaced | ||
@pure issingletontype(::Type{T}) where {T} = isdefined(T, :instance) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not Jameson so I can't comment on the use of @pure
here! ;-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The @pure
is OK here, although the definition itself is wrong:
@pure issingletontype(T::DataType) = isdefined(T, :instance)
issingletontype(::Type) = false
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yay, I used @pure
correctly! Thanks for the pointer about DataType
.
base/set.jl
Outdated
# Subtract singleton types which are going to be replaced | ||
@pure issingletontype(::Type{T}) where {T} = isdefined(T, :instance) | ||
function subtract_singletontype(::Type{T}, x::Pair{K}) where {T, K} | ||
if issingletontype(K) # singleton type |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment may seem slightly redundant :)
base/set.jl
Outdated
if x !== y | ||
push!(repl, x => y) | ||
c += 1 | ||
if res === A |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe worth a small comment mentioning the optimization that can't be done in general when res !== x
.
base/set.jl
Outdated
# to make replace/replace! work for a new container type Cont, only | ||
# replace!(new::Callable, A::Cont; count::Integer=typemax(Int)) | ||
# replace!(new::Callable, res::Cont, A::Cont; count::Integer=typemax(Int)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this method of replace!
is a nice addition to the API, but you didn't document it... I understand that it is felt that this function has already a too large API, so we could leave it undocumented for now. But I'm uncomfortable making it a method of replace!
then, as people will discover it and start to use it, so I'd rather have it be named _replace!
to make it clear that it's an internal method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's discuss this above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here you should update the type and default value for count
.
I would say, now that it's implemented, we can keep it. Maybe for other types of |
I see that in you benchmark code, you didn't interpolate the global variables |
I've just checked, and it doesn't make any difference in this case. AFAIK that's because everything happens inside the function (which gets specialized), and the cost of one dynamic dispatch is negligible compared with the time it takes to run. |
base/set.jl
Outdated
count::Integer=typemax(Int)) | ||
count < 0 && throw(DomainError(count, "`count` must not be negative")) | ||
count != 0 && _replace!(new, A, min(count, typemax(Int)) % Int) | ||
count::Union{Integer,Nothing}=nothing) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here we are using the "new API" to implement the functionality for certain types, so it's fine to use replace!
, but then again I think we should keep calling _replace!
the specific implementation methods, which have the signature _replace!(new, res, A, count)
, which really is an implementation detail and is not part of any API.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem with using _replace!
is that custom types would have either to override _replace!
(which is unexported and doesn't have a very appealing name), or to reimplement all replace
and replace!
methods directly. Here that implies duplicating some boilerplate (with some tricks, like checking count === nothing
and min(count, typemax(Int)) % Int)
, which one can easily get wrong). In other places, it even implies duplicating the call to subtract_singletontype
, which is an unexported function and which people would better not have to bother with at all.
So I think it's much better to recommend package authors to implement a single low-level method, be it replace!
or another unexported method (ideally with a better name than _replace!
).
I worked on this and will push a commit to feed the discussion (I'm not yet fully happy with it). I'm still reluctant at calling One thing I noticed also is that with the new method, there is an ambiguity in a call like Also, |
My pushed commit may not seem to improve so much, but I guess my main points are: 1) don't call |
Actually, Let's say we keep |
Looks good! The use of |
OK, cool! @StefanKarpinski Could you just have a quick look at the singleton stuff and say whether you agree on the principle? That's the only part of the PR which affects public API (though in a subtle way). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very nice! I wonder if the special treatment of singleton types for replace
with pairs desserves a mention in the documentation?
@@ -631,7 +643,7 @@ julia> replace!(isodd, A, 0, count=2) | |||
``` | |||
""" | |||
replace!(pred::Callable, A, new; count::Integer=typemax(Int)) = | |||
replace!(x -> ifelse(pred(x), new, x), A, count=count) | |||
replace!(x -> ifelse(pred(x), new, x), A, count=check_count(count)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This replace!
method already checks count
, so not really needed to use check_count
here (or did you mean to save one specialization when count
is not an Int
?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I added it to avoid the specialization and to throw the error as early as possible. But maybe that's not worth it...
base/set.jl
Outdated
@@ -686,16 +694,34 @@ julia> replace([1, 2, 1, 3], 1=>0, 2=>4, count=2) | |||
3 | |||
``` | |||
""" | |||
function replace(A, old_new::Pair...; count::Integer=typemax(Int)) | |||
function replace(A, old_new::Pair...; count::Union{Integer,Nothing}=nothing) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The use of Nothing
here for type stability is quite subtle; if you re-push to this branch for another change, may be worth a small comment to explain this.
c == count && break | ||
end | ||
for oldnew in repl | ||
pop!(res, askey(first(oldnew), res)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very minor comment: as res === A
, working with only one of A
or res
may make it slightly easier to read.
Would be good indeed if someone comments on the "singleton stuff", basically that |
Travis failure on Mac is unrelated. I'll merge tomorrow barring objections. |
Introduce a new
replace!(new::Callable, res::T, A::T, count::Union{Nothing,Int})
method which custom types can implement to support all replace and replace! methods automatically, instead of the currentreplace!(new::Callable, A::T, count::Int)
. This offers several advantages:count=nothing
.replace
, we can apply replacements directly instead of storing a them in a temporary vector.of the result, e.g.
Union{T,Missing}
becomesT
.Also simplify the dispatch logic by removing the internal _replace! method in favor of replace!.
Benchmarks indicate that this PR significantly improves performance for arrays (except for
Int8
, for which it's stable), but not so much for sets (except forInt8
, which are faster for some reason). So the specialized code for sets/dicts when origin and destination are different might not be worth adding.