Skip to content

Commit

Permalink
More thorough aliasing detection in nonscalar copy, broadcast and ass…
Browse files Browse the repository at this point in the history
…ignment (#25890)

This commit puts together the beginnings of an "interface" for detecting and avoiding aliasing between arrays before performing some mutating operation on one of them.

`A′ = unalias(dest, A)` is the main entry point; it checks to see if `dest` and `A` might alias one another, and then either returns `A` or a copy of `A`. When it returns a copy of `A`, it absolutely must preserve the same type. As such, this function calls an internal `unaliascopy` function, which defaults to `copy` but ensures it returns the same type.

`mightalias(A, B)` is a quick and conservative check to see if two arrays alias eachother.  Instead of requiring every array to know about every other array type, it simply asks both arrays for their `dataids` and then looks for an empty intersection.

Finally, `dataids(A)` returns a tuple of `UInt`s that describe something about the region of memory the array occupies. It defaults to simply `(objectid(A),)`. A custom implementation for an array with multiple fields of arrays would look something like `(dataids(A.a)..., dataids(A.b)..., ...)`.

Fixes #21693, fixes #15209, and a pre-req to get tests passing for #24368 (since we had spot checks for `Array` aliasing, but moving to broadcast means we need a slightly more generic solution).  These functions remain un-exported for now since I'd like us to use them a bit more before we officially support them for all custom arrays.
  • Loading branch information
mbauman authored Feb 27, 2018
1 parent 867a73c commit f38a734
Show file tree
Hide file tree
Showing 15 changed files with 395 additions and 84 deletions.
104 changes: 104 additions & 0 deletions base/abstractarray.jl
Original file line number Diff line number Diff line change
Expand Up @@ -1047,6 +1047,110 @@ function _setindex!(::IndexCartesian, A::AbstractArray, v, I::Vararg{Int,M}) whe
r
end

"""
parent(A)
Returns the "parent array" of an array view type (e.g., `SubArray`), or the array itself if
it is not a view.
# Examples
```jldoctest
julia> A = [1 2; 3 4]
2×2 Array{Int64,2}:
1 2
3 4
julia> V = view(A, 1:2, :)
2×2 view(::Array{Int64,2}, 1:2, :) with eltype Int64:
1 2
3 4
julia> parent(V)
2×2 Array{Int64,2}:
1 2
3 4
```
"""
parent(a::AbstractArray) = a

## rudimentary aliasing detection ##
"""
Base.unalias(dest, A)
Return either `A` or a copy of `A` in a rough effort to prevent modifications to `dest` from
affecting the returned object. No guarantees are provided.
Custom arrays that wrap or use fields containing arrays that might alias against other
external objects should provide a [`Base.dataids`](@ref) implementation.
This function must return an object of exactly the same type as `A` for performance and type
stability. Mutable custom arrays for which [`copy(A)`](@ref) is not `typeof(A)` should
provide a [`Base.unaliascopy`](@ref) implementation.
See also [`Base.mightalias`](@ref).
"""
unalias(dest, A::AbstractArray) = mightalias(dest, A) ? unaliascopy(A) : A
unalias(dest, A::AbstractRange) = A
unalias(dest, A) = A

"""
Base.unaliascopy(A)
Make a preventative copy of `A` in an operation where `A` [`Base.mightalias`](@ref) against
another array in order to preserve consistent semantics as that other array is mutated.
This must return an object of the same type as `A` to preserve optimal performance in the
much more common case where aliasing does not occur. By default,
`unaliascopy(A::AbstractArray)` will attempt to use [`copy(A)`](@ref), but in cases where
`copy(A)` is not a `typeof(A)`, then the array should provide a custom implementation of
`Base.unaliascopy(A)`.
"""
unaliascopy(A::Array) = copy(A)
unaliascopy(A::AbstractArray)::typeof(A) = (@_noinline_meta; _unaliascopy(A, copy(A)))
_unaliascopy(A::T, C::T) where {T} = C
_unaliascopy(A, C) = throw(ArgumentError("""
an array of type `$(typeof(A).name)` shares memory with another argument and must
make a preventative copy of itself in order to maintain consistent semantics,
but `copy(A)` returns a new array of type `$(typeof(C))`. To fix, implement:
`Base.unaliascopy(A::$(typeof(A).name))::typeof(A)`"""))
unaliascopy(A) = A

"""
Base.mightalias(A::AbstractArray, B::AbstractArray)
Perform a conservative test to check if arrays `A` and `B` might share the same memory.
By default, this simply checks if either of the arrays reference the same memory
regions, as identified by their [`Base.dataids`](@ref).
"""
mightalias(A::AbstractArray, B::AbstractArray) = !_isdisjoint(dataids(A), dataids(B))
mightalias(x, y) = false

_isdisjoint(as::Tuple{}, bs::Tuple{}) = true
_isdisjoint(as::Tuple{}, bs::Tuple{Any}) = true
_isdisjoint(as::Tuple{}, bs::Tuple) = true
_isdisjoint(as::Tuple{Any}, bs::Tuple{}) = true
_isdisjoint(as::Tuple{Any}, bs::Tuple{Any}) = as[1] != bs[1]
_isdisjoint(as::Tuple{Any}, bs::Tuple) = !(as[1] in bs)
_isdisjoint(as::Tuple, bs::Tuple{}) = true
_isdisjoint(as::Tuple, bs::Tuple{Any}) = !(bs[1] in as)
_isdisjoint(as::Tuple, bs::Tuple) = !(as[1] in bs) && _isdisjoint(tail(as), bs)

"""
Base.dataids(A::AbstractArray)
Return a tuple of `UInt`s that represent the mutable data segments of an array.
Custom arrays that would like to opt-in to aliasing detection of their component
parts can specialize this method to return the concatenation of the `dataids` of
their component parts. A typical definition for an array that wraps a parent is
`Base.dataids(C::CustomArray) = dataids(C.parent)`.
"""
dataids(A::AbstractArray) = (UInt(objectid(A)),)
dataids(A::Array) = (UInt(pointer(A)),)
dataids(::AbstractRange) = ()
dataids(x) = ()

## get (getindex with a default value) ##

RangeVecIntList{A<:AbstractVector{Int}} = Union{Tuple{Vararg{Union{AbstractRange, AbstractVector{Int}}}},
Expand Down
16 changes: 6 additions & 10 deletions base/array.jl
Original file line number Diff line number Diff line change
Expand Up @@ -692,24 +692,20 @@ function setindex! end
# These are redundant with the abstract fallbacks but needed for bootstrap
function setindex!(A::Array, x, I::AbstractVector{Int})
@_propagate_inbounds_meta
A === I && (I = copy(I))
for i in I
I′ = unalias(A, I)
for i in I
A[i] = x
end
return A
end
function setindex!(A::Array, X::AbstractArray, I::AbstractVector{Int})
@_propagate_inbounds_meta
@boundscheck setindex_shape_check(X, length(I))
X′ = unalias(A, X)
I′ = unalias(A, I)
count = 1
if X === A
X = copy(X)
I===A && (I = X::typeof(I))
elseif I === A
I = copy(I)
end
for i in I
@inbounds x = X[count]
for i in I′
@inbounds x = X′[count]
A[i] = x
count += 1
end
Expand Down
16 changes: 13 additions & 3 deletions base/broadcast.jl
Original file line number Diff line number Diff line change
Expand Up @@ -5,7 +5,7 @@ module Broadcast
using .Base.Cartesian
using .Base: Indices, OneTo, linearindices, tail, to_shape,
_msk_end, unsafe_bitgetindex, bitcache_chunks, bitcache_size, dumpbitcache,
isoperator, promote_typejoin
isoperator, promote_typejoin, unalias
import .Base: broadcast, broadcast!
export BroadcastStyle, broadcast_indices, broadcast_similar,
broadcast_getindex, broadcast_setindex!, dotview, @__dot__
Expand Down Expand Up @@ -472,13 +472,23 @@ end
return dest
end

# For broadcasted assignments like `broadcast!(f, A, ..., A, ...)`, where `A`
# appears on both the LHS and the RHS of the `.=`, then we know we're only
# going to make one pass through the array, and even though `A` is aliasing
# against itself, the mutations won't affect the result as the indices on the
# LHS and RHS will always match. This is not true in general, but with the `.op=`
# syntax it's fairly common for an argument to be `===` a source.
broadcast_unalias(dest, src) = dest === src ? src : unalias(dest, src)

# This indirection allows size-dependent implementations.
@inline function _broadcast!(f, C, A, Bs::Vararg{Any,N}) where N
shape = broadcast_indices(C)
@boundscheck check_broadcast_indices(shape, A, Bs...)
keeps, Idefaults = map_newindexer(shape, A, Bs)
A′ = broadcast_unalias(C, A)
Bs′ = map(B->broadcast_unalias(C, B), Bs)
keeps, Idefaults = map_newindexer(shape, A′, Bs′)
iter = CartesianIndices(shape)
_broadcast!(f, C, keeps, Idefaults, A, Bs, Val(N), iter)
_broadcast!(f, C, keeps, Idefaults, A, Bs, Val(N), iter)
return C
end

Expand Down
85 changes: 68 additions & 17 deletions base/multidimensional.jl
Original file line number Diff line number Diff line change
Expand Up @@ -675,21 +675,19 @@ _iterable(v) = Iterators.repeated(v)
@generated function _unsafe_setindex!(::IndexStyle, A::AbstractArray, x, I::Union{Real,AbstractArray}...)
N = length(I)
quote
X = _iterable(x)
@nexprs $N d->(I_d = I[d])
x′ = _iterable(unalias(A, x))
@nexprs $N d->(I_d = unalias(A, I[d]))
idxlens = @ncall $N index_lengths I
@ncall $N setindex_shape_check X (d->idxlens[d])
Xs = start(X)
@ncall $N setindex_shape_check x′ (d->idxlens[d])
xs = start(x′)
@inbounds @nloops $N i d->I_d begin
v, Xs = next(X, Xs)
v, xs = next(x′, xs)
@ncall $N setindex! A v i
end
A
end
end

##

# see discussion in #18364 ... we try not to widen type of the resulting array
# from cumsum or cumprod, but in some cases (+, Bool) we may not have a choice.
rcum_promote_type(op, ::Type{T}, ::Type{S}) where {T,S<:Number} = promote_op(op, T, S)
Expand Down Expand Up @@ -1091,6 +1089,57 @@ end

### from abstractarray.jl

# In the common case where we have two views into the same parent, aliasing checks
# are _much_ easier and more important to get right
function mightalias(A::SubArray{T,<:Any,P}, B::SubArray{T,<:Any,P}) where {T,P}
if !_parentsmatch(A.parent, B.parent)
# We cannot do any better than the usual dataids check
return !_isdisjoint(dataids(A), dataids(B))
end
# Now we know that A.parent === B.parent. This means that the indices of A
# and B are the same length and indexing into the same dimensions. We can
# just walk through them and check for overlaps: O(ndims(A)). We must finally
# ensure that the indices don't alias with either parent
return _indicesmightoverlap(A.indices, B.indices) ||
!_isdisjoint(dataids(A.parent), _splatmap(dataids, B.indices)) ||
!_isdisjoint(dataids(B.parent), _splatmap(dataids, A.indices))
end
_parentsmatch(A::AbstractArray, B::AbstractArray) = A === B
# Two reshape(::Array)s of the same size aren't `===` because they have different headers
_parentsmatch(A::Array, B::Array) = pointer(A) == pointer(B) && size(A) == size(B)

_indicesmightoverlap(A::Tuple{}, B::Tuple{}) = true
_indicesmightoverlap(A::Tuple{}, B::Tuple) = error("malformed subarray")
_indicesmightoverlap(A::Tuple, B::Tuple{}) = error("malformed subarray")
# For ranges, it's relatively cheap to construct the intersection
@inline function _indicesmightoverlap(A::Tuple{AbstractRange, Vararg{Any}}, B::Tuple{AbstractRange, Vararg{Any}})
!isempty(intersect(A[1], B[1])) ? _indicesmightoverlap(tail(A), tail(B)) : false
end
# But in the common AbstractUnitRange case, there's an even faster shortcut
@inline function _indicesmightoverlap(A::Tuple{AbstractUnitRange, Vararg{Any}}, B::Tuple{AbstractUnitRange, Vararg{Any}})
max(first(A[1]),first(B[1])) <= min(last(A[1]),last(B[1])) ? _indicesmightoverlap(tail(A), tail(B)) : false
end
# And we can check scalars against eachother and scalars against arrays quite easily
@inline _indicesmightoverlap(A::Tuple{Real, Vararg{Any}}, B::Tuple{Real, Vararg{Any}}) =
A[1] == B[1] ? _indicesmightoverlap(tail(A), tail(B)) : false
@inline _indicesmightoverlap(A::Tuple{Real, Vararg{Any}}, B::Tuple{AbstractArray, Vararg{Any}}) =
A[1] in B[1] ? _indicesmightoverlap(tail(A), tail(B)) : false
@inline _indicesmightoverlap(A::Tuple{AbstractArray, Vararg{Any}}, B::Tuple{Real, Vararg{Any}}) =
B[1] in A[1] ? _indicesmightoverlap(tail(A), tail(B)) : false
# And small arrays are quick, too
@inline function _indicesmightoverlap(A::Tuple{AbstractArray, Vararg{Any}}, B::Tuple{AbstractArray, Vararg{Any}})
if length(A[1]) == 1
return A[1][1] in B[1] ? _indicesmightoverlap(tail(A), tail(B)) : false
elseif length(B[1]) == 1
return B[1][1] in A[1] ? _indicesmightoverlap(tail(A), tail(B)) : false
else
# But checking larger arrays requires O(m*n) and is too much work
return true
end
end
# And in general, checking the intersection is too much work
_indicesmightoverlap(A::Tuple{Any, Vararg{Any}}, B::Tuple{Any, Vararg{Any}}) = true

"""
fill!(A, x)
Expand Down Expand Up @@ -1161,33 +1210,35 @@ julia> y
copyto!(dest, src)

function copyto!(dest::AbstractArray{T,N}, src::AbstractArray{T,N}) where {T,N}
@boundscheck checkbounds(dest, axes(src)...)
for I in eachindex(IndexStyle(src,dest), src)
@inbounds dest[I] = src[I]
checkbounds(dest, axes(src)...)
src′ = unalias(dest, src)
for I in eachindex(IndexStyle(src′,dest), src′)
@inbounds dest[I] = src′[I]
end
dest
end

function copyto!(dest::AbstractArray{T1,N}, Rdest::CartesianIndices{N},
src::AbstractArray{T2,N}, Rsrc::CartesianIndices{N}) where {T1,T2,N}
src::AbstractArray{T2,N}, Rsrc::CartesianIndices{N}) where {T1,T2,N}
isempty(Rdest) && return dest
if size(Rdest) != size(Rsrc)
throw(ArgumentError("source and destination must have same size (got $(size(Rsrc)) and $(size(Rdest)))"))
end
@boundscheck checkbounds(dest, first(Rdest))
@boundscheck checkbounds(dest, last(Rdest))
@boundscheck checkbounds(src, first(Rsrc))
@boundscheck checkbounds(src, last(Rsrc))
checkbounds(dest, first(Rdest))
checkbounds(dest, last(Rdest))
checkbounds(src, first(Rsrc))
checkbounds(src, last(Rsrc))
src′ = unalias(dest, src)
ΔI = first(Rdest) - first(Rsrc)
if @generated
quote
@nloops $N i (n->Rsrc.indices[n]) begin
@inbounds @nref($N,dest,n->i_n+ΔI[n]) = @nref($N,src,i)
@inbounds @nref($N,dest,n->i_n+ΔI[n]) = @nref($N,src,i)
end
end
else
for I in Rsrc
@inbounds dest[I + ΔI] = src[I]
@inbounds dest[I + ΔI] = src[I]
end
end
dest
Expand Down
1 change: 1 addition & 0 deletions base/reinterpretarray.jl
Original file line number Diff line number Diff line change
Expand Up @@ -36,6 +36,7 @@ struct ReinterpretArray{T,N,S,A<:AbstractArray{S, N}} <: AbstractArray{T, N}
end

parent(a::ReinterpretArray) = a.parent
dataids(a::ReinterpretArray) = dataids(a.parent)

function size(a::ReinterpretArray{T,N,S} where {N}) where {T,S}
psize = size(a.parent)
Expand Down
3 changes: 3 additions & 0 deletions base/reshapedarray.jl
Original file line number Diff line number Diff line change
Expand Up @@ -180,6 +180,9 @@ parent(A::ReshapedArray) = A.parent
parentindices(A::ReshapedArray) = map(s->1:s, size(parent(A)))
reinterpret(::Type{T}, A::ReshapedArray, dims::Dims) where {T} = reinterpret(T, parent(A), dims)

unaliascopy(A::ReshapedArray) = typeof(A)(unaliascopy(A.parent), A.dims, A.mi)
dataids(A::ReshapedArray) = dataids(A.parent)

@inline ind2sub_rs(::Tuple{}, i::Int) = i
@inline ind2sub_rs(strds, i) = _ind2sub_rs(strds, i - 1)
@inline _ind2sub_rs(::Tuple{}, ind) = (ind + 1,)
Expand Down
45 changes: 19 additions & 26 deletions base/subarray.jl
Original file line number Diff line number Diff line change
Expand Up @@ -59,41 +59,34 @@ similar(V::SubArray, T::Type, dims::Dims) = similar(V.parent, T, dims)

sizeof(V::SubArray) = length(V) * sizeof(eltype(V))

"""
parent(A)
Returns the "parent array" of an array view type (e.g., `SubArray`), or the array itself if
it is not a view.
# Examples
```jldoctest
julia> A = [1 2; 3 4]
2×2 Array{Int64,2}:
1 2
3 4
julia> V = view(A, 1:2, :)
2×2 view(::Array{Int64,2}, 1:2, :) with eltype Int64:
1 2
3 4
julia> parent(V)
2×2 Array{Int64,2}:
1 2
3 4
```
"""
parent(V::SubArray) = V.parent
parentindices(V::SubArray) = V.indices

parent(a::AbstractArray) = a
"""
parentindices(A)
From an array view `A`, returns the corresponding indices in the parent.
"""
parentindices(a::AbstractArray) = ntuple(i->OneTo(size(a,i)), ndims(a))

## Aliasing detection
dataids(A::SubArray) = (dataids(A.parent)..., _splatmap(dataids, A.indices)...)
_splatmap(f, ::Tuple{}) = ()
_splatmap(f, t::Tuple) = (f(t[1])..., _splatmap(f, tail(t))...)
unaliascopy(A::SubArray) = typeof(A)(unaliascopy(A.parent), map(unaliascopy, A.indices), A.offset1, A.stride1)

# When the parent is an Array we can trim the size down a bit. In the future this
# could possibly be extended to any mutable array.
function unaliascopy(V::SubArray{T,N,A,I,LD}) where {T,N,A<:Array,I<:Tuple{Vararg{Union{Real,AbstractRange,Array}}},LD}
dest = Array{T}(uninitialized, index_lengths(V.indices...))
copyto!(dest, V)
SubArray{T,N,A,I,LD}(dest, map(_trimmedindex, V.indices), 0, Int(LD))
end
# Transform indices to be "dense"
_trimmedindex(i::Real) = oftype(i, 1)
_trimmedindex(i::AbstractUnitRange) = i
_trimmedindex(i::AbstractArray) = oftype(i, reshape(linearindices(i), axes(i)))

## SubArray creation
# We always assume that the dimensionality of the parent matches the number of
# indices that end up getting passed to it, so we store the parent as a
Expand Down Expand Up @@ -135,7 +128,7 @@ julia> A # Note A has changed even though we modified b
"""
function view(A::AbstractArray, I::Vararg{Any,N}) where {N}
@_inline_meta
J = to_indices(A, I)
J = map(i->unalias(A,i), to_indices(A, I))
@boundscheck checkbounds(A, J...)
unsafe_view(_maybe_reshape_parent(A, index_ndims(J...)), J...)
end
Expand Down
Loading

0 comments on commit f38a734

Please sign in to comment.