TArray Indexing Performance #83
test/benchmarks.jl
Outdated
print("indexing: ")
@btime $A[$x, $y] + $A[$x, $y]
@btime @rep INTENSITY $A[$x, $y]
Does this interpolate the variables correctly in the @btime macro? For accurate benchmarking it might be good to follow the standard suggestions and benchmark proper functions with arguments being interpolated, i.e., something like @btime f($x, $y) (instead of @btime f(x, y)).
I think it expands correctly; I checked the macro-expansion results. I also changed the benchmarks to the function-call form, and the results didn't change.
I assume that performance of
It reduces the running time by about 50% 👍
I am not sure if it is useful for
Results of the benchmarks (tables not preserved): Time of Indexing (ns), Time of Set-Indexing (ns), Time of Broadcasting (ms).
I did some optimization for TArray{Float64, 2} indexing: I moved these arrays from task-local storage to a field of CTask, which makes the access type-stable. Benchmark results (tables not preserved): Time of Indexing (ns), Time of Set-Indexing (ns).
Indexing time is reduced to 1/10. The caveat is that a TArray{Float64,2} must be created in a CTask; creating one in an ordinary task raises an error.
It feels strange and somewhat not right to hard code a special case for
I tried simply adding type assertions, but that didn't bring any performance improvement:
Maybe I didn't use them correctly? I discussed this with Hong and will try another approach: moving the underlying Array into the TArray struct.
I moved TArray's underlying data from task-local storage to TArray.data. Benchmark results (tables not preserved): Time of Indexing (ns), Time of Set-Indexing (ns), Time of Broadcasting (ms).
It would be good to add more comments and explanations, also to motivate specific design choices. Sometimes it is a bit difficult to read and understand the implementation.
src/tarray.jl
Outdated
orig_task :: Task
TArray{T,N}() where {T,N} = new(gensym(), current_task())
data::Dict{Task, Tuple{Int, AbstractArray{T, N}}}
data should be concretely typed if possible; otherwise get(data, task) is not inferrable.
Indeed, I used Array{T, N} at the beginning rather than AbstractArray{T, N}, and the indexing performance was better than it is now. But if we use a concrete array type, we can't wrap all kinds of arrays in a TArray, e.g. a SubArray returned from view.
Using AbstractArray costs a little performance, but not too much I thought, and it is compatible with all kinds of arrays.
I had in mind something like
struct TArray{T,N,A<:AbstractArray{T,N}} <: AbstractArray{T,N}
orig::Task
data::Dict{Task,Tuple{Int,A}}
end
Could we do this? Or are there any problems?
Good idea, I will have a try tomorrow.
It seems this works very well, see the latest commit.
Great!
src/tarray.jl
Outdated
_get(x) = x
function _get(x::TArray{T, N}) where {T, N}
    n, d = x.data[current_task()]
    return d
Can't be inferred either (see above)
@devmotion Just to clarify, do you mean the types of x::TArray need to be parametric / more concrete?
@KDr2 _get and _get_for_write look similar to _local_storage. If that's true, maybe merge these functions into _get_local_storage / _set_local_storage for clarity?
Just to clarify, do you mean the types of x::TArray need to be parametric / more concrete?
Yes, I mean that the type of d = last(x.data[task]) can only be inferred if data::Dict{Task,Tuple{Int,V}} is a field of TArray with a concrete value type V that is, e.g., a type parameter of TArray.
Just to clarify, this is not related to the type parameters T and N and the corresponding where clauses in the definition of _get. They are not helpful for type inference.
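To make the inference point concrete, here is a minimal sketch that compares an abstractly typed Dict value with a parametric one. AbstractBox, ParamBox, and payload are hypothetical stand-ins introduced only for illustration; they are not the package's TArray or _get, and the sketch only mimics the field layout discussed above.

```julia
# Hypothetical stand-ins for illustration; these are NOT the package's types.

# The Dict's value type is abstract, so the stored array's concrete type is
# unknown to the compiler.
struct AbstractBox{T,N}
    data::Dict{Task,Tuple{Int,AbstractArray{T,N}}}
end

# The array type A is a type parameter, so the Dict's value type is concrete.
struct ParamBox{T,N,A<:AbstractArray{T,N}}
    data::Dict{Task,Tuple{Int,A}}
end

# Same lookup for both: fetch the array stored for the current task.
payload(b) = last(b.data[current_task()])

abstract_dict = Dict{Task,Tuple{Int,AbstractArray{Float64,2}}}(
    current_task() => (1, zeros(2, 2)),
)
a = AbstractBox{Float64,2}(abstract_dict)
p = ParamBox{Float64,2,Matrix{Float64}}(Dict(current_task() => (1, zeros(2, 2))))

# Ask the compiler which return type it infers for each lookup.
inferred_a = first(Base.return_types(payload, (typeof(a),)))
inferred_p = first(Base.return_types(payload, (typeof(p),)))

println(inferred_a, " concrete? ", isconcretetype(inferred_a))
println(inferred_p, " concrete? ", isconcretetype(inferred_p))
```

Under these assumptions, the first lookup should be inferred only as an abstract array type, while the second should be inferred as the concrete array type carried by the parameter A.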
New benchmark results after commit "underlying array type as a parametric type of TArray" (tables not preserved): Time of Indexing (ns), Time of Set-Indexing (ns), Time of Broadcasting (ms).
src/tarray.jl
Outdated
@@ -259,7 +283,7 @@ Base.:*(x::AbstractArray, y::TArray) = x * _get(y) |> localize
 Base.:*(x::TArray, y::AbstractArray) = _get(x) * y |> localize

 # broadcast
-Base.BroadcastStyle(::Type{TArray{T, N}}) where {T, N} = Broadcast.ArrayStyle{TArray}()
+Base.BroadcastStyle(::Type{TArray{T, N, A}}) where {T, N, A} = Broadcast.ArrayStyle{TArray}()
Are there some potential problems with dropping the type information here? It seems it is not used anyway in the implementation of broadcasted so it should be fine, I assume?
I am also curious about this: if I use ::Type{TArray} or ::Type{TArray{T, N}}, the broadcasting benchmark will consume about 10x as much time as using ::Type{TArray{T, N, A}}.
Maybe it falls back to some more expensive implementation in Julia base if the type is not concrete?
But the initial question was more about ArrayStyle{TArray} than ::Type{TArray{T,N,A}}, though.
Oh, ArrayStyle{TArray} is just a singleton used to dispatch function calls to the appropriate Broadcast.broadcasted implementation, so I think it's OK to drop the parametric types.
According to the documentation for AbstractArrays, the preferred definition is
-Base.BroadcastStyle(::Type{TArray{T, N, A}}) where {T, N, A} = Broadcast.ArrayStyle{TArray}()
+Base.BroadcastStyle(::Type{<:TArray}) = Broadcast.ArrayStyle{TArray}()
which is basically what you had above (and therefore should give similar performance).
Base.BroadcastStyle(::Type{<:TArray}) = Broadcast.ArrayStyle{TArray}()
I'd second the one from Julia documentation since it is more concise.
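For reference, here is a minimal sketch of the documented pattern: a single BroadcastStyle method for all parameterizations, plus a similar method that allocates the broadcast result. Wrapped and wrap are hypothetical names used only for illustration; this is a plain forwarding wrapper and not the package's task-aware TArray.

```julia
# Hypothetical wrapper type for illustration; NOT the package's TArray.
struct Wrapped{T,N,A<:AbstractArray{T,N}} <: AbstractArray{T,N}
    data::A
end

# Helper constructor that fills in the type parameters from the argument.
wrap(data::AbstractArray{T,N}) where {T,N} = Wrapped{T,N,typeof(data)}(data)

# Minimal AbstractArray interface, forwarding to the wrapped array.
Base.size(w::Wrapped) = size(w.data)
Base.getindex(w::Wrapped{T,N}, i::Vararg{Int,N}) where {T,N} = w.data[i...]
Base.setindex!(w::Wrapped{T,N}, v, i::Vararg{Int,N}) where {T,N} = (w.data[i...] = v)

# One broadcast style for every parameterization of Wrapped.
Base.BroadcastStyle(::Type{<:Wrapped}) = Broadcast.ArrayStyle{Wrapped}()

# Allocate the broadcast result as a Wrapped array of the right element type.
function Base.similar(bc::Broadcast.Broadcasted{Broadcast.ArrayStyle{Wrapped}},
                      ::Type{ElType}) where {ElType}
    return wrap(similar(Array{ElType}, axes(bc)))
end

w = wrap([1.0 2.0; 3.0 4.0])
r = w .+ 1
println(typeof(r) <: Wrapped)
println(r == [2.0 3.0; 4.0 5.0])
```

With this setup no custom Broadcast.broadcasted method is needed; the default machinery fills the array returned by similar. Whether that is fast enough for TArray is a separate question that only the benchmarks in this thread can answer.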
src/tarray.jl
Outdated
@@ -259,7 +283,7 @@ Base.:*(x::AbstractArray, y::TArray) = x * _get(y) |> localize
 Base.:*(x::TArray, y::AbstractArray) = _get(x) * y |> localize

 # broadcast
-Base.BroadcastStyle(::Type{TArray{T, N}}) where {T, N} = Broadcast.ArrayStyle{TArray}()
+Base.BroadcastStyle(::Type{TArray{T, N, A}}) where {T, N, A} = Broadcast.ArrayStyle{TArray}()
 Broadcast.broadcasted(::Broadcast.ArrayStyle{TArray}, f, args...) = f.(_get.(args)...) |> localize
Can we remove this method? Or, if the default fallback is not sufficient (I assume it's not), implement Base.similar(bc::Broadcast.Broadcasted{Broadcast.ArrayStyle{TArray}}, ::Type{ElType}) where ElType instead as suggested in the example in the documentation?
I tried this:
--- a/src/tarray.jl
+++ b/src/tarray.jl
@@ -46,6 +46,7 @@ TArray{T}(::UndefInitializer, dim::NTuple{N,Int}) where {T,N} = TArray(T, dim)
TArray{T,N}(d::Vararg{<:Integer,N}) where {T,N} = TArray(T, d)
TArray{T,N}(::UndefInitializer, d::Vararg{<:Integer,N}) where {T,N} = TArray{T,N}(d)
TArray{T,N}(dim::NTuple{N,Int}) where {T,N} = TArray(T, dim)
+TArray{T, N, Array{T, N}}(::UndefInitializer, dim::NTuple{N,Int}) where {T,N} = TArray(T, dim)
function TArray(T::Type, dim)
N_dim = length(dim)
@@ -283,8 +284,10 @@ Base.:*(x::AbstractArray, y::TArray) = x * _get(y) |> localize
Base.:*(x::TArray, y::AbstractArray) = _get(x) * y |> localize
# broadcast
-Base.BroadcastStyle(::Type{TArray{T, N, A}}) where {T, N, A} = Broadcast.ArrayStyle{TArray}()
-Broadcast.broadcasted(::Broadcast.ArrayStyle{TArray}, f, args...) = f.(_get.(args)...) |> localize
+Base.BroadcastStyle(::Type{TArray{T, N, A}}) where {T, N, A} = Broadcast.ArrayStyle{TArray{T, N, A}}()
+# Broadcast.broadcasted(::Broadcast.ArrayStyle{TArray}, f, args...) = f.(_get.(args)...) |> localize
+Base.similar(bc::Broadcast.Broadcasted{Broadcast.ArrayStyle{TArray{T, N, A}}}, ::Type{T}) where {T, N, A} =
+ similar(TArray{T, N, Array{T, N}}, axes(bc))
import LinearAlgebra
import LinearAlgebra: \, /, inv, det, logdet, logabsdet, norm
It became much slower (2.8 ms -> 192 ms).
thanks, @KDr2 and @devmotion!
About the benchmarks of
becomes
Because the compiler knows that if A is a built-in Array, only one expression is used (as the return value). I will try to find a way to prevent this optimization.
Thanks @KDr2, I've added some comments below. The PR looks good in general; however, there are a few places that could be improved for clarity and performance.
src/tarray.jl
Outdated
    n, d = x.data[current_task()]
    return d
end
function _get_for_write(x::TArray)
Maybe consider unifying _get and _get_for_write, e.g. by adding a flag to force deepcopy?
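One possible reading of this suggestion is sketched below: a single accessor with a keyword flag that forces a deepcopy before the array is handed out for writing. CowBox, local_array, and the copy_first flag are hypothetical names, and the copy policy is an assumption; the thread does not show how _get_for_write actually decides when to copy.

```julia
# Hypothetical container for illustration; NOT the package's TArray.
# Each task maps to a (counter, array) tuple, mirroring the field layout
# discussed in this thread.
struct CowBox{A<:AbstractArray}
    data::Dict{Task,Tuple{Int,A}}
end

# Single accessor: read access by default, copy-before-write on request.
function local_array(x::CowBox; copy_first::Bool=false)
    t = current_task()
    n, d = x.data[t]
    if copy_first
        # Assumed policy: replace the stored array with a private copy.
        d = deepcopy(d)
        x.data[t] = (n, d)
    end
    return d
end

box = CowBox(Dict(current_task() => (0, [1, 2, 3])))
before = local_array(box)                   # no copy
writable = local_array(box; copy_first=true)
writable[1] = 99
println(before[1])
println(local_array(box)[1])
```

Here the array obtained before the write keeps its old contents, while later reads see the modified private copy.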
The latest commit:
Why did you change it? I would be fine with the implementation of Base.BroadcastStyle(::Type{<:TArray}) = Broadcast.ArrayStyle{TArray}() as shown in the documentation.
Looks good - happy to merge once the broadcasting issue raised by @devmotion is addressed!
Sorry, I misunderstood, fixed now.
Trying to benchmark, profile, and improve the performance of TArray indexing.