manually deconstruct and reconstruct subarray when throwing boundserror #29867
Conversation
@nanosoldier
Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @ararslan
Kind of stupid, yes, but also admirably clever 😈. Performance-wise it's a nice improvement in a well-benchmarked area. And we'll always remember `%10`.
This change introduces an indirect call when indexing a SubArray:

```julia
julia> function kernel(dest)
           dest[1] = 1
           nothing
       end
kernel (generic function with 1 method)

julia> tt = Tuple{SubArray{Float64,2,Array{Float64,2},Tuple{UnitRange{Int64},UnitRange{Int64}},false}}
Tuple{SubArray{Float64,2,Array{Float64,2},Tuple{UnitRange{Int64},UnitRange{Int64}},false}}

julia> code_llvm(kernel, tt)
...
  %68 = call nonnull %jl_value_t addrspace(10)* @jl_invoke(%jl_value_t addrspace(10)* addrspacecast (%jl_value_t* inttoptr (i64 139861354488080 to %jl_value_t*) to %jl_value_t addrspace(10)*), %jl_value_t addrspace(10)** %3, i32 7)
  call void @llvm.trap()
  unreachable
...
```

I'm not sure this is wanted, both from a GPU-compatibility POV and because it introduces an indirect call for SubArray indexing. Revert to have 1.1 work with CuArrays? Or any suggested improvements?
It is difficult to program against an unknown, non-CI-enabled subset of the language. While the performance improvements in this PR are on their own probably not very important, this PR + #29868 makes quite a bit of difference for broadcasted updates to small arrays. I guess we can revert here, but the current situation is very much non-ideal.
The indirect call only happens when we are throwing, so do we care (modulo GPU codegen)?
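To make that workload concrete, the kind of code that benefits is a broadcasted, in-place update through a small view. The snippet below is only an illustrative sketch; the array names and sizes are made up and not taken from the PR's benchmarks:

```julia
# Illustrative sketch of a broadcasted update to a small SubArray.
# Every iteration indexes through the view, so per-access overhead in
# SubArray's getindex/setindex! (including setting up the bounds-error
# path touched by this PR) shows up directly in the runtime.
A = rand(4, 4)
B = rand(2, 2)
for _ in 1:10_000
    v = view(A, 1:2, 1:2)  # small SubArray into the parent array
    v .+= B                # broadcasted in-place update through the view
end
```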
Sure, but that's a whole different problem (where to host, which package versions to test against, etc.) and not relevant here. I'm doing my best to ensure compatibility, hence my comment. Since the regular
Isn't it relevant here, since the whole reason there is a problem is that the generated code does not adhere to the GPU subset?
The difficult part is that (AFAIU) the reason the SubArray manages to get optimized away is due to the indirect call.
Does the code you linked really have anything to do with the change here? That code is there on 1.0.3 as well. I guess it is the `invoke` that is the problem?
Sure, I meant the difficulty of setting up CI. I also wouldn't expect Base to adhere to that subset right away, since it often motivates me to implement some features 🙂 (case in point, JuliaGPU/CUDAnative.jl#314 was motivated by this issue). And yes, I copied the wrong snippet; it's the `invoke` indeed.
Does that need a
I think that is correct. I am not sure either why an
I think this works around it:

```diff
diff --git a/base/subarray.jl b/base/subarray.jl
index 2d6f8a709a..8d133b7e1e 100644
--- a/base/subarray.jl
+++ b/base/subarray.jl
@@ -42,7 +42,7 @@ check_parent_index_match(parent, ::NTuple{N, Bool}) where {N} =
 # are inlined
 @inline Base.throw_boundserror(A::SubArray, I) =
     __subarray_throw_boundserror(typeof(A), A.parent, A.indices, A.offset1, A.stride1, I)
-@noinline __subarray_throw_boundserror(T, parent, indices, offset1, stride1, I) =
+@noinline __subarray_throw_boundserror(::Type{T}, parent, indices, offset1, stride1, I) where {T} =
     throw(BoundsError(T(parent, indices, offset1, stride1), I))
 # This computes the linear indexing compatibility for a given tuple of indices
```

Could you check? For me a direct call is now emitted.
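For context on why the one-line signature change makes a difference, here is a standalone sketch; the helper names (`rebuild_dynamic`, `rebuild_static`) are hypothetical and not from Base. With a plain `T` argument the `@noinline` helper is not necessarily specialized on the concrete type, so the `T(...)` constructor call may be dispatched dynamically, whereas `::Type{T} ... where {T}` puts the concrete type into the method's static signature:

```julia
# Hypothetical helpers illustrating the pattern in the diff above.
@noinline rebuild_dynamic(T, x) = T(x)                    # T is only a runtime value here
@noinline rebuild_static(::Type{T}, x) where {T} = T(x)   # T is a static type parameter

# Comparing the generated code for the two variants shows the difference:
#   code_llvm(rebuild_dynamic, Tuple{DataType, Int})
#   code_llvm(rebuild_static,  Tuple{Type{Float64}, Int})
```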
Yeah that seems great. I'll create a PR.
Improves #29867 by avoiding an invoke.
This is perhaps stupid, but anyway.

Before

After

IR Before

IR After

So long, dear `%10`. Fixes #29860
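The Before/After and IR Before/IR After sections presumably refer to timing and generated-IR comparisons. A rough sketch of how that kind of comparison can be reproduced locally, assuming the BenchmarkTools package and with an illustrative function and array sizes rather than the PR's actual benchmark, could look like:

```julia
# Illustrative reproduction sketch (not the PR's actual benchmark):
# time a SubArray access and inspect its generated IR before and after
# the change to Base.throw_boundserror for SubArray.
using BenchmarkTools, InteractiveUtils

getfirst(v) = v[1]

A = rand(4, 4)
v = view(A, 1:2, 1:2)

@btime getfirst($v)                      # timing comparison ("Before" / "After")
code_llvm(getfirst, Tuple{typeof(v)})    # generated IR ("IR Before" / "IR After")
```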