RFC: Facilitate not paying double lookup cost when mutating a value in a LinearSlow AbstractArray #15630
This isn't unique to arrays: #2108. The non-scalar case is also interesting.
It's definitely feasible, and obviously the scalar case is easier than the non-scalar case because you don't (usually?) have to worry about aliasing, which IIUC is the hardest part about automatic devectorization. For types where …:

```jl
julia> B = rand(6,7);

julia> A = sub(B, 1:5, 1:5);

julia> foo!(A, i, j, v) = (@inbounds A[i,j] = A[i,j] + v; A)
foo! (generic function with 1 method)

julia> @code_native foo!(A, 2, 3, -20)
	.text
Filename: none
Source line: 0
	pushq	%rbp
	movq	%rsp, %rbp
	movq	$0, -32(%rbp)
	movq	$0, -24(%rbp)
	movq	$0, -16(%rbp)
	movq	$0, -8(%rbp)
	movq	$14, -72(%rbp)
	movabsq	$jl_tls_states, %r8
	movq	(%r8), %rax
	movq	%rax, -64(%rbp)
	leaq	-72(%rbp), %rax
	movq	%rax, (%r8)
Source line: 145
	movq	%rdi, -56(%rbp)
	movq	24(%rdi), %r9
Source line: 419
	addq	8(%rdi), %rsi
	movq	%rdi, -48(%rbp)
	movq	(%rdi), %rax
	movq	%rax, -40(%rbp)
	leaq	-2(%rdx,%r9), %rdx
	imulq	24(%rax), %rdx
	addq	%rsi, %rdx
	movq	(%rax), %rsi
	vmovsd	-16(%rsi,%rdx,8), %xmm0   # xmm0 = mem[0],zero
Source line: 146
	vcvtsi2sdq	%rcx, %xmm0, %xmm1
	vaddsd	%xmm0, %xmm1, %xmm0
Source line: 178
	movq	%rdi, -32(%rbp)
Source line: 419
	movq	%rdi, -24(%rbp)
	movq	%rax, -16(%rbp)
	vmovsd	%xmm0, -16(%rsi,%rdx,8)
Source line: 179
	movq	%rdi, -8(%rbp)
	movq	-64(%rbp), %rax
	movq	%rax, (%r8)
	movq	%rdi, %rax
	popq	%rbp
	retq
	nopl	(%rax)
```

Note how the element address (in `%rdx`) is computed once and reused for both the load and the store. Obviously, for some array types (…).

Another potential way to achieve the same thing is through #15434 and whatever #15459 evolves into: if each array gets a custom iterator that provides trivial (i.e., inline-worthy) access to the data, then access cost is greatly reduced. My main concern is in figuring out how to do #15459 without uglifying everything.

Maybe we can continue this conversation in a couple of days? I'm hoping to finish up a julep/blog post/whatever that goes more completely into my thoughts (which unfortunately are still vague and somewhat short when it comes to actual solutions).
Changing the lowering of … Another alternative would be to have an idiom for converting arbitrary integer indices into efficient pre-looked-up objects that support indexing. It would be like a conversion utility to go from tuples of ints to the fast iterators. And that's akin to #12157.
A natural extension to …
The discussion in #249 seemed to reach the decision that different behaviors for … However, I believe there should be some way to efficiently update an index. The only way I currently know of is by copy-pasting the …
Yeah, let's leave the scalar … You could still pre-compute a fast index, which for SparseMatrices could be an immutable that keeps track of both the cartesian indices and the index into the underlying storage:

```jl
I = fastindex(K, glob1, glob2)
K[I] += Ke[loc1, loc2]
```
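A sketch of what `fastindex` could look like for `SparseMatrixCSC` (the struct layout, the method names, and the error handling are all assumptions for illustration): the position into the nonzero values is found once, so the binary search is not repeated on the write.

```julia
using SparseArrays

# Hypothetical pre-looked-up index: stores the position into `nzval`.
struct FastIndex
    nzpos::Int
end

# Do the row search in column j once and remember the storage position.
function fastindex(K::SparseMatrixCSC, i::Integer, j::Integer)
    lo, hi = K.colptr[j], K.colptr[j + 1] - 1
    p = searchsortedfirst(K.rowval, i, lo, hi, Base.Order.Forward)
    (p > hi || K.rowval[p] != i) && error("entry ($i,$j) is not stored")
    FastIndex(p)
end

# Reads and writes through a FastIndex touch `nzval` directly.
Base.getindex(K::SparseMatrixCSC, I::FastIndex) = K.nzval[I.nzpos]
Base.setindex!(K::SparseMatrixCSC, v, I::FastIndex) = (K.nzval[I.nzpos] = v; K)
```

With this, `I = fastindex(K, glob1, glob2); K[I] += Ke[loc1, loc2]` performs the search once and the update in place.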
I've been pondering on this a bit more while dogfooding my own array package, https://github.com/KristofferC/BlockArrays.jl, and came to the conclusion that being able to dispatch on … As an example, let's say I have a 2x2 blockmatrix in the format of something I call a `PseudoBlockArray`:

```jl
julia> using BlockArrays

julia> A = PseudoBlockArray(zeros(4,4), [2,2], [2,2])
2×2-blocked 4×4 BlockArrays.PseudoBlockArray{Float64,2,Array{Float64,2}}:
 0.0  0.0  │  0.0  0.0
 0.0  0.0  │  0.0  0.0
 ----------┼----------
 0.0  0.0  │  0.0  0.0
 0.0  0.0  │  0.0  0.0
```

Now, I want to add something to the top right block:

```jl
julia> A[Block(1,2)] += ones(2,2); A
2×2-blocked 4×4 BlockArrays.PseudoBlockArray{Float64,2,Array{Float64,2}}:
 0.0  0.0  │  1.0  1.0
 0.0  0.0  │  1.0  1.0
 ----------┼----------
 0.0  0.0  │  0.0  0.0
 0.0  0.0  │  0.0  0.0
```

The problem here is that …
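Under current semantics, one way to avoid the temporary in such a block update is to mutate through a view, so the region is resolved once and updated in place (a sketch with a plain matrix standing in for the blocked array; this is not BlockArrays API):

```julia
# Plain 4×4 matrix standing in for the blocked example above.
A = zeros(4, 4)

# `view` makes no copy; broadcasting `.+=` writes into A in place,
# so there is no temporary block and no repeated index resolution.
topright = view(A, 1:2, 3:4)
topright .+= ones(2, 2)
```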
I made a small package here that uses a macro to achieve what I want: https://github.com/KristofferC/UpdateIndex.jl
This would be nice to have. I am not sure whether this would have to be breaking, and/or whether something needs to be done pre-1.0 in order to implement it afterwards.
Well, I would like to bump this - it really is a performance issue, in particular for 3D FE computations, and we see custom solutions spreading over various packages, see e.g. @KristofferC's and mine ...
FWIW, for FE assembly I found it in general faster to just sort the degrees of freedom that you are going to assemble and then, for each column in the sparse array, pass through the rows linearly until you have matched the correct number of times. This could be slower if you have elements with a huge number of degrees of freedom, but for my cases it has been quite fast compared to doing the binary search over and over.
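A minimal sketch of that column-wise linear pass, assuming the sparsity pattern already contains every target entry (`add_sorted_column!` is a made-up name):

```julia
using SparseArrays

# Add vals[k] to entry (rows[k], j) of K, where `rows` is sorted.
# One linear sweep over the column's stored rows replaces one
# binary search per assembled entry.
function add_sorted_column!(K::SparseMatrixCSC, j::Integer, rows, vals)
    p = K.colptr[j]
    hi = K.colptr[j + 1] - 1
    for (r, v) in zip(rows, vals)
        while p <= hi && K.rowval[p] < r
            p += 1
        end
        (p <= hi && K.rowval[p] == r) || error("entry ($r,$j) not stored")
        K.nzval[p] += v
    end
    return K
end
```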
Yeah, I am quite sure this is faster. OTOH this means you need to separate pattern creation and assembly. A priori pattern creation becomes quite messy for strongly coupled nonlinear PDEs, so I wanted to have a matrix struct where just saying … But your remark triggers some thinking....
Yes, that is a good observation.
Is there a significant issue with the …
I think the Referenceables.jl API solves this problem. It's something like this:

```jl
using Referenceables: referenceable

# Convert an Array-of-T `A` to an Array-of-Ref{T} `B`
B = referenceable(A)

# Get a Ref{T} by indexing into `B`. The conversion to fast "intrinsic"
# indexing can be done at this point.
ref = @inbounds B[i, j]

# Mutating `ref` would mutate `B[i, j]` without paying the
# Cartesian-to-linear conversion cost. (Bonus: no `@inbounds`)
ref[] += inc
```

As an extra benefit, this idea works very well with parallel …
Ref #24454
Yes, …
There are many cases where one wants to modify a value in an array like `A[i, j] = f(A[i, j])`. If `A` is a `LinearSlow` array, this currently needs two transformations between the Cartesian index and the linear index: one in `getindex` and one in `setindex!`.

A typical real world example is in Finite Element Analysis, where one wants to "assemble" a small local dense matrix `Ke` into a large sparse matrix `K`.

Since `K[i,j] += Ke[a,b]` lowers to `K[i,j] = K[i,j] + Ke[a,b]`, we will do the Cartesian-to-linear index search twice. To avoid this, Finite Element codes typically create an `add(K, v, i::Integer, j::Integer)` method, a `sub(K, v, i::Integer, j::Integer)` method, etc.; see for example https://github.com/dealii/dealii/blob/0cb18a1aba58177c59bfe3755d13e67cc6099244/examples/step-5/step-5.cc#L391.

It could perhaps be useful for all `AbstractArray`s to optionally support a fast version of `A[i, j] = f(A[i, j])`; let's call it `mutateindex!`. It would have the signature `mutateindex!(A, f, i...)`, where `f` is a function that receives the current value of `A[i...]` and returns the new value.

A default fallback for `mutateindex!` would be simple. Optionally, an `AbstractArray` could implement a fast version, like a SparseMatrix that searches for the stored entry only once. Writing the assembly function in terms of `mutateindex!` and benchmarking, we are about twice as fast, since the dominating lookup cost is halved.
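A hedged sketch of what this could look like (the names `mutateindex!` and `addindex!`, the error behavior, and the sparse method are illustrative assumptions, not an actual implementation):

```julia
using SparseArrays

# Generic fallback: same cost as `A[i...] = f(A[i...])`, i.e. two lookups.
mutateindex!(A::AbstractArray, f, i...) = (A[i...] = f(A[i...]); A)

# Fast path for SparseMatrixCSC: locate the stored entry with a single
# binary search and update `nzval` in place.
function mutateindex!(K::SparseMatrixCSC, f, i::Integer, j::Integer)
    lo, hi = K.colptr[j], K.colptr[j + 1] - 1
    p = searchsortedfirst(K.rowval, i, lo, hi, Base.Order.Forward)
    (p > hi || K.rowval[p] != i) && error("entry ($i,$j) is not stored")
    K.nzval[p] = f(K.nzval[p])
    return K
end

# FE-style accumulation: `K[i,j] += v` with a single lookup.
addindex!(K, v, i, j) = mutateindex!(K, x -> x + v, i, j)
```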
It would of course be nice to have some good syntax for this, which ties in to #249.
We could also just have `setindex!(A, v, i, j)` be defined as `mutateindex!(A, _ -> v, i...)`, but that is probably too disruptive.

Maybe our `AbstractArray` Super Saiyans @timholy and @mbauman have some comments on the feasibility of something like this.