
Using @threads works, using @batch has no effect #138

Open
lampretl opened this issue Mar 21, 2024 · 4 comments
lampretl commented Mar 21, 2024

The following function

using LinearAlgebra, Polyester
function invMatU(X::Matrix{tw})::Matrix{tw} where {tw}
    m, n = size(X)
    Y = zeros(tw, m, n)
    _1 = one(tw)
    @inbounds Threads.@threads for j = n:-1:1  # backward substitution
        if j ≤ m
            Y[j, j] = inv(X[j, j])
        end
        for i = min(j - 1, m):-1:1
            Yij = X[i, j] * (j ≤ m ? Y[j, j] : _1)
            for k = i+1:min(j - 1, m)
                Yij += X[i, k] * Y[k, j]
            end
            if i ≤ n
                Y[i, j] = -Yij / X[i, i]
            end
        end
    end
    return Y
end;

computes the inverse of an upper-triangular matrix. If X is not square, it is treated as representing a larger square matrix, extended (via hcat or vcat) by the identity matrix. We can verify that it works correctly, since

m, n = 5, 10; tw = Int
X = rand(tw.(0:10), m, n)
triu!(X)
for i = 1:min(m, n)  X[i, i] = rand([-1, 1])  end
@time Xi = invMatU(X)
X_, Xi_ = (vcat(t, Matrix{tw}(I, n, n)[m+1:end, :]) for t ∈ (X, Xi))
display.((X, Xi, X_, Xi_, X_ * Xi_ == I));

returns

5×10 Matrix{Int64}:
 -1   2  0   1   9  5  4   7   7   1
  0  -1  0   6   6  2  2   0  10   1
  0   0  1  10   5  3  2  10   3   3
  0   0  0   1   1  3  9   0   8   7
  0   0  0   0  -1  0  5   8   7  10
5×10 Matrix{Int64}:
 -1  -2  0   13  -8  -30  -69  71  -21   -8
  0  -1  0    6   0  -16  -52   0  -38  -41
  0   0  1  -10  -5   27  113  30  112  117
  0   0  0    1   1   -3  -14  -8  -15  -17
  0   0  0    0  -1    0    5   8    7   10
10×10 Matrix{Int64}:
 -1   2  0   1   9  5  4   7   7   1
  0  -1  0   6   6  2  2   0  10   1
  0   0  1  10   5  3  2  10   3   3
  0   0  0   1   1  3  9   0   8   7
  0   0  0   0  -1  0  5   8   7  10
  0   0  0   0   0  1  0   0   0   0
  0   0  0   0   0  0  1   0   0   0
  0   0  0   0   0  0  0   1   0   0
  0   0  0   0   0  0  0   0   1   0
  0   0  0   0   0  0  0   0   0   1
10×10 Matrix{Int64}:
 -1  -2  0   13  -8  -30  -69  71  -21   -8
  0  -1  0    6   0  -16  -52   0  -38  -41
  0   0  1  -10  -5   27  113  30  112  117
  0   0  0    1   1   -3  -14  -8  -15  -17
  0   0  0    0  -1    0    5   8    7   10
  0   0  0    0   0    1    0   0    0    0
  0   0  0    0   0    0    1   0    0    0
  0   0  0    0   0    0    0   1    0    0
  0   0  0    0   0    0    0   0    1    0
  0   0  0    0   0    0    0   0    0    1
true

However, replacing Threads.@threads with Polyester.@batch results in the matrix Xi being all zeros. So something isn't right here.

Btw, this was run with Polyester v0.7.10 on Julia 1.9.2 on Linux Mint 20.

@chriselrod (Member)

Could you try it with

@batch for _j = 0:n-1
    j = n - _j
    # rest of code
end

?
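Applied to the function in question, the suggested workaround would look roughly like this (a sketch under the hypothetical name invMatU_batch; only the loop header and the index remap j = n - _j differ from the original):

```julia
using LinearAlgebra, Polyester

function invMatU_batch(X::Matrix{tw})::Matrix{tw} where {tw}
    m, n = size(X)
    Y = zeros(tw, m, n)
    _1 = one(tw)
    @batch for _j = 0:n-1       # ascending unit range, which @batch handles
        j = n - _j              # recover the original descending index
        if j ≤ m
            Y[j, j] = inv(X[j, j])
        end
        for i = min(j - 1, m):-1:1
            Yij = X[i, j] * (j ≤ m ? Y[j, j] : _1)
            for k = i+1:min(j - 1, m)
                Yij += X[i, k] * Y[k, j]
            end
            if i ≤ n
                Y[i, j] = -Yij / X[i, i]
            end
        end
    end
    return Y
end
```

The remap gives @batch a plain ascending unit range while the body still sees the original descending index.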

@lampretl (Author)

Interesting. With your suggested edit, it works correctly.

Note also that with my original code, I occasionally get an error:

ERROR: LoadError: BoundsError: attempt to access 5×10 StrideArraysCore.PtrArray{Int64, 2, (1, 2), Tuple{Int64, Int64}, Tuple{Nothing, Nothing}, Tuple{Static.StaticInt{1}, Static.StaticInt{1}}} at index [0, 0]
Stacktrace:
  [1] throw_boundserror(A::StrideArraysCore.PtrArray{Int64, 2, (1, 2), Tuple{Int64, Int64}, Tuple{Nothing, Nothing}, Tuple{Static.StaticInt{1}, Static.StaticInt{1}}}, I::Tuple{Int64, Int64})
    @ Base ./abstractarray.jl:744
  [2] checkbounds
    @ ./abstractarray.jl:709 [inlined]
  [3] getindex
    @ ~/.julia/packages/StrideArraysCore/VyBzA/src/ptr_array.jl:935 [inlined]
  [4] macro expansion
    @ ~/Desktop/tmp.jl:10 [inlined]
  [5] #13
    @ ~/.julia/packages/Polyester/wJbZW/src/closure.jl:281 [inlined]
  [6] macro expansion
    @ ~/.julia/packages/Polyester/wJbZW/src/batch.jl:255 [inlined]
  [7] _batch_no_reserve
    @ ~/.julia/packages/Polyester/wJbZW/src/batch.jl:168 [inlined]
  [8] batch
    @ ~/.julia/packages/Polyester/wJbZW/src/batch.jl:343 [inlined]
  [9] macro expansion
    @ ~/.julia/packages/Polyester/wJbZW/src/closure.jl:426 [inlined]
 [10] invMatU(X::Matrix{Int64})
    @ Main ~/Desktop/tmp.jl:8
 [11] top-level scope
    @ ./timing.jl:273
 [12] include(fname::String)
    @ Base.MainInclude ./client.jl:478
 [13] top-level scope
    @ REPL[2]:1
in expression starting at /home/leon/Desktop/tmp.jl:29

@chriselrod (Member)

Not handling step ranges with negative steps is a bug.
But why use negative steps?

Threading means there is no guaranteed order, so it isn't iterating backwards. Just use for j in 1:n.
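Following this advice, the simplest correct version just iterates forward (a sketch under the hypothetical name invMatU_fwd; the body is unchanged, since each iteration writes only column j of Y and reads only X, so the iterations are order-independent):

```julia
using LinearAlgebra, Polyester

function invMatU_fwd(X::Matrix{tw})::Matrix{tw} where {tw}
    m, n = size(X)
    Y = zeros(tw, m, n)
    _1 = one(tw)
    @batch for j = 1:n  # order is irrelevant: columns of Y are independent
        if j ≤ m
            Y[j, j] = inv(X[j, j])
        end
        for i = min(j - 1, m):-1:1   # within a column, rows must go high-to-low
            Yij = X[i, j] * (j ≤ m ? Y[j, j] : _1)
            for k = i+1:min(j - 1, m)
                Yij += X[i, k] * Y[k, j]
            end
            if i ≤ n
                Y[i, j] = -Yij / X[i, i]
            end
        end
    end
    return Y
end
```

Note the inner loops keep their descending order: computing Y[i,j] needs Y[k,j] for k > i, but that dependence is confined to a single column, so it doesn't constrain the threaded outer loop.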

@lampretl (Author)

lampretl commented Mar 22, 2024

Hah, you're right, reverse iteration is unnecessary anyway; in the context of multithreading it doesn't even make sense, as you pointed out. However, if we computed the inverse in place (gradually overwriting X with its inverse), then we would have to iterate as j = n:-1:1, and parallelizing the function wouldn't be an option.
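For reference, such an in-place variant for the square case might look like this (a sketch under the hypothetical name invU!, not from the thread). Column j reads original entries X[i,k] from columns i+1:j-1, which a concurrently running column pass would already have overwritten, so the outer loop really must stay sequential:

```julia
using LinearAlgebra

# In-place inverse of an upper-triangular square matrix: A is overwritten by inv(A).
function invU!(A::Matrix{T}) where {T}
    n = LinearAlgebra.checksquare(A)
    for j = n:-1:1                 # columns must be processed high-to-low, sequentially
        A[j, j] = inv(A[j, j])
        for i = j-1:-1:1
            s = zero(T)
            for k = i+1:j          # A[i,k] still original; A[k,j] already holds inverse entries
                s += A[i, k] * A[k, j]
            end
            A[i, j] = -s / A[i, i] # A[i,i] is untouched until column i is processed
        end
    end
    return A
end
```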

Thanks!
