Infinite recursion of \ on sparse matrices #283
Bisected to 87e26c516853dd395de3c50406076f2d4d7aa413. @andreasnoack
|
Even though this is bisected to a commit that changes the definition of |
Yes. I don't think there is anything wrong with the method. |
I think there are multiple problems here.
I think all three problems need to be fixed. For this particular case (the WoodburyMatrices test case), |
And to demonstrate point 3:
julia> U = sprandn(5, 2, 0.2)
5x2 sparse matrix with 1 Float64 entries:
[3, 1] = 1.14495
julia> A2 = UpperTriangular([1. 2. 3. 4. 5.
0 1. 0. 0. 0.
0 0 1. 0. 0.
0 0 0 1. 0.
0 0 0 0 1.])
5x5 UpperTriangular{Float64,Array{Float64,2}}:
1.0 2.0 3.0 4.0 5.0
0.0 1.0 0.0 0.0 0.0
0.0 0.0 1.0 0.0 0.0
0.0 0.0 0.0 1.0 0.0
0.0 0.0 0.0 0.0 1.0
julia> A2 \ U
ERROR: StackOverflowError:
in \ at ./linalg/generic.jl:329
Edit: this also happens on 0.4. |
I think this particular issue will be fixed when the necessary specialization is provided (since type inference will dispatch it to other methods instead of infinitely recursing on the same method with different types). I've updated the issue and will open another issue for the general type inference problem. (edit JuliaLang/julia#14009) |
Also note that when type inference is fixed to deal with these, the inferred type for the current definition would likely be |
In 0.5 dev: A2\U -> infinite loop |
Thanks for analyzing the issues here. We should probably define a fallback for sparse right-hand sides with
julia> Bidiagonal(ones(5), ones(4), false)\eye(5,1)
5x1 Array{Float64,2}:
1.0
-1.0
1.0
-1.0
1.0
However, there could easily be other situations where some combination of |
Unfortunately no, since type inference is constructing |
Can there be something like |
So do we have to break the chain here as well? I.e.
We could fall back to |
LOL, I like this. A less surprising way (for the type system) could be |
Not that I'm aware of. Most people probably expect a constructor to construct an instance of the type that shares its name with the constructor. I've never really thought about this issue with nested special matrices before, but it's an instance of constructors not being generic, which can explain the need for functions like |
Test data and analysis for
TL;DR: The fix Andreas proposed for some aspects of this issue, at the REPL
import Base.(\)
import Base.LinAlg.AbstractTriangular
(\)(A::AbstractTriangular, B::AbstractVecOrMat) = A \ convert(Array, B)
patches up the mode of failure described above in all cases tested below. In some cases eliminating this first mode of failure reveals an unrelated second mode of failure, namely that
Shall I construct a PR implementing the fix, removing the methods it obviates, and introducing tests covering the fixed cases? Additionally, should I open new issues for the various modes of
Setup code:
order = 5
varscale = 2 * order
fillprob = 0.2
perturbationvec(order) = randn(order) / varscale;
perturbationmat(order) = randn(order, order) / varscale;
spperturbationmat(order) = sprandn(order, order, fillprob) / varscale;
diagvec = perturbationvec(order) + ones(order);
subdiagvec = perturbationvec(order - 1);
superdiagvec = perturbationvec(order - 1);
diagmat = Diagonal(diagvec);
lbidiagmat = Bidiagonal(diagvec, subdiagvec, false);
ubidiagmat = Bidiagonal(diagvec, superdiagvec, true);
tridiagmat = Tridiagonal(subdiagvec, diagvec, superdiagvec);
symtridiagmat = SymTridiagonal(diagvec, superdiagvec);
densemat = eye(order) + perturbationmat(order);
ltdensemat = LowerTriangular(densemat);
utdensemat = UpperTriangular(densemat);
symdensemat = Symmetric(densemat);
hrmdensemat = Hermitian(densemat);
sparsemat = speye(order) + spperturbationmat(order);
ltsparsemat = LowerTriangular(sparsemat);
utsparsemat = UpperTriangular(sparsemat);
symsparsemat = Symmetric(sparsemat);
hrmsparsemat = Hermitian(sparsemat);
cscvector = sparse([1], [1], [0.1], order, 1);
sparsevector = sparsevec([1], [1.0], order);
Tests and analysis:
# Common failure mode 1: Dispatches to method (\)(AbstractMatrix, AbstractVecOrMat),
# which begins one form or another of infinite loop as described earlier in this issue.
#
# Common failure mode 2: Dispatches to method (\)(A::AbstractMatrix, B::AbstractVecOrMat),
# which correctly calls `lufact(A) \ B`, which subsequently fails for lack of appropriate
# methods.
# Without fix, success: Dispatches to method (\)(Diag, AbstractMatrix), which works.
diagmat \ cscvector;
diagmat \ sparsevector;
# With fix, still succeeds.
# Without fix, success: Dispatches to method (\)(Bidiagonal, AbstractMatrix), which works.
lbidiagmat \ cscvector;
ubidiagmat \ cscvector;
lbidiagmat \ sparsevector;
ubidiagmat \ sparsevector;
# With fix, still succeeds.
# Without fix, failure: Common failure mode 1.
tridiagmat \ cscvector;
tridiagmat \ sparsevector;
# With fix, succeeds.
# Without fix, failure: Common failure mode 2. Specifically, lufact(SymTridiagonal)
# calls lufact!(SymTridiagonal), but the latter has no method for SymTridiagonal.
# Hence this actually occurs independent of RHS. So we have another issue:
# Fix (\)(SymTridiagonal, X) where X<:AbstractVecOrMat.
symtridiagmat \ cscvector;
symtridiagmat \ sparsevector;
# With fix, still fails as above.
# Without fix, failure: Common failure mode 1.
densemat \ cscvector;
densemat \ sparsevector;
# With fix, changes to failure mode 2. Specifically, ultimately
# A_ldiv_B!(LU, SparseMatrixCSC) and A_ldiv_B!(LU, SparseVector)
# get called, but no such methods exist.
# Without fix, failure: Common failure mode 1.
ltdensemat \ cscvector;
ltdensemat \ sparsevector;
utdensemat \ cscvector;
utdensemat \ sparsevector;
# With fix, success.
# Without fix, failure: Common failure mode 1.
symdensemat \ cscvector;
symdensemat \ sparsevector;
hrmdensemat \ cscvector;
hrmdensemat \ sparsevector;
# With fix, changes to failure mode 2. Specifically, lufact(A) calls
# lufact!(A), but the latter has no methods for Symmetric{Array}
# and Hermitian{Array}.
# Without fix, failure: Common failure mode 2. Specifically,
# ultimately A_ldiv_B!(UmfpackLU, SparseMatrixCSC) gets called,
# but no such method exists.
sparsemat \ cscvector;
# With fix, still fails as above.
# Without fix, failure: Common failure mode 1.
sparsemat \ sparsevector;
# With fix, changes to failure mode 2. Specifically, ultimately
# A_ldiv_B!(UmfpackLU, SparseVector) gets called,
# but no such method exists.
# Without fix, success: Dispatches to methods
# (\)(LowerTriangular{SparseMatrixCSC}, SparseMatrixCSC) and
# (\)(UpperTriangular{SparseMatrixCSC}, SparseMatrixCSC), which work.
ltsparsemat \ cscvector;
utsparsemat \ cscvector;
# With fix, still succeeds, but the fix obviates the specialized methods
# mentioned just above, given they simply call, e.g.,
# (\)(LowerTriangular{SparseMatrixCSC}, full(SparseMatrixCSC)),
# which is what the fix does more generally.
# Without fix, failure: Common failure mode 1.
ltsparsemat \ sparsevector;
utsparsemat \ sparsevector;
# With fix, success.
# Without fix, failure: Common failure mode 1.
symsparsemat \ cscvector;
symsparsemat \ sparsevector;
hrmsparsemat \ cscvector;
hrmsparsemat \ sparsevector;
# With fix, changes to failure mode 2. Specifically,
# lufact(A) calls lufact!(A), but the latter has no methods
# for Symmetric{SparseMatrixCSC} and Hermitian{SparseMatrixCSC}.
Edit: Corrected tests. |
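For readers on a current Julia release, the dense-fallback idea behind the proposed fix can be sketched without redefining \ itself. This is only a minimal sketch under stated assumptions: `dense_rhs_solve` is a hypothetical helper name that does not appear in the thread or in Base, `Array(B)` stands in for the 0.4-era `convert(Array, B)`, and the `LinearAlgebra` and `SparseArrays` standard libraries are assumed (Julia 0.7 or later).

```julia
using LinearAlgebra, SparseArrays

# Hypothetical helper (not in Base or in the thread): densify the right-hand
# side, then let the existing dense triangular solve take over.
dense_rhs_solve(A::LinearAlgebra.AbstractTriangular, B::AbstractVecOrMat) = A \ Array(B)

A = UpperTriangular([1.0 2.0 3.0;
                     0.0 1.0 0.0;
                     0.0 0.0 1.0])
b = sparsevec([3], [1.0], 3)   # sparse right-hand side with one stored entry

x = dense_rhs_solve(A, b)      # dense Vector; the sparsity of b is not preserved
```

The design trade-off is the one discussed in the thread: the fallback is simple and covers every right-hand side that can be converted to an `Array`, but it discards any structure the right-hand side had.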
@Sacha0 Thanks for going through this systematically
Yes please do. Since the other issues are hidden by the first issue, it might be easier to discuss them in the new PR instead of in a separate issue. We can open a separate issue later if necessary. |
Working on the pull request discussed above sent me down a surprisingly deep rabbit hole (and in exploring that warren I have been amazed by all the brilliant work the Julia team has done --- hats off!) With new perspective, the approach proposed above seems suboptimal with respect to #136. The alternative approach I am considering involves more substantial modifications, so being a novice it struck me as wise to seek input prior to forging ahead. My thoughts follow below; please advise.
In short: linalg/triangular.jl contains great generic infrastructure. Rather than punting presently-unhandled cases to dense methods, we should make better use of that generic infrastructure and extend it where necessary.
In full: linalg/triangular.jl and linalg/bidiag.jl together appear to contain all generic methods necessary to handle several binary operations (various forms of multiplication and left and right division) where one argument is some form of triangular type and the other is a one- or two-dimensional array type that implements
There appears to be no simple/graceful way to precisely capture the interface described above within the existing type system (short of using traits). If so (and prior to a proper traits/interfaces system or equivalent enhancement in expressive power of the type system landing), it seems better to err on the side of making the generic method definitions in linalg/triangular.jl and linalg/bidiag.jl slightly loose (
Basically another face of JuliaLang/julia#1276.
This side of attempting implementation I see two primary potential downsides. First, chances are some ambiguity warnings will have a good laugh at my expense --- not a significant concern.
Second, barring introduction of methods presently being discussed in #271 and touched on in JuliaLang/julia#12441 (or using traits), there exists a class of cases this approach will not handle, namely those where the non-triangular argument's type does not support
The generic promotion methods defined in 4d0b1ac9c8bc50fe15bdfee29e8fa86ee8809f1e/base/linalg/triangular.jl#L986-L1040: take a pair of array-like arguments; determine the element type of the returned array from the operation and argument types; accordingly promote the arguments' element types to ensure type stability in the subsequent calculation; and finally pass the element-promoted arguments to an appropriate in-place method. Specifically, the argument passed in mutating position to the in-place method is promoted/copied via
At the same time it might be nice to reduce the minor coupling between linalg/triangular.jl and linalg/bidiag.jl. Said coupling somewhat obfuscates how the generic methods tie together in linalg/triangular.jl.
Thoughts? Should I move forward with this approach? Best! |
I like the sound of where it seems you're going here. |
Cheers --- absent further input I'll start playing with this midweek. Best! |
@Sacha0 I think you are right about most of this. It is possible to fix most of this issue by special-casing sparse right-hand sides and promoting to dense. That is reasonable to do, but is not as generic as you (and I) might want. Your proposed solution with |
@andreasnoack I share your concern over the performance of sparse and distributed arrays and appreciate the related thoughts you expressed in #136; writing specialized methods where performance is a significant issue is critical, and including warnings along the lines of
The three potential changes at hand (generalization of existing generic methods via signature abstraction, generalization of existing generic methods via enhancement of similar/copy, and addition of specialized methods where performance is an issue or to cover presently uncovered cases) strike me as strongly complementary rather than mutually exclusive though (happily their combination addresses your concerns while simultaneously advancing #136), and with relative importance reflected in the preceding ordering --- first write generic methods ensuring things work, then add specialized methods where necessary for performance or other advantage. Beyond the immediately preceding tenet seeming to be the Julian way --- correct me if not --- doing otherwise has the ring of premature optimization.
Expansion: JuliaLang/julia#1276 addresses precisely the question of AbstractArray vs StridedArray for generic methods; though it opens with an essentially identical argument to yours above concerning the performance of sparse and distributed arrays, tshort's counterarguments in favor of removing unnecessary constraint (JuliaLang/julia#1276 (comment) and JuliaLang/julia#1276 (comment)) eventually win out. stevengj, timholy, and others similarly argue against unnecessary constraint in JuliaLang/julia#2345 with like outcome. At an earlier stage, carlobaldassi also argues similarly in https://groups.google.com/forum/?fromgroups#!topic/julia-dev/0pcb-S7uOUw / JuliaLang/julia#750; though that early thread concludes against abstraction, that outcome is explicitly reversed in JuliaLang/julia#6258 and the position in favor of abstraction is affirmed. (Specifically JuliaLang/julia#6258 (comment), JuliaLang/julia#6258 (comment), and the pursuing comments. These comments seem to capture the position that fell out after the dust settled in JuliaLang/julia#987, JuliaLang/julia#5810, and JuliaLang/julia#6212 etc, followed by jiahao's codification in JuliaLang/julia#10064.) Insofar as I have seen that's the most recent directly related discussion / position.
Furthermore, slow operation within reasonable bounds (a la JuliaLang/julia#6258 (comment)) seems preferable to failure, as perhaps evidenced by @stevengj's #136 and, for example, python's ubiquity. (Incidentally, encountering a sequence of failing operations involving sparse objects brought my attention to this issue.)
Additionally, artificially restricting what operations one can perform seems at odds with Julia's 'consenting adults' principle --- particularly where that restriction's purpose is to paternalistically herd users away from relatively obvious performance pitfalls at the potential cost of significant, legitimate generic functionality. Where writing specialized methods cannot mitigate the performance pitfall, warnings like
More concretely, the alternative approach ---
So, concretely, in an effort to address all concerns above to at least lowest order :), I propose (1) generalizing the existing generic methods via signature abstraction; (2) writing corresponding specialized methods for sparse operands where important for performance; (3) writing corresponding 'warning' methods for distributed operands which caution against use; and (4) once #271 resolves, leveraging the resulting functionality and (1) to cover additional presently-failing cases.
All that said, if this appeal does not sway you I will happily defer to your experience :) --- good conscience demanded that I make this appeal, but I will gladly confine my enthusiasm to the specialized methods for now and get coding :). |
Good summary. It is maybe worth remembering that these discussions have taken place over a longer period when the language was very young and that they have involved different people over time. I can support (1), (2), and (4), but I consider (2) the most time critical because I suspect that most users will be affected. I'm more sceptical about (3). We generally move away from warnings. One of the reasons is that, right now, they cannot be caught so you cannot get rid of them even when you know what you are doing. It might be better to wait and see. Julia users are enthusiastic profilers so if a common method shows up as very slow, we'll see an issue and then we can add a specialized method. |
I failed to make clear a subtle but compelling point here, so explicitly: One might argue that in calling, for example,
But this would be giving you (those who wrote the methods for generic numerical linear algebra, sparse matrices, and sparse vectors) too little credit. You did a really great job with those methods --- perhaps better than you realize: Whereas the dense-fallback approach necessarily destroys all structure, the signature-abstraction approach is already automatically structure-preserving in important cases, a tremendous advantage: Consider calling
julia> m = 100; ltdensemat = LowerTriangular(eye(m) + randn(m, m)/(2*m));
julia> spvec = sparsevec([m], [1.0])
Sparse vector of length 100, with 1 Float64 nonzero entries:
[100] = 1.0
julia> A_ldiv_B!(ltdensemat, spvec)
Sparse vector of length 100, with 1 Float64 nonzero entries:
[100] = 0.990708
julia> cscvec = sparse([m], [1], [1.0])
100x1 sparse matrix with 1 Float64 entries:
[100, 1] = 1.0
julia> A_ldiv_B!(ltdensemat, cscvec)
100x1 sparse matrix with 1 Float64 entries:
[100, 1] = 0.990708
julia> cscmat = sparse(m*ones(Int, 3), [1:3], 1.0)
100x3 sparse matrix with 3 Float64 entries:
[100, 1] = 1.0
[100, 2] = 1.0
[100, 3] = 1.0
julia> A_ldiv_B!(ltdensemat, cscmat)
100x3 sparse matrix with 3 Float64 entries:
[100, 1] = 0.994819
[100, 2] = 0.994819
[100, 3] = 0.994819
That's fantastic. Nicely done :). |
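The structure-preservation point above rests on a property of forward substitution: entries of the solution above the first nonzero of the right-hand side remain zero. A small check of that property on a current Julia release might look like the following. This is a sketch under assumptions: it uses a dense right-hand side so that it does not depend on which sparse methods exist in a given Julia version, and the matrix values are arbitrary.

```julia
using LinearAlgebra

n = 6
# Well-conditioned lower-triangular test matrix (values chosen arbitrarily).
L = LowerTriangular(Matrix(1.0I, n, n) + tril(fill(0.1, n, n)))

e_n = zeros(n)
e_n[n] = 1.0                   # only the last entry is nonzero

x = L \ e_n                    # forward substitution on a dense vector

# Number of nonzero entries in the solution; the argument above suggests
# this should match the single nonzero of the right-hand side.
nonzeros_in_solution = count(!iszero, x)
```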
Sounds good! I will prioritize (2) followed by (1), toss out (3), and keep an eye on #271 for (4). Let me know if I have missed any significant reading as well (I'm sure there's something!) Best! |
I like that, in general, you can rely on the
julia> n = 500;
julia> A = randn(n, n)|> t -> t't;
julia> F = cholfact(A);
julia> b = sparsevec([n], [1.0], n);
julia> @time A_ldiv_B!(F[:U], copy(b));
0.011692 seconds (29 allocations: 17.031 KB)
julia> @time A_ldiv_B!(F[:U], full(b));
0.000364 seconds (10 allocations: 4.344 KB)
julia> 0.011692/0.000364
32.12087912087912
so for |
Cheers! With
function foo_A_ldiv_B(A::LowerTriangular, B::SparseVector)
A_ldiv_B!(A, full(B))
end
function bar_A_ldiv_B(A::LowerTriangular, B::SparseVector)
nzrange = B.nzind[1]:B.n
nzrangesubA = LowerTriangular(sub(A.data, nzrange, nzrange))
SparseVector(B.n, collect(nzrange), A_ldiv_B!(nzrangesubA, full(B[nzrange])))
end
# warmup
wum = 10
wumat = LowerTriangular(rand(wum, wum))
wuvec = sparsevec([1], [1.0], wum)
foo_A_ldiv_B(wumat, wuvec);
bar_A_ldiv_B(wumat, wuvec);
m = 10000;
ltdensemat = LowerTriangular(eye(m) + randn(m, m)/(2*m));
diabolical = sparsevec([1], [1.0], m);
print("diabolical/foo: "); @time foo_A_ldiv_B(ltdensemat, diabolical);
print("diabolical/bar: "); @time bar_A_ldiv_B(ltdensemat, diabolical);
intermediate = sparsevec([Int(m/2)], [1.0], m);
print("intermediate/foo: "); @time foo_A_ldiv_B(ltdensemat, intermediate);
print("intermediate/bar: "); @time bar_A_ldiv_B(ltdensemat, intermediate);
wonderful = sparsevec([m], [1.0], m);
print("wonderful/foo: "); @time foo_A_ldiv_B(ltdensemat, wonderful);
print("wonderful/bar: "); @time bar_A_ldiv_B(ltdensemat, wonderful);
This yields:
diabolical/foo: 0.076005 seconds (7 allocations: 78.422 KB)
diabolical/bar: 0.076405 seconds (16 allocations: 156.953 KB)
intermediate/foo: 0.077598 seconds (7 allocations: 78.422 KB)
intermediate/bar: 0.018971 seconds (17 allocations: 78.875 KB)
wonderful/foo: 0.076400 seconds (7 allocations: 78.422 KB)
wonderful/bar: 0.000010 seconds (15 allocations: 768 bytes)
Writing this:
function baz_A_ldiv_B(A::LowerTriangular, B::SparseVector)
nzrange = B.nzind[1]:B.n
nzrangesubB = sub(B, nzrange)
nzrangesubA = sub(A, nzrange, nzrange)
SparseVector(B.n, collect(nzrange), A_ldiv_B!(nzrangesubA, full(nzrangesubB)))
end
seems cleaner and more consistent than
Edit: I should add that I see sparse results from solves not irregularly in my (admittedly niche) work, though specifically with sparse LHSs. |
Good observation. I'm still sceptical about returning a sparse vector from the solve as the general solution. In many cases that would be a waste. However, since It wouldn't be good to have |
A more specialized alternative to |
That's true and in theory it could work with all the special matrix types. It would already work for |
Good catch! I missed the potential type instability. So the appropriate way to express my intent was indeed
Did I miss something here as well, or is this one an actual issue? Best! |
I like this. Do I understand XDiagonal correctly as Diagonal, Bidiagonal, Tridiagonal, and SymTridiagonal --- the special matrices that are separate storage classes rather than simply annotations of other storage classes' contents? |
sub on a sparsevector does sound like a not-implemented-yet issue. worth tracking. |
Good call --- something similar to
I must be missing something here. In |
The problem is that the returned vector is a
function \(A::LowerTriangular, B::SparseVector)
nzrange = B.nzind[1]:B.n
nzrangesubA = LowerTriangular(sub(A.data, nzrange, nzrange))
Bf = full(B)
A_ldiv_B!(nzrangesubA, sub(Bf, nzrange))
return Bf
end |
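The same idea can be sketched for a current Julia release, where the names used in the function above have since changed: `full` became `Vector`/`Array`, `sub` became `view`, and `A_ldiv_B!` became `ldiv!`. The helper name `trailing_block_solve` is hypothetical, and the sketch assumes that `A` and `b` share a floating-point element type; it does not overload \ and is not the method that was eventually merged.

```julia
using LinearAlgebra, SparseArrays

# Hypothetical helper mirroring the function above with current names.
function trailing_block_solve(A::LowerTriangular, b::SparseVector)
    x = Vector(b)                       # dense copy; the result is always a Vector
    inds, _ = findnz(b)                 # indices of the stored entries, in order
    isempty(inds) && return x           # zero right-hand side: solution is zero
    rng = first(inds):length(b)         # rows from the first stored entry onward
    Asub = LowerTriangular(view(parent(A), rng, rng))
    ldiv!(Asub, view(x, rng))           # in-place solve on the trailing block only
    return x
end

n = 8
A = LowerTriangular(Matrix(1.0I, n, n) + tril(fill(0.05, n, n)))
b = sparsevec([5], [2.0], n)

x = trailing_block_solve(A, b)
```

Returning a dense vector in every case keeps the return type stable, which is the concern raised above; the cost saving comes only from skipping the leading rows that are known to be zero.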
Cheers, that looks great --- I will run with that for |
To check my inferred understanding of the design philosophy (so hopefully I can begin answering my own questions :) ): If a user calls If a user calls
which potentially trades some structure for speed, is almost always the best thing to do in this second case. Finally, if the user calls Does this capture your mental model? Thanks, and best! |
I'm a bit in doubt about the last case. I fear that many users unexpectedly will experience slow solves. Furthermore, users who know that they want a sparse result are probably more experienced and can maybe figure out that they could use |
Seems to be fixed according to travis. |
Update: There are multiple problems that cause the original issue. Use this issue to track the most directly related linalg issue. See #283 for the repro. This causes a regression on 0.5 but the same issue is also present on 0.4 in different forms.
Original post,
This is causing WoodburyMatrices.jl tests to freeze.
minimum repro:
code_warntype(\, Tuple{Tridiagonal{Float64},SparseMatrixCSC{Float64,Int64}})
I don't know how long it is going to take, but it takes more than five minutes on my laptop. GDB backtrace shows that it is struggling in type inference on some functions (I haven't checked which function/types it is, since the call chain is super deep and it's a little hard to locate the function it got stuck in...).
This seems to be a recent regression. I remember the WoodburyMatrices test was fine on the version of julia I used when I made mlubin/ReverseDiffSparse.jl#22 (15 days ago), but it's not fine on a 10-day-old julia (685f1b0), and there has been no update to the WoodburyMatrices package within this period. I might do a bisect later.
@timholy This breaks your package's test so you might want to know this.