prevent slow fallback for sparsevector *,/ scalar #15579
Conversation
Are these methods exercised by the current test suite? |
Yes, julia/test/sparsedir/sparsevector.jl, line 507 in c18bc53.
or hmm, maybe that is only for … |
force-pushed from 10bc85c to e2c16ac
Added a test. |
Should be good to go. I wonder if there are more of these issues… AbstractArrays are very nice and make things just work, but it is quite easy to accidentally fall back to methods with the wrong time complexity. |
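A minimal sketch of the complexity trap being described, assuming the modern SparseArrays stdlib layout (`slow_scale` and `fast_scale` are hypothetical names, not Base methods):

```julia
using SparseArrays

x = sprand(10^6, 0.001)    # ~1000 stored entries out of 10^6

# What a generic AbstractArray fallback effectively does: O(length(x)) work
slow_scale(a, x) = sparsevec([a * x[i] for i in 1:length(x)])

# A sparse-aware method touches only stored entries: O(nnz(x)) work
# (and, as debated below, skips the Inf/NaN corner cases)
fast_scale(a, x::SparseVector) =
    SparseVector(length(x), copy(x.nzind), a .* x.nzval)
```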
The answer is "without a doubt." Of course, the alternative is … One potential approach:

```julia
import Base: *

immutable OpsCountingNumber{T<:Number} <: Number
    x::T
end

# the closures below capture the let-bound counter
let opcounts = 0
    global opcounts_get, opcounts_reset, opcounts_bump
    opcounts_get()   = opcounts
    opcounts_reset() = (opcounts = 0)
    opcounts_bump()  = (opcounts += 1)
end

function (*)(x::OpsCountingNumber, y::Number)
    opcounts_bump()
    OpsCountingNumber(x.x*y)
end

# ... likewise for the other counted operations
```

and then

```julia
opcounts_reset()
c = A*b
@test opcounts_get() == 2*nnz(A)
```

or something. The advantage of this approach is that you're not testing that certain code gets executed, you're testing that it's using a sparse-friendly algorithm. |
This is incorrect for Inf, NaN, or divide-by-zero. |
@KristofferC Did you look into why the fallbacks don't call … ? |
AFAIU there is still debate about what is correct/incorrect behavior in situations like that. For now, this just makes sparse vectors and sparse matrices work the same (and makes scalar times vector actually usable). Making sure that users' computers blow up if they get an accidental NaN should probably be done in a separate PR. 😛 |
Line 111 in c18bc53
Where should … ? |
I thought we used … LGTM then. |
The only debate was whether it's preferable to error or to return a result that's consistent with the dense behavior; I don't think correctness was ever up for debate here, only whether we think "predictability" is more valuable than returning an IEEE 754-correct answer. I don't think performance is ever valuable enough to warrant returning an incorrect answer. The fallback is correct; the optimization in this PR is incorrect for these corner cases. |
Ok. Let's keep it like it is then if you think that is better. |
@tkelman Correct me if I'm wrong here, but IEEE 754 does not explicitly define sparse matrix and vector operations. It only defines operations on individual scalars. So the argument that, if any algorithm for a particular operation produces an Inf/NaN, then all algorithms must, leads to the removal of many sparse algorithms. For example, sparse matrix-vector multiplication does not comply with this definition because non-stored zeros in the matrix are never explicitly multiplied. It's also noteworthy that Matlab disregards Inf/NaN behavior for matrix-vector multiplication.

Unfortunately, the broad interpretation of IEEE 754 pretty much requires converting to a dense matrix. I think there needs to be a more restricted interpretation of what IEEE 754 requires in order to allow sparse matrix algorithms. |
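To make the narrow reading concrete, here is a textbook CSC matvec kernel (a sketch, assuming the modern SparseArrays stdlib; `csc_matvec` is a hypothetical name): it only performs the scalar operations that are mathematically required, so a non-stored zero never meets an Inf in `x`.

```julia
using SparseArrays

function csc_matvec(A::SparseMatrixCSC, x::AbstractVector)
    y = zeros(promote_type(eltype(A), eltype(x)), size(A, 1))
    rows, vals = rowvals(A), nonzeros(A)
    for j in 1:size(A, 2), k in nzrange(A, j)
        y[rows[k]] += vals[k] * x[j]   # only stored entries of A participate
    end
    return y
end

A = sparse([1.0 0.0; 0.0 1.0])
x = [Inf, 0.0]
csc_matvec(A, x)   # [Inf, 0.0]
Matrix(A) * x      # [Inf, NaN]: dense computes 0.0 * Inf = NaN in row 2
```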
I personally think that just documenting that sparse matrices do not always follow IEEE is better and simpler. If you do any real work with sparse matrices and one suddenly gets converted to a dense matrix, you are dead anyway. Also, having to check for NaN and Inf in all algorithms must get expensive, because you have to look at every operation that could possibly become a NaN. |
I don't want to do this either, but I find … |
I would argue that having the wrong time complexity, for a data structure whose sole purpose is for operations to have that time complexity, is a more serious bug than strictly following the rule that you should be able to replace a sparse matrix with its dense counterpart. Scipy:

```python
In [15]: A = csr_matrix([[1, 2, 0], [0, 0, 3], [4, 0, 5]])

In [16]: (np.NaN * A)[1,0]
Out[16]: 0.0
```

Matlab actually does this for scalar * matrix but not (as @JaredCrean2 showed) for matrix multiplication. Anyway, I don't use sparse vectors so I'm just happy that scalar * matrix is fast (even though technically wrong). |
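The same shortcut written out in Julia (a sketch; `nzscale` is a hypothetical name, not Base's implementation): scaling only the stored values keeps the non-stored zeros at 0.0 even for a NaN scalar.

```julia
using SparseArrays

# nnz-only scaling: copy the structure, scale only the stored values
nzscale(a::Number, A::SparseMatrixCSC) =
    SparseMatrixCSC(A.m, A.n, copy(A.colptr), copy(A.rowval), a .* A.nzval)

A = sparse([1.0 2.0 0.0; 0.0 0.0 3.0; 4.0 0.0 5.0])
nzscale(NaN, A)[2, 1]    # 0.0 (same as the SciPy output above)
(NaN * Matrix(A))[2, 1]  # NaN, the IEEE-correct dense answer
```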
IEEE 754 only says what to do with scalars. Linear algebra says what to do with the rest. Naive implementations of sparse algorithms ignore corner cases and get them wrong, and I'd consider that a bug. These corner cases exist for very good reasons, and I don't see what good ignoring them does. Getting a garbage answer in better time complexity isn't all that useful. |
So we should then remove our current buggy sparse mat * vector, sparse mat * sparse mat multiplications, etc., and make them fall back to the AbstractArray implementation? This would make them 100% unusable for their intended cases but in theory give the correct answer. This is just in theory, because except in toy examples you would never see the "correct" NaN; you would just see julia print … |
We're having the same discussion as in #14963 again. Let's choose either solution, but at least fix the status quo, which is the worst possible solution. |
Either strict conformity to … |
Or we test for the corner cases and fix the existing bugs, and avoid introducing any more. It's not that hard to do things correctly here. |
We can be both correct and efficient for the standard case. The debate in the other issue was whether to return a "dense-sparse matrix" in the corner case, or raise an error. In practice, the former will often give an … |
Thanks for the link, @nalimilan: I was largely tuned out of email when most of that discussion happened. I agree we can have our cake and eat it too. @KristofferC, this is not "give up all performance for any kind of sparse operation"; this is "add a trivial runtime check, do the fast thing in all normal cases, and blow up in pathological cases." But this might be an argument for a different kind of change (as previously discussed elsewhere): add a new field … |
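A sketch of what that trivial runtime check could look like for this PR's operation (`checked_scale` is a hypothetical name; this variant materializes every entry in the pathological case rather than erroring):

```julia
using SparseArrays

function checked_scale(a::Number, x::SparseVector)
    if isfinite(a)
        # normal case: only stored entries can change, O(nnz) work
        return SparseVector(length(x), copy(x.nzind), a .* x.nzval)
    else
        # pathological case: a * 0.0 is NaN for a = Inf or NaN, so every
        # entry must be stored -- a "dense-sparse" vector, O(n) work
        return SparseVector(length(x), collect(1:length(x)), a .* Vector(x))
    end
end
```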
That's indeed what I proposed, but people don't seem to consider this worth the additional complexity... |
@timholy that's #10410, and I think it would be best to do that as a separate type. We can certainly look into it, but I think that would be a bigger rewrite than checking for non-finite values and using an inefficient representation in corner cases. |
True, and implementation by wrapping is attractive. However, one option that doesn't seem to have been discussed is to only allow 0, … EDIT: upon further reflection, I'm less certain that this would represent a significant simplification. Consider, for example, a case where the … |
@tkelman I don't think this is a fair characterization. Here is CHOLMOD:

```julia
julia> Acm
5x5 Base.SparseArrays.CHOLMOD.Sparse{Float64}:
 1.0  0.0  0.0  0.0  0.0
 0.0  1.0  0.0  0.0  0.0
 0.0  0.0  1.0  0.0  0.0
 0.0  0.0  0.0  1.0  0.0
 0.0  0.0  0.0  0.0  1.0

julia> xcm
5x1 Base.SparseArrays.CHOLMOD.Dense{Float64}:
 NaN
 0.0
 0.0
 0.0
 0.0

julia> Acm*xcm
5x1 Base.SparseArrays.CHOLMOD.Dense{Float64}:
 NaN
 0.0
 0.0
 0.0
 0.0
```

and here is PETSc:

```julia
julia> A
5x5 PETSc.Mat{Float64,:mpiaij}:
 1.0  0.0  0.0  0.0  0.0
 0.0  1.0  0.0  0.0  0.0
 0.0  0.0  1.0  0.0  0.0
 0.0  0.0  0.0  1.0  0.0
 0.0  0.0  0.0  0.0  1.0

julia> v
5-element PETSc.Vec{Float64,:mpi}:
 NaN
 0.0
 0.0
 0.0
 0.0

julia> A*v
5-element PETSc.Vec{Float64,:mpi}:
 NaN
 0.0
 0.0
 0.0
 0.0
```
|
So both Cholmod and Petsc implement spmv naively, in a way that gets this corner case wrong. Even LAPACK, MKL, etc. have bugs with NaNs. We could treat this as a convention issue, as in gemm where beta = 0 is interpreted as ignoring the existing value rather than literally multiplying it by 0, but in this case it seems like laziness, a leaky abstraction, and a failure of generic programming rather than a reasonable API contract. Is the job of sparse matrices to represent the same underlying mathematical operation with a different data structure, or is it "implementation-defined" by how the corner-case-ignoring code behaves? |
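The gemm convention referenced here is easy to demonstrate (a sketch using LinearAlgebra's BLAS wrappers): with `beta == 0` the existing contents of C are overwritten, not multiplied, so a pre-existing NaN does not propagate.

```julia
using LinearAlgebra

C = fill(NaN, 2, 2)
# C = 1.0 * A * B + 0.0 * C, but BLAS treats beta == 0 as "ignore C"
BLAS.gemm!('N', 'N', 1.0, ones(2, 2), ones(2, 2), 0.0, C)
C   # all 2.0; a literal 0.0 * NaN + 2.0 would have been NaN
```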
So you think the foundational numerical libraries are all wrong? That seems like an extreme position. Maintaining some kind of consistency between dense and sparse operations definitely makes sense, but I think the standard of consistency needs to be relaxed a bit. In particular, because Inf and NaN are not real numbers in the mathematical sense, I don't think having algorithms with different Inf/NaN behavior violates the condition that dense and sparse operations should represent the same mathematical operation.

@timholy I think for simple cases like matrix-vector multiplication some kind of quick runtime check might work, but for more difficult cases it might not be possible. SVDs of sparse matrices and LU factorizations (especially for matrices that are unstable for LU) strike me as very difficult cases. |
In that they give clearly inconsistent results that don't follow IEEE754 according to the linear algebraic operation they're claiming to perform, yes. Petsc at least has the perfectly valid excuse that it's a distributed library, where being particular about the corner cases would be substantially more expensive. Cholmod's spmv functionality isn't that library's primary purpose either. Ignoring the corner cases doesn't make Julia's behavior materially more correct here. Fixing it, for the cases where that's not terribly complicated to do (I agree that factorizations are out of scope here), means we're ahead of the curve in fixing bugs that have either not been reported yet, not been fixed yet, or been left unaddressed in existing libraries. We don't need to repeat the mistakes of our predecessors here. |
It might force us to write our own algorithms, which is not an easy path. I recently wrote a couple of dense Cholesky implementations in pure Julia, so (naively) I imagine that sparse LU wouldn't be daunting, but SVD might be a different matter altogether. As I think about it more, even multiplication isn't entirely straightforward, although I think I have a successful algorithm in mind. Consider this case:

```julia
A = [1 2; Inf 2; 1 Inf]
x = [0, 1]
b = A*x   # == [2, NaN, Inf]
```

Now imagine implementing this with sparse operations, where …

But prior to thinking this through, I wasn't expecting to have to tally the zeros in the vector, nor did I immediately realize that the precise pattern of zeros would matter so much that it's best integrated intimately into the multiplication routine (rather than something you handle at the end). |
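One way to fold the zero pattern into the kernel itself, following the reasoning above (a sketch with a hypothetical name; it handles only this particular corner, a stored Inf/NaN meeting an explicit zero of `x`, not a non-finite `x[j]` meeting non-stored zeros of `A`):

```julia
using SparseArrays

function matvec_cornered(A::SparseMatrixCSC, x::AbstractVector)
    y = zeros(promote_type(eltype(A), eltype(x)), size(A, 1))
    rows, vals = rowvals(A), nonzeros(A)
    for j in 1:size(A, 2)
        xj = x[j]
        for k in nzrange(A, j)
            # when xj == 0 a finite entry contributes nothing, but a
            # non-finite one must still produce Inf * 0 = NaN
            if xj != 0 || !isfinite(vals[k])
                y[rows[k]] += vals[k] * xj
            end
        end
    end
    return y
end

A = sparse([1.0 2.0; Inf 2.0; 1.0 Inf])
matvec_cornered(A, [0, 1])   # [2.0, NaN, Inf], matching the dense product
```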
@tkelman I don't think factorizations are out of scope here. I brought them up to test the general applicability of the rule that the dense and sparse versions should produce the same Inf/NaN behavior, and I think @timholy makes a good point that getting the behavior you are describing will be very involved. I'm also pretty sure it will have a significant cost in terms of performance. I don't think the decision of the existing numerical libraries to follow IEEE 754 narrowly (i.e., if a scalar operation is done, then do it according to the standard, but if a scalar operation is not mathematically required, then don't perform it) is a mistake; it seems more like a deliberate choice in order to get good numerical performance. |
Petsc and Cholmod don't make any attempt at generic programming in terms of providing interchangeable behavior between dense and sparse. Matlab generally does, and there they've at least made an attempt to handle a fair number of the corner cases here in a more consistent manner than SciPy or Octave have. SciPy and Octave are clearly buggy here; I don't think anyone can argue their behavior is correct or useful for these cases. Where it's simple to get right, in BLAS-1 and BLAS-2-like routines, I don't think it has to be much more complicated than checking for non-finite inputs and branching to different behavior when they're encountered. For things like factorizations, there's less of an expectation of getting the exact same results between dense and sparse, since sparse factorizations have to deal with fill-in, partitioning, pivoting, etc. in more cases. If you have non-finite nonzero values in the sparse matrix, you're more likely to get a consistent result anyway. Tim seems to be thinking about having non-finite values as an implicit … |
@JaredCrean2, mostly I was pointing out that my "easy alternative" (have a …) … But I remain optimistic that it's possible to achieve both small memory footprint and good performance: if the … That said, @tkelman's strategy of "blowing up to dense" is quite simple, and would immediately give inarguably correct behavior, so it has a lot to recommend it (both for and against). If we decide not to do this, I think we can still argue that our choices are well-justified, but it's not unequivocally correct in the same way @tkelman's proposal is. In other words, as usual all options have substantial positives and negatives, and we have to decide where to choose the tradeoffs. I'm not planning on doing the … |
What prevents this from getting merged is this idea: unless we have a strictly conforming operation, so that every sparse matrix can be replaced with its full matrix and give exactly the same answer, the operation should fall back to the dense case.

Current operations that follow this:
…

Current operations that do not follow this:
…

To make things consistent we could just delete every function in the second list. Our sparse library would be unusable but would give correct results according to the rule that prevents this from getting merged. Or we could move the lonely item at the top down with the rest. At that point everything is consistent, and if someone finds a way to improve individual functions without losing performance then that person is free to do so. |
I didn't say it should fall back to the dense case, I said it should give consistent results. If you'd rather not fix the bug I've pointed out in your code, I can do it in #14973. |
As an aside, I do agree that giving an answer that differs from the dense case is very wrong; however, giving an error is not the same. In fact, the identity … |
Fixes #15577