
prevent slow fallback for sparsevector *,/ scalar #15579

Merged
merged 1 commit into JuliaLang:master from KristofferC:kc/sparse_vec_scal_mult on Mar 25, 2016

Conversation

KristofferC
Member

Fixes #15577

@nalimilan
Member

Are these methods exercised by the current test suite?

@KristofferC
Member Author

Yes:

@test exact_equal(x + x, x * 2)

Hmm, or maybe that only covers *.

@KristofferC KristofferC force-pushed the kc/sparse_vec_scal_mult branch 2 times, most recently from 10bc85c to e2c16ac on March 21, 2016 17:44
@KristofferC
Member Author

Added a test.
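
For illustration, a test along these lines (not necessarily the PR's exact one) would exercise both * and /, reusing the exact_equal helper from the sparsevector tests:

x = sparsevec([2, 5, 9], [0.5, -1.25, 2.0], 10)
@test exact_equal(x * 2, x + x)          # scalar multiplication matches addition
@test exact_equal((x * 4) / 2, x * 2)    # scalar division goes through the same fast path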

@KristofferC
Member Author

Should be good to go.

I wonder if there are more of these issues. AbstractArrays are very nice and make things just work, but it is quite easy to accidentally fall back to methods with the wrong time complexity.

@timholy
Member

timholy commented Mar 21, 2016

The answer is "without a doubt." Of course, the alternative is MethodErrors. These certainly signal that something is missing, but only if you (the developer) try the right incantation. Having users encounter MethodErrors is less nice.

One potential approach:

import Base: *   # needed so the method below extends Base's * instead of shadowing it

immutable OpsCountingNumber{T<:Number} <: Number
    x::T
end

opcounts = 0   # global operation counter

opcounts_get() = opcounts::Int

function opcounts_reset()
    global opcounts = 0
end

function (*)(x::OpsCountingNumber, y::Number)
    global opcounts
    opcounts += 1
    OpsCountingNumber(x.x * y)
end
# ... analogous methods for the other arithmetic operators

and then

opcounts_reset()
c = A*b
@test opcounts_get() == 2*nnz(A)

or something. The advantage of this approach is that you're not testing that certain code gets executed; you're testing that it's using a sparse-friendly algorithm.

@tkelman
Contributor

tkelman commented Mar 21, 2016

This is incorrect for Inf, NaN, or divide-by-zero.
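
Concretely (an illustrative session, not taken from the thread): a fast path that only visits stored entries never applies the scalar to the implicit zeros, so

x = sparsevec([2], [1.0], 3)   # stored entry only at index 2
NaN * full(x)   # IEEE semantics on every element: [NaN, NaN, NaN]
NaN * x         # stored-entries-only fast path: [0.0, NaN, 0.0]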

@andreasnoack
Member

@KristofferC Did you look into why the fallbacks don't call scale!(SparseVector)? It seems to me that it shouldn't be necessary to have * defined specifically for SparseVector.

@KristofferC
Member Author

This is incorrect for Inf, NaN, or divide-by-zero.

AFAIU there is still debate about what is correct/incorrect behavior in situations like that. For now, this just makes sparse vectors and sparse matrices work the same (and makes scalar times vector actually usable).

Making sure that users' computers blow up if they get an accidental NaN should probably be done in a separate PR. 😛

@KristofferC
Member Author

@andreasnoack

2*a dispatches to https://github.com/JuliaLang/julia/blob/master/base/abstractarraymath.jl#L54, which in turn dispatches to 2 .* a at

function ($f){T}(A::Number, B::AbstractArray{T})

Where should scale! have been called?

@andreasnoack
Member

I thought we used scale! for this, but I should have checked. I just did a timing and we are as fast as BLAS here.
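
A check of that shape (illustrative; the size and session are assumptions, not the actual benchmark) would be:

x = sprand(10^6, 0.01)   # assumed: sprand with one dimension gives a SparseVector
xd = full(x)
@time 2.0 * x    # sparse fast path from this PR
@time 2.0 * xd   # dense path for comparison, which ends up in BLAS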

LGTM then.

@tkelman
Contributor

tkelman commented Mar 22, 2016

AFAIU there is still debate about what is correct/incorrect behavior in situations like that

The only debate was whether it's preferable to error or to return a result that's consistent with the dense behavior; I don't think correctness was ever up for debate here, only whether we think "predictability" is more valuable than returning an IEEE 754-correct answer. I don't think performance is ever valuable enough to warrant returning an incorrect answer. The fallback is correct; the optimization in this PR is incorrect for these corner cases.

@KristofferC
Member Author

Ok. Let's keep it like it is then if you think that is better.

@timholy
Member

timholy commented Mar 22, 2016

#14973

@JaredCrean2
Contributor

@tkelman Correct me if I'm wrong here, but IEEE 754 does not explicitly define sparse matrix and vector operations; it only defines operations on individual scalars. So the argument that, if any algorithm for a particular operation produces an Inf/NaN, then all algorithms must, leads to the removal of many sparse algorithms. For example, sparse matrix-vector multiplication does not comply with this definition, because non-stored zeros in the matrix are never explicitly multiplied.

It's also noteworthy that Matlab disregards Inf/NaN behavior for matrix-vector multiplication:

>> A
A =
     0     0
     1     2
>> As
As =
   (2,1)        1
   (2,2)        2
>> c
c =
   Inf
   Inf
>> A*c
ans =
   NaN
   Inf
>> As*c
ans =
     0
   Inf

Unfortunately, the broad interpretation of IEEE 754 pretty much requires converting to a dense matrix. I think there needs to be a more restricted interpretation of what IEEE 754 requires in order to allow sparse matrix algorithms.

@KristofferC
Member Author

I personally think that just documenting that sparse matrices do not always follow IEEE is better and simpler. If you do any real work with sparse matrices and one suddenly gets converted to dense, you are dead anyway.

Also, having to check for NaN and Inf in all algorithms must get expensive, because you have to look at every operation that could possibly produce a NaN.

@timholy
Member

timholy commented Mar 23, 2016

I don't want to do this either, but I find @tkelman's argument compelling: any operation with A should yield a mathematically equivalent result with full(A). This is arguably the most fundamental guiding principle one has when it comes to sparse matrices.

@KristofferC
Member Author

I would argue that having the wrong time complexity, for a data structure whose sole reason to exist is for operations to have that time complexity, is a more serious bug than strictly adhering to the rule that you should be able to replace A with full(A) everywhere. In fact, I know of no software that adheres to this.

Scipy:

In [15]: A = csr_matrix([[1, 2, 0], [0, 0, 3], [4, 0, 5]])
In [16]: (np.NaN * A)[1,0]
Out[16]: 0.0

Matlab actually does this for scalar * matrix, but not (as @JaredCrean2 showed) for matrix-vector multiplication. Anyway, I don't use sparse vectors, so I'm just happy that scalar * matrix is fast (even though technically wrong).

@tkelman
Contributor

tkelman commented Mar 23, 2016

IEEE 754 only says what to do with scalars. Linear algebra says what to do with the rest. Naive implementations of sparse algorithms ignore corner cases and get them wrong, and I'd consider that a bug. These corner cases exist for very good reasons, and I don't see what good ignoring them does. Getting a garbage answer in better time complexity isn't all that useful.

@KristofferC
Member Author

So we should remove our current buggy sparse mat * vec and sparse mat * sparse mat multiplications etc., then, and make them fall back to the AbstractArray implementation? This would make them 100% unusable for their intended cases but in theory give the correct answer. "In theory", because outside of toy examples you would never see the "correct" NaN; you would just see Julia print OutOfMemoryError().

@nalimilan
Member

We're having the same discussion as in #14963 again. Let's choose one solution or the other, but at least fix the status quo, which is the worst possible option.

@KristofferC
Member Author

Either strict conformity to op(A, B) == op(full(A), full(B)) trumps all performance considerations or it does not. If it is more important, then we have to remove a lot of the sparse library, since it gets this wrong, until someone fixes it. Or the performance side wins and this PR should be merged.

@KristofferC KristofferC reopened this Mar 23, 2016
@tkelman
Contributor

tkelman commented Mar 23, 2016

Or we test for the corner cases and fix the existing bugs, and avoid introducing any more. It's not that hard to do things correctly here.

@nalimilan
Member

We can be both correct and efficient for the standard case. The debate in the other issue was whether to return a "dense-sparse matrix" in the corner case, or to raise an error. In practice, the former will often give an OutOfMemory error (possibly after a lot of swapping).
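
A minimal sketch of that "correct and efficient in the standard case" idea for scalar multiplication (the function name is hypothetical; nzind and nzval are SparseVector's stored-index and stored-value fields):

# Fast path for finite scalars; dense semantics for Inf/NaN, where every
# implicit zero would become NaN.
function scale_checked(a::Number, x::SparseVector)
    if isfinite(a)
        return SparseVector(length(x), copy(x.nzind), a * x.nzval)
    else
        return a * full(x)   # the "dense-sparse" corner case
    end
end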

@ViralBShah ViralBShah added the sparse Sparse arrays label Mar 23, 2016
@timholy
Member

timholy commented Mar 23, 2016

Thanks for the link, @nalimilan: I was largely tuned out of email when most of that discussion happened.

I agree we can have our cake and eat it too: @KristofferC, this is not "give up all performance for any kind of sparse operation"; this is "add a trivial runtime check, do the fast thing in all normal cases, and blow up in pathological cases."

But this might be an argument for a different kind of change (as previously discussed elsewhere): add a new field fillval to SparseMatrixCSC, which would normally be zero but could alternatively be set to Inf or NaN. This would prevent the blow-up to a dense representation. It adds complexity, because all algorithms would have to check it, but maybe that is better than blowing up.

@nalimilan
Member

But this might be an argument for a different kind of change (as previously discussed elsewhere): add a new field fillval to SparseMatrixCSC, which would normally be zero but could alternatively be set to Inf or NaN. This would prevent the blow-up to a dense representation. It adds complexity, because all algorithms would have to check it, but maybe that is better than blowing up.

That's indeed what I proposed, but people don't seem to consider it worth the additional complexity...

@tkelman
Contributor

tkelman commented Mar 23, 2016

But this might be an argument for a different kind of change (as previously discussed elsewhere): add a new field fillval to SparseMatrixCSC, which would normally be zero but could alternatively be set to Inf or NaN. This would prevent the blow-up to a dense representation. It adds complexity, because all algorithms would have to check it, but maybe that is better than blowing up.

@timholy that's #10410, and I think it would be best to do that as a separate type. We can certainly look into it, but I think that would be a bigger rewrite than checking for non-finite values and using an inefficient representation in corner cases.

@timholy
Member

timholy commented Mar 23, 2016

True, and implementation by wrapping is attractive.

However, one option that doesn't seem to have been discussed is to only allow 0, Inf, and NaN. Then the algorithmic changes are relatively trivial. Of course, that might be a temporary situation (someone will generalize it eventually), but that may not be a bad thing. It does provide a solution to this problem that may not be any more challenging than remembering to call the "blow up" variant in all relevant cases.

EDIT: upon further reflection, I'm less certain that this would represent a significant simplification. Consider, for example, a case where the fillval is Inf but a particular row happens to have all values specified in nzval.

@andreasnoack
Member

Naive implementations of sparse algorithms ignore corner cases and get them wrong

@tkelman I don't think this is a fair characterization. Here is CHOLMOD:

julia> Acm
5x5 Base.SparseArrays.CHOLMOD.Sparse{Float64}:
 1.0  0.0  0.0  0.0  0.0
 0.0  1.0  0.0  0.0  0.0
 0.0  0.0  1.0  0.0  0.0
 0.0  0.0  0.0  1.0  0.0
 0.0  0.0  0.0  0.0  1.0

julia> xcm
5x1 Base.SparseArrays.CHOLMOD.Dense{Float64}:
 NaN
   0.0
   0.0
   0.0
   0.0

julia> Acm*xcm
5x1 Base.SparseArrays.CHOLMOD.Dense{Float64}:
 NaN
   0.0
   0.0
   0.0
   0.0

and here is PETSc

julia> A
5x5 PETSc.Mat{Float64,:mpiaij}:
 1.0  0.0  0.0  0.0  0.0
 0.0  1.0  0.0  0.0  0.0
 0.0  0.0  1.0  0.0  0.0
 0.0  0.0  0.0  1.0  0.0
 0.0  0.0  0.0  0.0  1.0

julia> v
5-element PETSc.Vec{Float64,:mpi}:
 NaN  
   0.0
   0.0
   0.0
   0.0

julia> A*v
5-element PETSc.Vec{Float64,:mpi}:
 NaN  
   0.0
   0.0
   0.0
   0.0

@tkelman
Contributor

tkelman commented Mar 23, 2016

So both Cholmod and Petsc implement spmv naively, in a way that gets this corner case wrong. Even Lapack, MKL, etc have bugs with nans. We could treat this as a convention issue, as in gemm where beta = 0 is interpreted as ignoring the existing value rather than literally multiplying it by 0, but in this case this seems like laziness and a leaky abstraction and a failure of generic programming rather than a reasonable API contract. Is the job of sparse matrices to represent the same underlying mathematical operation with a different data structure, or is it "implementation-defined" by how the corner-case-ignoring code behaves?
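
For reference, the gemm convention mentioned here is observable directly (0.5-era module path; an illustrative snippet, not from the thread):

A = ones(1, 1); B = ones(1, 1)
C = fill(NaN, 1, 1)
# beta = 0 is specified as "ignore C's old contents", not "scale them by 0",
# so the NaN is overwritten rather than propagated:
Base.LinAlg.BLAS.gemm!('N', 'N', 1.0, A, B, 0.0, C)
C   # == [1.0], not [NaN]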

@JaredCrean2
Contributor

So both Cholmod and Petsc implement spmv naively, in a way that gets this corner case wrong. Even Lapack, MKL, etc have bugs with nans

So you think the foundational numerical libraries are all wrong? That seems like an extreme position.

Maintaining some kind of consistency between dense and sparse operations definitely makes sense, but I think the standard of consistency needs to be relaxed a bit. In particular, because Inf and NaN are not real numbers in the mathematical sense, I don't think having algorithms with different Inf/NaN behavior violates the condition that dense and sparse operations should represent the same mathematical operation.

@timholy I think for simple cases like matrix-vector multiplication some kind of quick runtime check might work, but for more difficult cases it might not be possible. SVDs of sparse matrices and LU factorizations (especially of matrices that are unstable for LU) strike me as very difficult cases.

@tkelman
Contributor

tkelman commented Mar 24, 2016

So you think the foundational numerical libraries are all wrong?

In that they give clearly inconsistent results that don't follow IEEE754 according to the linear algebraic operation they're claiming to perform, yes.

Petsc at least has the perfectly valid excuse that it's a distributed library, where being particular about the corner cases would be substantially more expensive. Cholmod's spmv functionality isn't that library's primary purpose either.

Ignoring the corner cases doesn't make Julia's behavior materially more correct here. Fixing it, for the cases where that's not terribly complicated to do (I agree that factorizations are out of scope here), means we're ahead of the curve in fixing bugs that have either not been reported yet, not been fixed yet, or been left unaddressed in existing libraries. We don't need to repeat the mistakes of our predecessors here.

@timholy
Member

timholy commented Mar 24, 2016

It might force us to write our own algorithms, which is not an easy path. I recently wrote a couple of dense Cholesky implementations in pure julia, so (naively) I imagine that sparse LU wouldn't be daunting, but SVD might be a different matter altogether.

As I think about it more, even multiplication isn't entirely straightforward, although I think I have a successful algorithm in mind. Consider this case:

A = [1 2; Inf 2; 1 Inf]
x = [0, 1]
b = A*x   # == [2.0, NaN, Inf]

Now imagine implementing this with sparse operations, where A has a fillval of Inf (so those two Infs are "missing" from the nzval array). Let's also imagine that we have CSR matrices, just to make this a little easier. To do this in less than O(N^2), I think the algorithm looks like this:

  • xz = cumsum(x .== 0)
  • now do typical sparse matrix-vector multiplication, except...
  • ...any time you have a "gap" in a row, check xz[inext]-xz[iprev] > 0. If so, the corresponding entry in b is NaN, otherwise it's Inf.

But prior to thinking this through, I wasn't expecting to have to tally the zeros in the vector, nor did I immediately realize that the precise pattern of zeros would matter so much that it's best integrated intimately into the multiplication routine (rather than something you handle at the end).
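
A rough sketch of those steps, under strong simplifying assumptions: a hypothetical CSR layout (rowptr, colval, nzval; none of these is Julia's actual storage), fillval fixed at +Inf, and x restricted to nonnegative entries so that each implicit term Inf*x[j] is either Inf (x[j] > 0) or NaN (x[j] == 0):

# zeros of x among columns lo:hi, from the cumulative count xz
nzeros(xz, lo, hi) = hi < lo ? 0 : xz[hi] - (lo > 1 ? xz[lo-1] : 0)

function spmv_inf_fill(rowptr, colval, nzval, x)
    n = length(x)
    xz = cumsum(x .== 0)              # running count of zeros in x
    b = zeros(length(rowptr) - 1)
    for i in 1:length(rowptr)-1
        acc = 0.0                     # contribution of stored entries
        fillacc = 0.0                 # contribution of implicit Infs
        prev = 0                      # last stored column seen in this row
        for k in rowptr[i]:rowptr[i+1]-1
            j = colval[k]
            if j > prev + 1           # gap of implicit Infs: columns prev+1:j-1
                # any zero of x inside the gap turns Inf*0 into NaN
                fillacc += nzeros(xz, prev + 1, j - 1) > 0 ? NaN : Inf
            end
            acc += nzval[k] * x[j]
            prev = j
        end
        if prev < n                   # trailing gap after the last stored entry
            fillacc += nzeros(xz, prev + 1, n) > 0 ? NaN : Inf
        end
        b[i] = acc + fillacc          # Inf and NaN propagate through the sum
    end
    return b
end

Even in this restricted setting, the zero pattern of x has to be threaded through the kernel itself, which is the surprise described above.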

@JaredCrean2
Contributor

@tkelman I don't think factorizations are out of scope here. I brought them up to test the general applicability of the rule that dense and sparse versions should produce the same Inf/NaN behavior, and I think @timholy makes a good point that getting the behavior you are describing will be very involved. I'm also pretty sure it will have a significant cost in terms of performance. I don't think the decision of the existing numerical libraries to follow IEEE 754 narrowly (i.e., if a scalar operation is performed, do it according to the standard, but if a scalar operation is not mathematically required, don't perform it) is a mistake; it seems more like a deliberate choice in order to get good numerical performance.

@tkelman
Contributor

tkelman commented Mar 25, 2016

Petsc and Cholmod don't make any attempt at generic programming in terms of providing interchangeable behavior between dense and sparse. Matlab generally does, and there they've at least made an attempt to handle a fair number of the corner cases here in a more consistent manner than scipy or octave have. Scipy and Octave are clearly buggy here; I don't think anyone can argue their behavior is correct or useful for these cases. Where it's simple to get right, in BLAS-1 and BLAS-2-like routines, I don't think it has to be much more complicated than checking for non-finite inputs and branching to different behavior when they're encountered. For things like factorizations, there's less of an expectation of getting the exact same results between dense and sparse, since sparse factorizations have to deal with fill-in, partitioning, pivoting, etc. in more cases. If you have non-finite nonzero values in the sparse matrix, you're more likely to get a consistent result anyway. Tim seems to be thinking about having non-finite values as an implicit fillval, which I was not planning on doing.

@timholy
Member

timholy commented Mar 25, 2016

@JaredCrean2, mostly I was pointing out that my "easy alternative" (have a fillval that's restricted to 0, Inf, or NaN) may not be as much of a simplification as I was hoping for: even in that restricted case, the algorithms have to be written carefully. Contrary to my initial statement, it turns out not to be a free pass to easy street.

But I remain optimistic that it's possible to achieve both small memory footprint and good performance: if the fillval is 0, mostly we can just do what we do currently. When it's not finite, proper Inf/NaN handling is definitely a little more complicated (and requires careful thought), but it doesn't go from, e.g., O(N) to O(N^2) or anything. So it's likely to be a pretty big pile of work, but to me it seems likely that a talented GSoC student could probably make massive headway (or even complete the whole job, modulo factorizations) in the course of a summer or even less.

That said, @tkelman's strategy of "blowing up to dense" is quite simple and would immediately give inarguably correct behavior, so there is a lot to be said both for and against it. If we decide not to do this, I think we can still argue that our choices are well justified, but they are not unequivocally correct in the same way @tkelman's proposal is.

In other words, as usual all options have substantial positives and negatives, and we have to decide where to choose the tradeoffs. I'm not planning on doing the fillval work myself anytime in the foreseeable future, so unless someone steps forward I suspect the choice is between @tkelman's path and continuing the way we are now.

@KristofferC
Member Author

What prevents this from getting merged is this idea: unless we have a strictly conforming operation, such that every sparse matrix can be replaced with its full matrix and give exactly the same answer, the operation should fall back to the dense case.

Current operations that follow this:

  • scalar * sparse vector.

Current operations that do not follow this:

  • sparse vector dot sparse vector
  • sparse mat * vec
  • sparse mat * sparse mat
  • scalar * sparse mat
  • pretty much every single operation touching sparse matrices / vectors.

To make things consistent, we could just delete every function in the second list. Our sparse library would be unusable, but it would give correct results according to the rule that prevents this from getting merged. Or we could move the lonely item at the top down with the rest. At that point everything is consistent, and if someone finds a way to improve individual functions without losing performance, that person is free to do so.

@tkelman
Contributor

tkelman commented Mar 25, 2016

I didn't say it should fall back to the dense case, I said it should give consistent results. If you'd rather not fix the bug I've pointed out in your code, I can do it in #14973.

@tkelman tkelman merged commit 286dd8c into JuliaLang:master Mar 25, 2016
@KristofferC KristofferC deleted the kc/sparse_vec_scal_mult branch March 25, 2016 19:00
@carnaval
Contributor

As an aside, I do agree that giving an answer that differs from the dense case is very wrong; however, giving an error is not the same. In fact, the identity f(a) == f(full(a)) is still true, in the sense that its set of possible return values is still {true}: the set of return values of a function throwing an exception is the empty set (i.e., bottom).
