
fix #14111 dividing sparse vector by constant should give sparse vector #14963

Closed
wants to merge 1 commit

Conversation

@pgentili commented Feb 6, 2016

No description provided.

@tkelman (Contributor) commented Feb 7, 2016

This shouldn't throw an error for divide-by-0 or NaN. It should create a SparseVector with dense data.

@StefanKarpinski (Member)

It seems like a bad move to unpredictably create a really dense vector.

@tkelman (Contributor) commented Feb 7, 2016

It's not unpredictable, it's what the user asked for when they divided by 0. I would say it's what we do for dividing a sparse matrix by 0, but it looks like that regressed.

edit: nope, it's always been wrong. full(A ./ 0) should always equal full(A) ./ 0 which it currently doesn't. I'll open a PR.
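The identity in question, as a minimal check (a sketch against the 0.4-era speye/full API; isequal treats NaNs as equal):

A = speye(4)                           # 4x4 sparse identity
isequal(full(A ./ 0), full(A) ./ 0)    # should hold: Inf on the diagonal, NaN off it

(Presumably the sparse path divides only the stored entries, leaving the implicit zeros at 0 instead of NaN.)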

@ViralBShah added the sparse Sparse arrays label Feb 7, 2016
@nalimilan (Member)

This could be fixed by allowing custom (i.e. not only zero) values for sparse entries.

@tkelman (Contributor) commented Feb 7, 2016

Sure, but the majority of use cases call for a zero default. I don't think making a few corner cases a little cleaner is worth making every user of the SparseMatrixCSC data structure deal with a potentially non-zero default, but a sparse type with a non-zero default could be built by wrapping the existing type.

Multiplication (and arguably scale[!]) is also incorrect when the scalar is non-finite.

@nalimilan (Member)

I'm not really concerned about the corner case of non-zero sparse entries, but about the fact that another corner case (dividing by zero or multiplying by a non-finite value) makes it impossible to guarantee that division and multiplication by a scalar return a sparse vector/matrix. Not saying custom sparse entry values are the best solution, but returning a dense vector/matrix sounds silly.

@tkelman (Contributor) commented Feb 7, 2016

If you can suggest any other way of ensuring isequal(full(A ./ 0), full(A) ./ 0) without changing the data structure, please do. Python throws ZeroDivisionError: float division by zero, which we generally don't do for floating-point types; Octave gets this wrong; Mathematica gets it right. Anyone want to check what Matlab returns for speye(4) ./ 0?

@ViralBShah (Member)

For example, cos is another such case (cos(0) == 1, so the result is dense), and there are many more. I have felt that we should perhaps have a function that operates only on the nonzero values, say mapnz. Matlab has spfun for this.
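A rough sketch of what such a mapnz might look like (hypothetical, not an existing function; spfun-style semantics, i.e. f is applied to the stored entries only and implicit zeros are left untouched):

function mapnz(f, A::SparseMatrixCSC)
    # keep the sparsity pattern, transform only the stored values
    SparseMatrixCSC(A.m, A.n, copy(A.colptr), copy(A.rowval), map(f, A.nzval))
end

mapnz(cos, speye(4))   # stored entries become cos(1.0); the implicit zeros stay zero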

@KristofferC (Member)

>> speye(4) ./ 0

ans =

   (1,1)      Inf
   (2,1)      NaN
   (3,1)      NaN
   (4,1)      NaN
   (1,2)      NaN
   (2,2)      Inf
   (3,2)      NaN
   (4,2)      NaN
   (1,3)      NaN
   (2,3)      NaN
   (3,3)      Inf
   (4,3)      NaN
   (1,4)      NaN
   (2,4)      NaN
   (3,4)      NaN
   (4,4)      Inf

@timholy (Member) commented Feb 7, 2016

Personally, I think this PR gets it right. If you want to divide a sparse matrix by 0, convert it to full first.

@timholy (Member) commented Feb 7, 2016

nvm, I see what you're doing in #14973. Carry on.

@tkelman (Contributor) commented Feb 7, 2016

scipy.sparse has a set of warnings for "slow things you shouldn't do on sparse matrices" that we could maybe start adopting, but erroring seems like it might be asking for trouble if a value only ends up exactly equal to zero partway through some long calculation.

@nalimilan (Member)

Raising an error depending on a value being exactly 0 is bad, but it isn't worse than returning a dense matrix, which might even trigger OOM if it's really large. And isn't type-stability in Base a must?

@tkelman (Contributor) commented Feb 7, 2016

isn't type-stability in Base a must?

Yes. See the approach in #14973: it maintains type stability by always returning SparseMatrixCSC, but follows the Matlab precedent of returning dense data in sparse format when the denominator is 0 (or NaN). So it's not performance-stable. We can either return the mathematically correct answer and have it be slow in the uncommon case, or throw an error in the uncommon case. There's precedent for choosing the former (#5824, one of the first issues I opened), and it's consistent with what Matlab does, which is generally the reference implementation for most sparse functionality (with the exception of stored zeros).

edit: we don't error for sparse .+ scalar, so I don't see why we should error here

@carnaval (Contributor) commented Feb 7, 2016

edit: we don't error for sparse .+ scalar, so I don't see why we should error here

True, but I think no one would expect .+ to keep the matrix sparse, whereas mul/div is a lot more likely to be a programming error.

@tkelman (Contributor) commented Feb 7, 2016

Are there any other examples where we currently throw value-dependent errors when the correct result is representable (just expensive to construct) within the domain of the desired output type?

I don't think it's our place to guess the user's intent about why they are dividing by zero or multiplying by Inf or NaN.

@andreasnoack (Member)

Are there any other examples where we currently throw value-dependent errors when the correct result is representable (just expensive to construct) within the domain of the desired output type?

I think sqrt(-1.0) is an example.

@tkelman (Contributor) commented Feb 7, 2016

That seems to be a slightly different case: there you gain information by lifting to a different domain where a more precise result exists (NaN is the closest-to-correct representation of an imaginary number within a real data type, but it's a lot less correct than the same number represented in a complex type). Here you don't gain anything by pedantically erroring to tell the user to add full manually when their denominator happens to equal 0; you save some memory by not creating the integer indices needed to represent a dense output in sparse format, but you force an extra copy of the data by calling full.

@StefanKarpinski (Member)

I'd much rather have the program error out than try to allocate way too much memory, start swapping and become unusable. Yes, OSes should be smart enough to avoid that, but the reality is they're not. If I really want to generate a full matrix, then I can explicitly ask for that by converting to full first.

@tkelman (Contributor) commented Feb 8, 2016

Representing a full matrix in sparse format is only ever a constant factor more expensive than manually converting to full. Where else do we refuse to compute something because it's inconveniently inefficient at a specific value? We don't error for adding a constant to a sparse matrix, or for taking the cosine of a sparse matrix, etc.; we do the conversion to full when the operation the user asked for requires it. Where we can give a type-domain MethodError, I often agree that a MethodError is preferable to a slow dense fallback, but this is the value domain, and we don't usually throw this kind of exception for operations whose result we can represent but just don't want to.

@nalimilan (Member)

My way of thinking here is that if one is serious about sparse matrices, one cannot assume the data will fit into memory after conversion to a dense format. Triggering OOM is painful and may have serious consequences: making debugging difficult, creating problems for unrelated processes, making a machine unresponsive, annoying other users of a shared machine, etc. Also, tests may succeed on relatively small matrices while pathological failures happen in production on bigger data sets. Raising an error is clearer, and the message can indicate the workaround.

@tkelman (Contributor) commented Feb 8, 2016

If returning the actual calculation in sparse format would trigger an OOM, converting to full is almost guaranteed to do the same, so we'd be suggesting a workaround that usually won't work either in the OOM case. If you're serious about sparse matrices, there are lots of operations you as a user shouldn't do; as a language, though, we shouldn't error to prevent users from doing them. Warnings maybe, but erroring is very much in the "language designer knows better than the user" direction that I wouldn't think we'd want to go in.

@StefanKarpinski (Member)

Where else do we refuse to calculate something because it's inconveniently inefficient at a specific value?

All of our numeric promotion rules are driven by this exact concern.

We don't error for addition of a sparse matrix by a constant, or taking the cosine of a sparse matrix, etc.

But this is completely predictable from the types of the arguments, not their values: if you add a constant to a sparse matrix, you always get a dense answer; likewise, if you take the cosine of a sparse matrix, you always get a dense answer. Suppose you're doing a sparse computation as part of a service. It's way better to get an easily trapped, clear error if the divisor ends up being 0 than to suddenly have your entire service crash because of mysterious OOMs. The situation is mildly better in an interactive computation, but only mildly: getting an unpredictable OOM still sucks.

@tkelman (Contributor) commented Feb 8, 2016

All of our numeric promotion rules are driven by this exact concern.

Could you explain what you mean here? I don't see how that's the case, particularly not in the value domain. The closest comparison I can see is integer division by zero, but there the result isn't representable within the same type.

If we had a lot of other cases of throwing divide-by-zero exceptions for floating-point arrays (or integer overflow, etc.) I'd have a different opinion here. sparse.^0 hasn't killed any web services in the past 2 years, to the best of my knowledge. All I'm advocating for is maintaining the identity full(A op B) == full(A) op full(B) (with NaNs treated as equal), rather than one side giving an exception and the other not.

I do a lot of sparse linear algebra every day, and have never wanted it to throw more errors. A library's job is to give the result the user asks for, whenever possible.

@StefanKarpinski (Member)

This is what I mean about promotion rules. We started out promoting everything to Int. But for arrays this is bad, since you don't want Int8[] + Int8[] => Int[], for example. So we made arrays behave differently: Int8[] + Int8[] => Int8[]. But then that seemed inconsistent with the scalar behavior, which was still Int8 + Int8 => Int, so we eventually changed the scalar behavior to match the array behavior: Int8 + Int8 => Int8.

I get why full(A op B) == full(A) op full(B) is appealing, but it seems reasonable for one of the paths around the commutative diagram to be a partial function and not the other. If I make A and B full explicitly, then obviously I'm ok with the amount of memory that will take. If I don't make them full explicitly and they sometimes end up as dense data in sparse format, taking 3x the storage they would if full, there's a very serious chance that I'm not ok with it and it will make my application crash unexpectedly. Having a predictable memory requirement is a crucial property of reliable deployed applications. What you're arguing for makes memory usage depend on a corner case that is very easy not to hit – until you do. If sparse.^0 hasn't killed any web services, it's only luck and lack of use. You can't deny that it's a very real possibility.

@StefanKarpinski (Member)

Unpredictably allocating huge amounts of memory is a particularly nasty form of unpredictable behavior in applications, since there's a very strong chance of it taking down an entire machine, not just one task. Throwing an error will only cause one task to fail, in an easily recoverable and debuggable way. Note that if you do sparse + c or cos(sparse) you will always get a memory blowup, which is not a problem in the same way. While full(A op B) == full(A) op full(B) is a good principle, I think it needs to be trumped by another principle: don't let corner cases allocate huge amounts of memory when normal cases would not.

@JeffBezanson (Member)

There's a danger that if A ./ n throws an error for n==0, the code will be rewritten to full(A) ./ n, making it perform badly in all cases. If we want to throw errors instead of letting memory blowup happen, maybe instead we should work to improve OutOfMemoryError. Then you only get an error if it's really needed to prevent the machine from thrashing. Currently we rely on malloc returning NULL, and we could instead use our own heuristics based on RSS, physical memory size, and the allocation size requested.
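A crude illustration of the kind of heuristic check Jeff describes (hypothetical name and placement; the real logic would live in the allocator/GC and would also consider RSS):

function checked_alloc(nbytes::Integer)
    # refuse up front if the request alone exceeds physical memory,
    # rather than waiting for malloc to return NULL
    nbytes > Sys.total_memory() && throw(OutOfMemoryError())
    Array{UInt8}(nbytes)
end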

@tkelman (Contributor) commented Feb 8, 2016

I don't follow how the integer promotion applies here?

Sparse matrices are known for having lots of operations that can unpredictably allocate large amounts of memory, not all of which can be determined just from the types of the inputs. It's not appropriate to error for trying to take the unpivoted Cholesky factorization of an arrowhead matrix, is it? Or a sparse matrix-matrix product with some especially dense columns/rows? full(A op B) == full(A) op full(B) shouldn't lose out just because this particular instance of the situation is easier to detect than most. If it's a problem in application code then it should be checked for anyway, but giving NaNs is the right mathematical answer.

@StefanKarpinski (Member)

You asked about design decisions where we sacrificed mathematical correctness for the sake of not allocating too much memory. Integer promotions are an example. I'm not going to fight this – I've made my case.

@tkelman (Contributor) commented Feb 8, 2016

Oh I see. That wasn't my question then; I was asking about value-domain errors. We do get a few more InexactErrors now...

Predictable memory use is something sparse matrices can't achieve for all possible inputs and all possible operations (even the ones you expect to preserve sparsity most of the time). Much of the interesting work in the field is about reorderings and partitionings to reduce fill-in, but the fact that there are degenerate expensive cases is unavoidable.

@JeffBezanson (Member)

Yes, I think even if we want the policy "error on operations that blow up sparse matrices", the only reasonable way to implement it is with OutOfMemoryError. There isn't a general version of such a check.

I also think there are good reasons to have Int8 + Int8 = Int8 other than memory used by array operations.

@StefanKarpinski (Member)

I also think there are good reasons to have Int8 + Int8 = Int8 other than memory used by array operations.

There are, but this is still the history of how these operations worked and why we changed them.

@vtjnash (Member) commented Feb 8, 2016

I thought the real purpose of NaN in IEEE 754 was deferred exception detection? I'm not sure I understand why it's important that this doesn't throw a DivideByZero error but instead generates a dense array of NaN values. If that's actually the problem here, can I propose that x / 0 should also throw a DivideByZero error (e.g. implement the traps from the IEEE 754 standard)?

@JeffBezanson (Member)

Giving errors instead of NaNs is generally defensible, but as you observe we'd have to decide to do that consistently everywhere. On the other hand 1/0 == Inf is not really an error in the same way, and does the right thing in some computations.

@andreasnoack (Member)

Predictable memory use is something that sparse matrices can't achieve for all possible inputs and all possible operations (even the ones that you expect to preserve sparsity most of the time).

The specific method discussed here has perfectly predictable memory use if v./0 throws.

@JeffBezanson (Member)

I agree with @tkelman here. If there's a well-defined answer, we should just return it. Using too much memory, or unpredictable amounts of memory, is not an error unless you actually run out, which is the domain of OutOfMemoryError (which unfortunately we need to hack on more, since OSes have declined to handle the issue in a sane way).

@tkelman (Contributor) commented Feb 9, 2016

Sorry about the volume of debate on your first PR @pgentili! I'll continue on #14973 for non-finite scale on the matrix case, but if we decide to go that direction then this PR can do something similar (simpler for the vector case actually, since you can do nzval = fill(zero(eltype(x)) / a, length(x)); nzval[x.nzind] = x.nzval / a)
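Spelled out, that one-liner amounts to something like this (a sketch, not the PR's code; assumes a floating-point eltype and a scalar a that is zero or non-finite):

nzval = fill(zero(eltype(x)) / a, length(x))   # the value every implicit zero maps to, e.g. 0.0/0 gives NaN
nzval[x.nzind] = x.nzval / a                   # stored entries get their actual quotients
result = SparseVector(length(x), collect(1:length(x)), nzval)   # dense data in a sparse container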

@nalimilan (Member)

Is the idea of supporting non-zero sparse entries (#10410) really out of the question? It's been mentioned several times in the past (for example here, where @StefanKarpinski referred to another discussion I can't find). I must say I have no particular need for this myself, but it would solve all the issues discussed here, as well as map (#7010), by making sparse matrices closed under all operations.

(If that's a "no", please do not let my question derail this already long discussion -- debates can go to #10410.)

@jiahao (Member) commented Feb 12, 2016

The discussion about computations producing NaN is not new; see #5234. From what I can tell, the only inkling of a rule we have in Julia is

  • produce NaN only if the check is expensive relative to the cost of the operation, e.g. for real floating-point +, -, *, /;
  • NaNs as input can propagate.

There's really no reason to hold up this PR for what is literally an edge case, so I would be in favor of accepting a PR that checks its inputs. For a mathematical operation, I would prefer DomainError over ArgumentError here.

Note also that this PR makes A./x return the same values as scale(A, inv(x)) in the cases where the former doesn't error out. If we really wanted to be pedantic, arguably scale and scale! get 0.0 * Inf wrong too: by the "do everything by the IEEE 754 book" argument, the result should be a vector of NaNs, but currently it just returns an empty vector:

julia> scale(sparsevec(zeros(5)), Inf)
Sparse vector of length 5, with 0 Float64 nonzero entries:
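For comparison, the dense path gives the by-the-book answer, since 0.0 * Inf is NaN:

full(sparsevec(zeros(5))) * Inf   # == [NaN, NaN, NaN, NaN, NaN]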

@tkelman (Contributor) commented Feb 12, 2016

Yes, scale gets this wrong and I plan on fixing it, see my todo in #14973.

A./b throwing an error for sparse when it wouldn't for full(A)./b is the wrong behavior for the corner case in terms of generic programming IMO.

@tkelman (Contributor) commented Mar 25, 2016

This was fixed by #15579, but I'll cherry-pick most of your tests into #14973 since they are good for coverage.
