Strange behaviour of map and filter for sparse matrices (vectors) #6918

jumutc · 2014-05-22T13:19:51Z

Please consider the following:

julia> a = sprand(1000000,1,.001)
1000000x1 sparse matrix with 1032 Float64 entries:
    [2595   ,       1]  =  0.263955
    [3511   ,       1]  =  0.46256
    [4244   ,       1]  =  0.793747
    [7711   ,       1]  =  0.481846
    [7898   ,       1]  =  0.910035
    [8008   ,       1]  =  0.210312
    [11887  ,       1]  =  0.997936
    ⋮
    [991837 ,       1]  =  0.975212
    [993002 ,       1]  =  0.806656
    [995393 ,       1]  =  0.0327244
    [996098 ,       1]  =  0.780851
    [996254 ,       1]  =  0.295865
    [998174 ,       1]  =  0.56871
    [998274 ,       1]  =  0.343838
    [999891 ,       1]  =  0.238091

julia> filter(b -> b<2,a)
1000000x1 sparse matrix with 0 Float64 entries:

julia> filter(b -> b<0.2,a)
ERROR: BoundsError()
 in getindex at sparse/sparsematrix.jl:717

julia> map(b -> if b<0.2 0 else b end,a)
ERROR: InexactError()
 in setindex! at sparse/sparsematrix.jl:1038
 in setindex! at sparse/sparsematrix.jl:1032
 in map_to! at abstractarray.jl:1258
 in map at abstractarray.jl:1267

julia> map(b -> if b<2 0 else b end,a)
1000000x1 sparse matrix with 0 Int64 entries:

The same reproduces perfectly for matrices as well!

jiahao · 2014-05-22T13:26:52Z

(Just a technical note: sprand always produces a matrix. We don't support sparse vectors yet, but we do support N×1 sparse matrices.)

jumutc · 2014-05-22T13:29:54Z

No problem... but it looks like a bug. It would nice be have general map and filter functions which apply specifically to non-zero entries too.

jiahao · 2014-05-22T13:33:06Z

Well, the map behavior is at least explainable. b -> if b<0.2 0 else b end produces either an Int or a Float64. (In Julia, 0 is not identical to 0.0.) If the first element is below the threshold, a sparse matrix of Int64s will be created, and assigning a Float64 into it will produce the InexactError. There is no problem if you changed this to b -> if b<0.2 0.0 else b end, or even b -> b<0.2 ? 0.0 : b:

julia> map(b -> if b<0.2 0.0 else b end,a)
1000000x1 sparse matrix with 779 Float64 entries:
    [69     ,       1]  =  0.816295
    [859    ,       1]  =  0.620671
    [7695   ,       1]  =  0.65508
    [8110   ,       1]  =  0.402167
    [8112   ,       1]  =  0.461852
    [9201   ,       1]  =  0.432456
    [9705   ,       1]  =  0.421811
    ⋮

jumutc · 2014-05-22T13:37:33Z

But why we are not trying to perform convert(Type{Tv},v) and fit the value in this case? Maybe it's not a good idea and it will slow down the performance.

jiahao · 2014-05-22T13:43:49Z

But why we are not trying to perform convert(Type{Tv},v) and fit the value in this case?

That doesn't work in general. Imagine if you had b -> b<0.2 ? b : "error" as the function to map over.

jumutc · 2014-05-22T13:45:23Z

Maybe at least error should be more explanatory and we need to check the mapping function for inconsistencies first!

jiahao · 2014-05-22T13:48:48Z

You may be interested in the discussion in #6548.

jumutc · 2014-05-22T13:52:46Z

Sounds attractive. Only filter seems to be "broken" right now!

jiahao · 2014-05-22T14:23:10Z

Here is a minimal example:

julia> A=sparse([1.0]);  filter(x->x==0, A)
ERROR: BoundsError()
 in getindex at sparse/sparsematrix.jl:718
 in getindex at abstractarray.jl:368
 in filter at array.jl:1216

julia> A=sparse([1.0]);  filter(x->x==1, A)
1x1 sparse matrix with 1 Float64 entries:
    [1, 1]  =  1.0

For some reason, A[1,0] is being accessed in the first case but not the second.

jiahao · 2014-05-22T15:04:49Z

The first case involves the call A[I] where I is an invalid SparseMatrixCSC{Bool,Int64}; it has colptr = [1], rowptr = [1, 1] and nzval = []. Have to step away now, but would be great if someone else could look at it. @ViralBShah?

JeffBezanson · 2014-05-29T04:14:45Z

Bump.

Sparse lacks indexing with a single index, except for linear indexing with an Int. Here we need A[I::AbstractArray{Bool}]. This is probably due to the lack of sparse vectors. Maybe we should return Nx1 sparse matrices until then?

ViralBShah · 2014-05-29T04:20:05Z

Cc @tanmaykm too.

ViralBShah · 2014-05-29T05:09:19Z

We should try fix this for 0.3

ViralBShah · 2014-05-29T05:12:45Z

I think that the same discussion as we had on map should probably apply here. @jumutc I suspect that one wants to only apply filter on the nonzeros as well, just like map.

Thanks for kicking the tires on all this. These have always been open questions, and these discussions are getting some answers and clarity on behaviour.

jumutc · 2014-05-29T09:32:52Z

My consideration is due to very slow executions of map and filter in tight loops. This discussion is also linked to https://github.com/JuliaLang/julia/issues/7010. By the way these functions become even slower when GC comes into picture. In this case I have to rely on in place mapping and filtering (map! and filter!) or re-instantiate SparseMatrixCSC objects from the scratch!

ViralBShah · 2014-05-29T15:41:18Z

For the sparse indexing stuff, I have opened #7023

StefanKarpinski · 2014-06-05T05:26:41Z

I really think the only thing we can do here for map(f,s) is to check that f(0) == 0 and throw an error otherwise. If you want produce a dense array, you can do map!(f,full(s)).

mlubin · 2014-06-05T13:34:27Z

That seems reasonable.

lindahua · 2014-06-05T13:49:41Z

+1 for @StefanKarpinski's solution

jiahao · 2014-06-05T14:06:53Z

+1

jumutc · 2014-06-05T14:17:29Z

Wise solution, +1

johnmyleswhite · 2014-06-05T15:34:33Z

Doesn't Stefan's solution assume that your function is pure? While this is statistically reasonable, it seems potentially problematic to me.

jiahao · 2014-06-05T16:04:48Z

Fair point, but I'm not sure calling map with anything other than a pure function is a good idea.

johnmyleswhite · 2014-06-05T17:02:07Z

I agree that it's weird, but I'm not convinced having map work on sparse matrices makes sense either. For me, the only functions that should return sparse results are those functions that can be proven to preserve sparsity a priori. Since map can take in an arbitrary functions and those functions might depend upon run-time information, map will not work for some inputs. And when it fails, it will fail silently.

Imagine that you have a function like foo(x) = rand() > 0.5 ? x + 0 : x + 1. This function will give randomly wrong results under the proposed definition of map.

JeffBezanson · 2014-06-05T19:09:28Z

Interesting point; a function like that is kind of reasonable to use with map. I'd be ok with not supporting map for sparse.

ViralBShah · 2014-06-05T19:28:40Z

Yes, I prefer not supporting map for sparse, and letting users directly work with the nonzeros.

mlubin · 2014-06-05T19:43:39Z

That's fine with me. We should provide a useful error message though and not just remove the method.

StefanKarpinski · 2014-06-05T20:50:36Z

To play devil's advocate here, if you use a random function like that to map over a sparse array, then you randomly get an error or a result, which is kind of what you were asking for, no? The common case here is clearly applying a pure function that takes zero to zero, in which case my proposal does exactly what you want.

johnmyleswhite · 2014-06-05T21:06:08Z

My take is that we usually want to ensure that we don't fail silently in the worst case, even if this makes the common case a little more difficult.

johnmyleswhite · 2014-06-05T21:07:11Z

Also, worth noting that this proposal would make map behave differently for dense arrays and sparse arrays. For dense ones, things would always work. For sparse ones, things would be broken and you wouldn't know.

nalimilan · 2014-06-05T21:32:35Z

Applying a function with a random behavior using map doesn't seem like a very plausible use-case, and the fact that in some cases you will get an error is enough to make it a relatively limited risk. Are there other cases where the proposed behavior would be dangerous? Having map work for sparse matrices like it does for dense arrays, while preserving the array structure and performance, sounds quite appealing.

johnmyleswhite · 2014-06-05T22:40:11Z

I can definitely see why you might want to allow this behavior for convenience, but the proposal at hand is to make map work in a meaningfully different way for sparse arrays than it does for dense arrays. The proposal is effectively partial memoization in which you compute the value of f(0) once and then assume that this value will always be correct, so you don't even bother evaluating f ever again. That's not the semantics that map implements for dense arrays, although it might be the semantics we'd want to insist users should expect.

ViralBShah · 2014-06-06T16:13:46Z

The underlying sparse indexing issue causing the failure reported here is now fixed by #7047. For a discussion on map and sparse, we could perhaps have a newer issue, or reopen this one with a changed title.

StefanKarpinski · 2014-06-06T16:37:24Z

@johnmyleswhite, that depends on what you consider the semantics of map to be. Does map(f,a) evaluate the function on every implied value of a or on every stored value of a? If the meaning is the later, then my proposed semantics are identical to the dense semantics, where the default value of the sparse matrix is considered to be "stored" once (even though it's implicit in most sparse matrices).

johnmyleswhite · 2014-06-06T17:03:19Z

Replying to @StefanKarpinski here although the response might belong in #7157. I totally agree that you could define the meaning of map in terms of stored values, but I think that's subtle enough that some people are going to get confused.

carlobaldassi · 2014-06-07T14:43:00Z

I have

carlobaldassi · 2014-06-07T14:48:01Z

I have to point out that I did not consider side effects when I implemented map for BitArrays and thus that the code uses memoization for obvious performance reasons. We may need to change that unless the consensus is for assuming pure functions.

mlubin · 2014-06-07T14:48:31Z

The concept of stored values shouldn't leak through to high-level functions like these.

nalimilan · 2014-06-07T15:21:29Z

@mlubin It doesn't have to, as the difference is invisible as long as the passed function is pure.

JeffBezanson added bug labels May 22, 2014

ViralBShah added this to the 0.3 milestone May 29, 2014

ViralBShah closed this as completed Jun 6, 2014

jumutc mentioned this issue Jun 6, 2014

Specify behaviour of map for sparse matrices #7157

Closed

Strange behaviour of map and filter for sparse matrices (vectors) #6918

Strange behaviour of map and filter for sparse matrices (vectors) #6918

Comments

jumutc commented May 22, 2014

jiahao commented May 22, 2014

jumutc commented May 22, 2014

jiahao commented May 22, 2014

jumutc commented May 22, 2014

jiahao commented May 22, 2014

jumutc commented May 22, 2014

jiahao commented May 22, 2014

jumutc commented May 22, 2014

jiahao commented May 22, 2014

jiahao commented May 22, 2014

JeffBezanson commented May 29, 2014

ViralBShah commented May 29, 2014

ViralBShah commented May 29, 2014

ViralBShah commented May 29, 2014

jumutc commented May 29, 2014

ViralBShah commented May 29, 2014

StefanKarpinski commented Jun 5, 2014

mlubin commented Jun 5, 2014

lindahua commented Jun 5, 2014

jiahao commented Jun 5, 2014

jumutc commented Jun 5, 2014

johnmyleswhite commented Jun 5, 2014

jiahao commented Jun 5, 2014

johnmyleswhite commented Jun 5, 2014

JeffBezanson commented Jun 5, 2014

ViralBShah commented Jun 5, 2014

mlubin commented Jun 5, 2014

StefanKarpinski commented Jun 5, 2014

johnmyleswhite commented Jun 5, 2014

johnmyleswhite commented Jun 5, 2014

nalimilan commented Jun 5, 2014

johnmyleswhite commented Jun 5, 2014

ViralBShah commented Jun 6, 2014

StefanKarpinski commented Jun 6, 2014

johnmyleswhite commented Jun 6, 2014

carlobaldassi commented Jun 7, 2014

carlobaldassi commented Jun 7, 2014

mlubin commented Jun 7, 2014

nalimilan commented Jun 7, 2014