Performance degradation when reducing SparseMatrixCSC #27836

bicycle1885 · 2018-06-28T03:03:44Z

I've found an extreme performance degradation (~30x slower) on Julia 0.7.0-beta vs. on Julia 0.6.2 when reducing a large SparseMatrixCSC matrix using maximum. I couldn't find differences in the data, so I think this is a problem in SparseArrays.

Julia 0.6.2

julia> size(counts)
(27998, 1306127)

julia> typeof(counts)
SparseMatrixCSC{Int32,Int64}

julia> @time maximum(counts, 2);
  9.171888 seconds (103.28 k allocations: 5.739 MiB, 0.06% gc time)

julia> @time sum(counts .> 0)
 10.630171 seconds (42.21 k allocations: 22.013 GiB, 0.08% gc time)
2624828308

julia> versioninfo()
Julia Version 0.6.2
Commit d386e40c17 (2017-12-13 18:08 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Xeon(R) CPU E5-2637 v4 @ 3.50GHz
  WORD_SIZE: 64
  BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Haswell)
  LAPACK: libopenblas64_
  LIBM: libopenlibm
  LLVM: libLLVM-3.9.1 (ORCJIT, broadwell)

Julia 0.7.0-beta:

julia> size(counts)
(27998, 1306127)

julia> typeof(counts)
SparseMatrixCSC{Int32,Int64}

julia> @time maximum(counts, dims=2);
271.014620 seconds (5.28 M allocations: 34.628 GiB, 0.09% gc time)

julia> @time sum(counts .> 0)
 10.284461 seconds (348.48 k allocations: 22.028 GiB, 0.69% gc time)
2624828308

julia> versioninfo()
Julia Version 0.7.0-beta.0
Commit f41b1ecaec (2018-06-24 01:32 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Xeon(R) CPU E5-2637 v4 @ 3.50GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-6.0.0 (ORCJIT, broadwell)

The text was updated successfully, but these errors were encountered:

ViralBShah · 2018-06-28T03:07:19Z

Wow that is a lot of allocation for a simple operation. Is there an easy way to create the input sparse matrix?

Probably you want to work with the transpose - but that is a different matter.

bicycle1885 · 2018-06-28T03:13:45Z

I made this matrix from an HDF5 file downloaded from https://support.10xgenomics.com/single-cell-gene-expression/datasets/1.3.0/1M_neurons (the "aggr - Gene / cell matrix HDF5 (filtered)" file, 3.93 GB). The following script can load the file into a SparseMatrixCSC matrix:

using SparseArrays
using HDF5
function load_10x_format(data)
    featurenames = read(data["genes"])
    cellnames = read(data["barcodes"])
    m, n = read(data["shape"])
    counts = SparseMatrixCSC(
        m, n,
        read(data["indptr"]).+1,
        read(data["indices"]).+1,
        convert(Vector{Int32}, read(data["data"])))
    SparseArrays.sortSparseMatrixCSC!(counts)
    return featurenames, cellnames, counts
end
featurenames, cellnames, counts = HDF5.h5open(file->load_10x_format(file["/mm10"]), "1M_neurons.h5")

It requires ~30 GiB of RAM to load but it may be reproducible with smaller matrices.

bicycle1885 · 2018-06-28T03:27:59Z

From the following profiling, the problem is around these lines: https://github.com/JuliaLang/julia/blob/v0.7.0-beta/stdlib/SparseArrays/src/sparsematrix.jl#L1689-L1692.

julia> Profile.init(delay=0.1)

julia> @profile maximum(counts, dims=2);

julia> Profile.print()
2723 ./task.jl:257; (::getfield(REPL, Symbol("##28#29")){REPL.REPLBackend})()
 2723 ...sr/share/julia/stdlib/v0.7/REPL/src/REPL.jl:116; macro expansion
  2723 ...sr/share/julia/stdlib/v0.7/REPL/src/REPL.jl:85; eval_user_input(::Any, ::REPL.REPLBackend)
   2723 ./boot.jl:319; eval(::Module, ::Any)
    2723 ./<missing>:0; top-level scope
     2723 ...e/julia/stdlib/v0.7/Profile/src/Profile.jl:27; top-level scope
      2723 ./none:0; (::getfield(Base, Symbol("#kw##maximum")))(::NamedTuple{(:dims,)...
       2723 ./reducedim.jl:628; #maximum#544
        2723 ./reducedim.jl:654; _maximum
         2723 ./reducedim.jl:655; _maximum
          2723 ./none:0; #mapreduce
           2723 ./reducedim.jl:276; #mapreduce#537
            2723 ./reducedim.jl:285; _mapreduce_dim(::Function, ::Function, ::SparseMatrixCSC{Int...
             2595 ./reducedim.jl:244; mapreducedim!(::Function, ::Function, ::SparseMatrixCSC{Int...
              2595 ...v0.7/SparseArrays/src/sparsematrix.jl:1713; _mapreducedim!(::typeof(identity), ::typeof(max), ::Sparse...
               1    ./range.jl:0; _mapreducecols!(::typeof(identity), ::typeof(max), ::Spar...
               18   ./sort.jl:0; _mapreducecols!(::typeof(identity), ::typeof(max), ::Spar...
               369  ./sort.jl:179; _mapreducecols!(::typeof(identity), ::typeof(max), ::Spar...
               2161 ...v0.7/SparseArrays/src/sparsematrix.jl:1689; _mapreducecols!(::typeof(identity), ::typeof(max), ::Spar...
                2161 ./simdloop.jl:73; macro expansion
                 2    ./array.jl:707; getindex
                 15   ./array.jl:707; macro expansion
                 2144 ...v0.7/SparseArrays/src/sparsematrix.jl:84; macro expansion
                  38  ./array.jl:707; getindex
                  130 ./array.jl:745; setindex!
                  71  ./int.jl:53; getindex
                  17  ./int.jl:0; setindex!(::SparseMatrixCSC{Int32,Int64}, ::Int32, ::Int...
                  11  ./promotion.jl:436; max
                   3 ./int.jl:49; <
                  38  ./sort.jl:0; setindex!(::SparseMatrixCSC{Int32,Int64}, ::Int32, ::Int...
                  65  ./sort.jl:179; setindex!(::SparseMatrixCSC{Int32,Int64}, ::Int32, ::Int...
                  101 ./sysimg.jl:18; getindex
                  15  ...0.7/SparseArrays/src/sparsematrix.jl:1910; getindex
                   15 ./array.jl:707; getindex
                  5   ...0.7/SparseArrays/src/sparsematrix.jl:1912; getindex
                  461 ...0.7/SparseArrays/src/sparsematrix.jl:1913; getindex
                   14  ./int.jl:0; searchsortedfirst
                   153 ./sort.jl:179; searchsortedfirst
                    153 ./int.jl:52; -
                   45  ./sort.jl:180; searchsortedfirst
                    45 ./int.jl:452; >>>
                     45 ./int.jl:444; >>>
                   240 ./sort.jl:181; searchsortedfirst
                    240 ./ordering.jl:49; lt
                     240 ./operators.jl:338; isless
                      240 ./int.jl:49; <
                   9   ./sysimg.jl:18; getproperty
                  142 ...0.7/SparseArrays/src/sparsematrix.jl:1914; getindex
                   78 ./array.jl:707; getindex
                   25 ./sysimg.jl:18; getproperty
                  28  ...0.7/SparseArrays/src/sparsematrix.jl:2330; setindex!(::SparseMatrixCSC{Int32,Int64}, ::Int32, ::Int...
                  32  ...0.7/SparseArrays/src/sparsematrix.jl:2333; setindex!(::SparseMatrixCSC{Int32,Int64}, ::Int32, ::Int...
                   31 ./int.jl:428; <=
                  16  ...0.7/SparseArrays/src/sparsematrix.jl:2336; setindex!(::SparseMatrixCSC{Int32,Int64}, ::Int32, ::Int...
                   16 ./array.jl:707; getindex
                  839 ...0.7/SparseArrays/src/sparsematrix.jl:2338; setindex!(::SparseMatrixCSC{Int32,Int64}, ::Int32, ::Int...
                   204 ./int.jl:49; searchsortedfirst
                   14  ./sort.jl:0; searchsortedfirst
                   406 ./sort.jl:179; searchsortedfirst
                    125 ./int.jl:52; -
                   215 ./sort.jl:180; searchsortedfirst
                    65  ./int.jl:53; +
                    39  ./int.jl:452; >>>
                     39 ./int.jl:444; >>>
                    111 ./sysimg.jl:18; +
                  88  ...0.7/SparseArrays/src/sparsematrix.jl:2339; setindex!(::SparseMatrixCSC{Int32,Int64}, ::Int32, ::Int...
                   38 ./array.jl:707; getindex
                   18 ./sysimg.jl:18; getproperty
                  23  ...0.7/SparseArrays/src/sparsematrix.jl:2341; setindex!(::SparseMatrixCSC{Int32,Int64}, ::Int32, ::Int...
                   23 ./array.jl:745; setindex!
                  13  ...0.7/SparseArrays/src/sparsematrix.jl:2342; setindex!(::SparseMatrixCSC{Int32,Int64}, ::Int32, ::Int...
               7    ...v0.7/SparseArrays/src/sparsematrix.jl:1909; _mapreducecols!(::typeof(identity), ::typeof(max), ::Spar...
               32   ...v0.7/SparseArrays/src/sparsematrix.jl:1914; _mapreducecols!(::typeof(identity), ::typeof(max), ::Spar...
               7    ...v0.7/SparseArrays/src/sparsematrix.jl:2330; setindex!(::SparseMatrixCSC{Int32,Int64}, ::Int32, ::Int6...
             128  ./reducedim.jl:127; reducedim_init(::Function, ::typeof(max), ::SparseMatrixCSC...
              128 ./reducedim.jl:94; reducedim_initarray0(::SparseMatrixCSC{Int32,Int64}, ::Int64...
               128 ./reducedim.jl:203; mapfirst!(::Function, ::SparseMatrixCSC{Int32,Int64}, ::Sp...
                128 ./abstractarray.jl:1958; map!(::typeof(identity), ::SparseMatrixCSC{Int32,Int64}, ...
                 128 ./abstractarray.jl:997; setindex!
                  128 ./abstractarray.jl:1027; _setindex!
                   84 ...0.7/SparseArrays/src/sparsematrix.jl:2347; setindex!(::SparseMatrixCSC{Int32,Int64}, ::Int32, ::Int...
                    84 ./array.jl:1127; insert!
                     84 ./array.jl:791; _growat!
                   44 ...0.7/SparseArrays/src/sparsematrix.jl:2348; setindex!(::SparseMatrixCSC{Int32,Int64}, ::Int32, ::Int...
                    44 ./array.jl:1127; insert!
                     44 ./array.jl:791; _growat!

bicycle1885 · 2018-06-28T03:39:27Z

I didn't know but the reduction operation now returns a sparse matrix for a given sparse matrix, which causes the performance degradation in the tight loop of reduction. This change may be reasonable but the performance degradation is not acceptable to me.

julia> typeof(maximum(counts, 2))  # Julia 0.6.2
Array{Int32,2}

julia> typeof(maximum(counts, dims=2))  # Julia 0.7.0-beta
SparseMatrixCSC{Int32,Int64}

ViralBShah · 2018-06-28T04:00:26Z

I agree the performance is not acceptable.

andreasnoack · 2018-06-28T11:24:41Z

This is an unintended consequence of #23227.

Fixes #27836

bicycle1885 · 2018-06-28T14:06:08Z

Thank you, @andreasnoack! Let me know if I need to benchmark your fix.

Fixes #27836

ViralBShah · 2018-06-28T21:02:44Z

It appears that even in the "before" version, things are too slow - based on the amount of allocation. This operation should literally be possible with a single scan of the data structure.

Keno · 2018-07-01T00:24:16Z

Ideally add the benchmark to BaseBenchmarks, so it's tracked.

Fixes #27836

* Adjust initialization in maximum and minimum Fixes #27836 * Reduce to first element in index range for OffsetArrays * Adjust mapreduce to use first element in offset range as well. Adjust tests * Avoid defining global Areduc variable in reducedim tests * Add comment in mapslices to explain the wrapping of scalars

Fixes #27836

* Adjust initialization in maximum and minimum Fixes #27836 * Reduce to first element in index range for OffsetArrays * Adjust mapreduce to use first element in offset range as well. Adjust tests * Avoid defining global Areduc variable in reducedim tests * Add comment in mapslices to explain the wrapping of scalars * Change reduced_index for Slices and add test for previously failing maximum of OffsetArray example * Add checks that reductions along dimensions are inferred

ViralBShah added the sparse Sparse arrays label Jun 28, 2018

fredrikekre added performance Must go faster regression Regression in behavior compared to a previous version labels Jun 28, 2018

andreasnoack added a commit that referenced this issue Jun 28, 2018

Adjust initialization in maximum and minimum

a828efa

Fixes #27836

andreasnoack mentioned this issue Jun 28, 2018

Adjust initialization in maximum and minimum #27845

Merged

andreasnoack added a commit that referenced this issue Jun 28, 2018

Adjust initialization in maximum and minimum

a599564

Fixes #27836

andreasnoack added a commit that referenced this issue Jun 28, 2018

Adjust initialization in maximum and minimum

9e37e85

Fixes #27836

andreasnoack added a commit that referenced this issue Jun 28, 2018

Adjust initialization in maximum and minimum

561de73

Fixes #27836

andreasnoack added a commit that referenced this issue Jun 28, 2018

Adjust initialization in maximum and minimum

a802fb1

Fixes #27836

andreasnoack added a commit that referenced this issue Jun 28, 2018

Adjust initialization in maximum and minimum

8514ee3

Fixes #27836

andreasnoack added a commit that referenced this issue Jun 28, 2018

Adjust initialization in maximum and minimum

31c67c2

Fixes #27836

andreasnoack added a commit that referenced this issue Jun 28, 2018

Adjust initialization in maximum and minimum

cae4be0

Fixes #27836

giordano mentioned this issue Jul 8, 2018

minimum with dims keyword is type-unstable on master #27992

Closed

andreasnoack added a commit that referenced this issue Jul 9, 2018

Adjust initialization in maximum and minimum

3bab777

Fixes #27836

andreasnoack closed this as completed in #27845 Jul 13, 2018

KristofferC added the potential benchmark Could make a good benchmark in BaseBenchmarks label Jul 13, 2018

andreasnoack added a commit that referenced this issue Jul 13, 2018

Adjust initialization in maximum and minimum

0f0a4f9

Fixes #27836

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Performance degradation when reducing SparseMatrixCSC #27836

Performance degradation when reducing SparseMatrixCSC #27836

bicycle1885 commented Jun 28, 2018 •

edited

Loading

ViralBShah commented Jun 28, 2018

bicycle1885 commented Jun 28, 2018 •

edited

Loading

bicycle1885 commented Jun 28, 2018

bicycle1885 commented Jun 28, 2018

ViralBShah commented Jun 28, 2018

andreasnoack commented Jun 28, 2018

bicycle1885 commented Jun 28, 2018

ViralBShah commented Jun 28, 2018

Keno commented Jul 1, 2018

Performance degradation when reducing SparseMatrixCSC #27836

Performance degradation when reducing SparseMatrixCSC #27836

Comments

bicycle1885 commented Jun 28, 2018 • edited Loading

ViralBShah commented Jun 28, 2018

bicycle1885 commented Jun 28, 2018 • edited Loading

bicycle1885 commented Jun 28, 2018

bicycle1885 commented Jun 28, 2018

ViralBShah commented Jun 28, 2018

andreasnoack commented Jun 28, 2018

bicycle1885 commented Jun 28, 2018

ViralBShah commented Jun 28, 2018

Keno commented Jul 1, 2018

bicycle1885 commented Jun 28, 2018 •

edited

Loading

bicycle1885 commented Jun 28, 2018 •

edited

Loading