Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance degradation when reducing SparseMatrixCSC #27836

Closed
bicycle1885 opened this issue Jun 28, 2018 · 9 comments
Closed

Performance degradation when reducing SparseMatrixCSC #27836

bicycle1885 opened this issue Jun 28, 2018 · 9 comments
Labels
performance Must go faster potential benchmark Could make a good benchmark in BaseBenchmarks regression Regression in behavior compared to a previous version sparse Sparse arrays

Comments

@bicycle1885
Copy link
Member

bicycle1885 commented Jun 28, 2018

I've found an extreme performance degradation (~30x slower) on Julia 0.7.0-beta vs. on Julia 0.6.2 when reducing a large SparseMatrixCSC matrix using maximum. I couldn't find differences in the data, so I think this is a problem in SparseArrays.

Julia 0.6.2

julia> size(counts)
(27998, 1306127)

julia> typeof(counts)
SparseMatrixCSC{Int32,Int64}

julia> @time maximum(counts, 2);
  9.171888 seconds (103.28 k allocations: 5.739 MiB, 0.06% gc time)

julia> @time sum(counts .> 0)
 10.630171 seconds (42.21 k allocations: 22.013 GiB, 0.08% gc time)
2624828308

julia> versioninfo()
Julia Version 0.6.2
Commit d386e40c17 (2017-12-13 18:08 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Xeon(R) CPU E5-2637 v4 @ 3.50GHz
  WORD_SIZE: 64
  BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Haswell)
  LAPACK: libopenblas64_
  LIBM: libopenlibm
  LLVM: libLLVM-3.9.1 (ORCJIT, broadwell)

Julia 0.7.0-beta:

julia> size(counts)
(27998, 1306127)

julia> typeof(counts)
SparseMatrixCSC{Int32,Int64}

julia> @time maximum(counts, dims=2);
271.014620 seconds (5.28 M allocations: 34.628 GiB, 0.09% gc time)

julia> @time sum(counts .> 0)
 10.284461 seconds (348.48 k allocations: 22.028 GiB, 0.69% gc time)
2624828308

julia> versioninfo()
Julia Version 0.7.0-beta.0
Commit f41b1ecaec (2018-06-24 01:32 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Xeon(R) CPU E5-2637 v4 @ 3.50GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-6.0.0 (ORCJIT, broadwell)
@ViralBShah
Copy link
Member

Wow that is a lot of allocation for a simple operation. Is there an easy way to create the input sparse matrix?

Probably you want to work with the transpose - but that is a different matter.

@ViralBShah ViralBShah added the sparse Sparse arrays label Jun 28, 2018
@bicycle1885
Copy link
Member Author

bicycle1885 commented Jun 28, 2018

I made this matrix from an HDF5 file downloaded from https://support.10xgenomics.com/single-cell-gene-expression/datasets/1.3.0/1M_neurons (the "aggr - Gene / cell matrix HDF5 (filtered)" file, 3.93 GB). The following script can load the file into a SparseMatrixCSC matrix:

using SparseArrays
using HDF5
function load_10x_format(data)
    featurenames = read(data["genes"])
    cellnames = read(data["barcodes"])
    m, n = read(data["shape"])
    counts = SparseMatrixCSC(
        m, n,
        read(data["indptr"]).+1,
        read(data["indices"]).+1,
        convert(Vector{Int32}, read(data["data"])))
    SparseArrays.sortSparseMatrixCSC!(counts)
    return featurenames, cellnames, counts
end
featurenames, cellnames, counts = HDF5.h5open(file->load_10x_format(file["/mm10"]), "1M_neurons.h5")

It requires ~30 GiB of RAM to load but it may be reproducible with smaller matrices.

@bicycle1885
Copy link
Member Author

From the following profiling, the problem is around these lines: https://github.com/JuliaLang/julia/blob/v0.7.0-beta/stdlib/SparseArrays/src/sparsematrix.jl#L1689-L1692.

julia> Profile.init(delay=0.1)

julia> @profile maximum(counts, dims=2);

julia> Profile.print()
2723 ./task.jl:257; (::getfield(REPL, Symbol("##28#29")){REPL.REPLBackend})()
 2723 ...sr/share/julia/stdlib/v0.7/REPL/src/REPL.jl:116; macro expansion
  2723 ...sr/share/julia/stdlib/v0.7/REPL/src/REPL.jl:85; eval_user_input(::Any, ::REPL.REPLBackend)
   2723 ./boot.jl:319; eval(::Module, ::Any)
    2723 ./<missing>:0; top-level scope
     2723 ...e/julia/stdlib/v0.7/Profile/src/Profile.jl:27; top-level scope
      2723 ./none:0; (::getfield(Base, Symbol("#kw##maximum")))(::NamedTuple{(:dims,)...
       2723 ./reducedim.jl:628; #maximum#544
        2723 ./reducedim.jl:654; _maximum
         2723 ./reducedim.jl:655; _maximum
          2723 ./none:0; #mapreduce
           2723 ./reducedim.jl:276; #mapreduce#537
            2723 ./reducedim.jl:285; _mapreduce_dim(::Function, ::Function, ::SparseMatrixCSC{Int...
             2595 ./reducedim.jl:244; mapreducedim!(::Function, ::Function, ::SparseMatrixCSC{Int...
              2595 ...v0.7/SparseArrays/src/sparsematrix.jl:1713; _mapreducedim!(::typeof(identity), ::typeof(max), ::Sparse...
               1    ./range.jl:0; _mapreducecols!(::typeof(identity), ::typeof(max), ::Spar...
               18   ./sort.jl:0; _mapreducecols!(::typeof(identity), ::typeof(max), ::Spar...
               369  ./sort.jl:179; _mapreducecols!(::typeof(identity), ::typeof(max), ::Spar...
               2161 ...v0.7/SparseArrays/src/sparsematrix.jl:1689; _mapreducecols!(::typeof(identity), ::typeof(max), ::Spar...
                2161 ./simdloop.jl:73; macro expansion
                 2    ./array.jl:707; getindex
                 15   ./array.jl:707; macro expansion
                 2144 ...v0.7/SparseArrays/src/sparsematrix.jl:84; macro expansion
                  38  ./array.jl:707; getindex
                  130 ./array.jl:745; setindex!
                  71  ./int.jl:53; getindex
                  17  ./int.jl:0; setindex!(::SparseMatrixCSC{Int32,Int64}, ::Int32, ::Int...
                  11  ./promotion.jl:436; max
                   3 ./int.jl:49; <
                  38  ./sort.jl:0; setindex!(::SparseMatrixCSC{Int32,Int64}, ::Int32, ::Int...
                  65  ./sort.jl:179; setindex!(::SparseMatrixCSC{Int32,Int64}, ::Int32, ::Int...
                  101 ./sysimg.jl:18; getindex
                  15  ...0.7/SparseArrays/src/sparsematrix.jl:1910; getindex
                   15 ./array.jl:707; getindex
                  5   ...0.7/SparseArrays/src/sparsematrix.jl:1912; getindex
                  461 ...0.7/SparseArrays/src/sparsematrix.jl:1913; getindex
                   14  ./int.jl:0; searchsortedfirst
                   153 ./sort.jl:179; searchsortedfirst
                    153 ./int.jl:52; -
                   45  ./sort.jl:180; searchsortedfirst
                    45 ./int.jl:452; >>>
                     45 ./int.jl:444; >>>
                   240 ./sort.jl:181; searchsortedfirst
                    240 ./ordering.jl:49; lt
                     240 ./operators.jl:338; isless
                      240 ./int.jl:49; <
                   9   ./sysimg.jl:18; getproperty
                  142 ...0.7/SparseArrays/src/sparsematrix.jl:1914; getindex
                   78 ./array.jl:707; getindex
                   25 ./sysimg.jl:18; getproperty
                  28  ...0.7/SparseArrays/src/sparsematrix.jl:2330; setindex!(::SparseMatrixCSC{Int32,Int64}, ::Int32, ::Int...
                  32  ...0.7/SparseArrays/src/sparsematrix.jl:2333; setindex!(::SparseMatrixCSC{Int32,Int64}, ::Int32, ::Int...
                   31 ./int.jl:428; <=
                  16  ...0.7/SparseArrays/src/sparsematrix.jl:2336; setindex!(::SparseMatrixCSC{Int32,Int64}, ::Int32, ::Int...
                   16 ./array.jl:707; getindex
                  839 ...0.7/SparseArrays/src/sparsematrix.jl:2338; setindex!(::SparseMatrixCSC{Int32,Int64}, ::Int32, ::Int...
                   204 ./int.jl:49; searchsortedfirst
                   14  ./sort.jl:0; searchsortedfirst
                   406 ./sort.jl:179; searchsortedfirst
                    125 ./int.jl:52; -
                   215 ./sort.jl:180; searchsortedfirst
                    65  ./int.jl:53; +
                    39  ./int.jl:452; >>>
                     39 ./int.jl:444; >>>
                    111 ./sysimg.jl:18; +
                  88  ...0.7/SparseArrays/src/sparsematrix.jl:2339; setindex!(::SparseMatrixCSC{Int32,Int64}, ::Int32, ::Int...
                   38 ./array.jl:707; getindex
                   18 ./sysimg.jl:18; getproperty
                  23  ...0.7/SparseArrays/src/sparsematrix.jl:2341; setindex!(::SparseMatrixCSC{Int32,Int64}, ::Int32, ::Int...
                   23 ./array.jl:745; setindex!
                  13  ...0.7/SparseArrays/src/sparsematrix.jl:2342; setindex!(::SparseMatrixCSC{Int32,Int64}, ::Int32, ::Int...
               7    ...v0.7/SparseArrays/src/sparsematrix.jl:1909; _mapreducecols!(::typeof(identity), ::typeof(max), ::Spar...
               32   ...v0.7/SparseArrays/src/sparsematrix.jl:1914; _mapreducecols!(::typeof(identity), ::typeof(max), ::Spar...
               7    ...v0.7/SparseArrays/src/sparsematrix.jl:2330; setindex!(::SparseMatrixCSC{Int32,Int64}, ::Int32, ::Int6...
             128  ./reducedim.jl:127; reducedim_init(::Function, ::typeof(max), ::SparseMatrixCSC...
              128 ./reducedim.jl:94; reducedim_initarray0(::SparseMatrixCSC{Int32,Int64}, ::Int64...
               128 ./reducedim.jl:203; mapfirst!(::Function, ::SparseMatrixCSC{Int32,Int64}, ::Sp...
                128 ./abstractarray.jl:1958; map!(::typeof(identity), ::SparseMatrixCSC{Int32,Int64}, ...
                 128 ./abstractarray.jl:997; setindex!
                  128 ./abstractarray.jl:1027; _setindex!
                   84 ...0.7/SparseArrays/src/sparsematrix.jl:2347; setindex!(::SparseMatrixCSC{Int32,Int64}, ::Int32, ::Int...
                    84 ./array.jl:1127; insert!
                     84 ./array.jl:791; _growat!
                   44 ...0.7/SparseArrays/src/sparsematrix.jl:2348; setindex!(::SparseMatrixCSC{Int32,Int64}, ::Int32, ::Int...
                    44 ./array.jl:1127; insert!
                     44 ./array.jl:791; _growat!

@bicycle1885
Copy link
Member Author

I didn't know but the reduction operation now returns a sparse matrix for a given sparse matrix, which causes the performance degradation in the tight loop of reduction. This change may be reasonable but the performance degradation is not acceptable to me.

julia> typeof(maximum(counts, 2))  # Julia 0.6.2
Array{Int32,2}

julia> typeof(maximum(counts, dims=2))  # Julia 0.7.0-beta
SparseMatrixCSC{Int32,Int64}

@ViralBShah
Copy link
Member

I agree the performance is not acceptable.

@fredrikekre fredrikekre added performance Must go faster regression Regression in behavior compared to a previous version labels Jun 28, 2018
@andreasnoack
Copy link
Member

This is an unintended consequence of #23227.

@bicycle1885
Copy link
Member Author

Thank you, @andreasnoack! Let me know if I need to benchmark your fix.

@ViralBShah
Copy link
Member

It appears that even in the "before" version, things are too slow - based on the amount of allocation. This operation should literally be possible with a single scan of the data structure.

@Keno
Copy link
Member

Keno commented Jul 1, 2018

Ideally add the benchmark to BaseBenchmarks, so it's tracked.

andreasnoack added a commit that referenced this issue Jul 13, 2018
* Adjust initialization in maximum and minimum

Fixes #27836

* Reduce to first element in index range for OffsetArrays

* Adjust mapreduce to use first element in offset range as well.

Adjust tests

* Avoid defining global Areduc variable in reducedim tests

* Add comment in mapslices to explain the wrapping of scalars
@KristofferC KristofferC added the potential benchmark Could make a good benchmark in BaseBenchmarks label Jul 13, 2018
andreasnoack added a commit that referenced this issue Jul 16, 2018
* Adjust initialization in maximum and minimum

Fixes #27836

* Reduce to first element in index range for OffsetArrays

* Adjust mapreduce to use first element in offset range as well.

Adjust tests

* Avoid defining global Areduc variable in reducedim tests

* Add comment in mapslices to explain the wrapping of scalars

* Change reduced_index for Slices and add test for previously failing
maximum of OffsetArray example

* Add checks that reductions along dimensions are inferred
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance Must go faster potential benchmark Could make a good benchmark in BaseBenchmarks regression Regression in behavior compared to a previous version sparse Sparse arrays
Projects
None yet
Development

No branches or pull requests

6 participants