SparseArrays: mul!(W, X', V) much slower than mul!(V, X, W) for Float32 entries #822

Closed
severinson opened this issue Mar 18, 2021 · 2 comments
Labels: performance (Must go faster), sparse (Sparse arrays)

severinson commented Mar 18, 2021

I'm seeing surprisingly low performance for mul!(W, X', V) when X is a SparseMatrixCSC with Float64 entries (X=sprand(2504, 100000, 0.05)) and W and V are dense matrices with Float32 entries. This operation takes about an order of magnitude longer than the same operation when W and V are dense matrices with Float64 entries. However, if X has Bool entries (X=sprand(Bool, 2504, 100000, 0.05)) I don't see any performance difference between Float64 and Float32 entries for V and W.

julia> using SparseArrays, LinearAlgebra

# Float64 X, V and W
julia> X = sprand(2504, 100000, 0.05); V = randn(2504, 3); W = zeros(100000, 3);

## transposed X
julia> @time mul!(W, X', V);
  0.082900 seconds (80.11 k allocations: 4.399 MiB)
julia> @time mul!(W, X', V);
  0.033305 seconds (1 allocation: 48 bytes)

## non-transposed X
julia> @time mul!(V, X, W);
  0.122455 seconds (46.03 k allocations: 2.414 MiB)
julia> @time mul!(V, X, W);
  0.088595 seconds

# Float64 X, Float32 V and W
julia> X = sprand(2504, 100000, 0.05); V = randn(Float32, 2504, 3); W = zeros(Float32, 100000, 3);

## transposed X
julia> @time mul!(W, X', V);
  0.369262 seconds (77.55 k allocations: 4.190 MiB)
julia> @time mul!(W, X', V);
  0.324316 seconds (1 allocation: 48 bytes) # about 10x slower than the same operation with Float64 entries

## non-transposed X
julia> @time mul!(V, X, W);
  0.123769 seconds (46.30 k allocations: 2.425 MiB)
julia> @time mul!(V, X, W);
  0.087341 seconds # about the same performance as the same operation with Float64 entries

julia> versioninfo()
Julia Version 1.5.4
Commit 69fcb5745b (2021-03-11 19:13 UTC)
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: AMD Ryzen 7 1700X Eight-Core Processor
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-9.0.1 (ORCJIT, znver1)
julia> LinearAlgebra.versioninfo()
BLAS: libopenblas (OpenBLAS 0.3.9  USE64BITINT DYNAMIC_ARCH NO_AFFINITY Zen MAX_THREADS=32)
LAPACK: libopenblas64_
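
A possible workaround sketch for the Float32 case above, assuming (as the timings suggest) that the uniform-Float64 path stays fast: promote the small dense factor to Float64, accumulate into a Float64 scratch matrix, and convert the result back. V64 and Wbuf below are illustrative names, not from the original report.

# Sketch: run the transposed product entirely in Float64, then convert back.
V64  = Float64.(V)                        # 2504×3, cheap to promote
Wbuf = Matrix{Float64}(undef, 100000, 3)  # Float64 scratch for the result
mul!(Wbuf, X', V64)                       # uniform-Float64 path (~0.03 s above)
W .= Wbuf                                 # element-wise conversion back into the Float32 output

The only extra cost is the two small dense buffers; the sparse matrix itself is untouched.
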
dkarrasch added the performance (Must go faster) and sparse (Sparse arrays) labels on Mar 23, 2021
dkarrasch (Member) commented

I have benchmarked the hell out of different approaches to this in JuliaLang/julia#38876 (just as others before me; see the references there), but couldn't find a better way to do it. The lesson so far is that, if you can afford it memory-wise, you should check whether materializing the adjoint/transpose before multiplication is beneficial. That may depend strongly on your specific use case (sparsity, size, etc.). Materializing the adjoint is not necessarily beneficial, though:

julia> using BenchmarkTools  # for @btime

julia> X = sprand(2504, 100000, 0.05); V = randn(Float32, 2504, 3); W = zeros(Float32, 100000, 3);

julia> @btime mul!($W, copy(($X)'), $V, true, false);
  365.290 ms (9 allocations: 191.07 MiB)

julia> @btime mul!($W, $(X'), $V, true, false);
  266.900 ms (0 allocations: 0 bytes)

Apparently, most of the time is spent on conversion:

julia> X = sprand(Float32, 2504, 100000, 0.05); V = randn(Float32, 2504, 3); W = zeros(Float32, 100000, 3);

julia> @btime mul!($W, copy(($X)'), $V, true, false);
  372.054 ms (11 allocations: 143.27 MiB)

julia> @btime mul!($W, $(X'), $V, true, false);
  44.057 ms (0 allocations: 0 bytes)

Not sure how much error it would introduce to first downscale X to Float32.
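
A minimal sketch of that idea, applied to the original Float64 X from the issue (X32 is an illustrative name):

X = sprand(2504, 100000, 0.05)  # original Float64 sparse matrix
X32 = Float32.(X)               # broadcasting keeps the sparsity structure: a SparseMatrixCSC{Float32} with the same pattern
mul!(W, X32', V)                # all eltypes match, so this hits the fast uniform-Float32 path

The conversion itself only rounds each stored value of X to the nearest Float32; beyond that, the products are then accumulated in Float32 arithmetic, which is where most of any additional error would come from.
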

vtjnash closed this as completed on Apr 13, 2021
vtjnash (Member) commented Apr 13, 2021

Closing, as "couldn't find a better way to do it" seems to be about optimal.
