
The best practice for sparse matrix multiplication #6

Closed
JiaweiZhuang opened this issue Nov 3, 2017 · 5 comments

Comments

@JiaweiZhuang
Owner

JiaweiZhuang commented Nov 3, 2017

The information below is terribly outdated. See the latest comparisons at: https://github.com/JiaweiZhuang/sparse_dot


Since #2 is largely solved, I've done some preliminary tests on sparse matrix multiplication (SMM) using the sparse package. I just realized that the basic scipy.sparse.coo_matrix.dot() is enough: I can flatten the input array to shape [Ntime*Nlev, Nlat*Nlon] so it can be passed to coo_matrix.dot().
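A rough sketch of that flattening approach (tiny stand-in shapes and a random weight matrix for illustration; the real weights would come from the regridder, and the real shapes are the (600, 400, 10, 50) test case):

```python
import numpy as np
import scipy.sparse as sp

# Stand-in sizes: regrid the two spatial dims, keep the extra dims unchanged.
Nlat_in, Nlon_in, Nlev, Ntime = 6, 4, 2, 3
Nlat_out, Nlon_out = 4, 3

rng = np.random.default_rng(0)
data = rng.random((Nlat_in, Nlon_in, Nlev, Ntime))

# Hypothetical sparse weight matrix of shape (N_out_spatial, N_in_spatial).
weights = sp.random(Nlat_out * Nlon_out, Nlat_in * Nlon_in,
                    density=0.1, format='coo', random_state=0)

# Flatten the spatial dims, apply the weights, then restore the extra dims.
flat_in = data.reshape(Nlat_in * Nlon_in, Nlev * Ntime)
flat_out = weights.dot(flat_in)
result = flat_out.reshape(Nlat_out, Nlon_out, Nlev, Ntime)
print(result.shape)  # (4, 3, 2, 3)
```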

The test input data has the shape (600, 400, 10, 50), which is then regridded to shape (400, 300, 10, 50). The timing results are

  • Calculating weights: ~14 s
    (This only needs to be done once, so I am not trying to optimize it.)
  • Applying weights with ESMPy: 1.46 s
    (Transferring numpy data to ESMPy internals takes an additional 6.6 s, due to converting the C-ordered array to Fortran order.)
  • Applying weights with sparse.tensordot(): 579 ms

It is quite surprising that sparse.tensordot() is 3 times faster than ESMPy's internal Fortran routine.

  • Applying weights (the previous comparison was not fair; I've removed all unnecessary array copying and re-ordering, and here is the new timing):

| Method | Applying weights | Write output data to file |
|---|---|---|
| ESMPy | 0.3 s [1] + 1 s [2] | 2 s [3] |
| coo_matrix.dot | 0.6 s | 2 s |
| sparse.tensordot [4] | 1 s | 4 s |
| loop with numba | 3 s | 2 s |
| loop with pure numpy | 6 s | 2 s |

[1]: Passing input data to the Fortran internals. It also doubles memory use, which is not a problem for the other methods.
[2]: The actual SMM.
[3]: Slow due to the use of Docker; it would be ~0.3 s on a native machine.
[4]: The slow-down compared to scipy is due to memory layout (C-ordering vs. Fortran-ordering). See the notebook below.

The notebooks are available at

It might be useful to have both ESMPy and scipy as different backends. ESMPy results can be used as reference. At least one non-ESMPy method is needed since ESMPy cannot reuse offline weights right now.

@bekozi If you have time (no hurry), could you run my notebooks to see whether it is true that ESMPy's SMM is slower than Python's? I want to make sure the comparison was fair and that I was not using ESMPy incorrectly.


This thread is for discussing the best practice for SMM (i.e. applying weights).

Before developing high-level APIs I want to make sure I am using the most efficient way to perform SMM. Any suggestions will be appreciated. @mrocklin @pelson

@JiaweiZhuang
Owner Author

JiaweiZhuang commented Nov 3, 2017

Because the regridding is so fast even for this relatively large data, I will postpone #3 and focus on serial implementation first.

@mrocklin

It is quite surprising that sparse.tensordot() is 3 times faster than ESMPy's internal Fortran routine.

sparse.tensordot just calls scipy.sparse.csr_matrix.dot, which is written in C. It is not surprising to see it perform decently. There is some cost in converting from NDArray COO format to 2d CSR, but hopefully this is relatively small. It is also cached if used repeatedly.
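A small sketch of that conversion cost in plain scipy (illustrative sizes, hypothetical random weights): converting the COO weights to CSR once up front gives the same result as calling dot on the COO matrix directly, but pays the conversion cost only once.

```python
import numpy as np
import scipy.sparse as sp

# COO is convenient for building weights, but products are faster in CSR.
coo = sp.random(1000, 800, density=0.01, format='coo', random_state=0)
csr = coo.tocsr()                      # one-time COO -> CSR conversion

x = np.random.default_rng(0).random((800, 16))
y_coo = coo.dot(x)                     # scipy converts internally per call
y_csr = csr.dot(x)                     # conversion already paid
print(np.allclose(y_coo, y_csr))       # True: identical results
```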

@mrocklin

Is it necessary for ESMPy to write outputs to file or can it pass results back as numpy arrays?

@JiaweiZhuang
Owner Author

Is it necessary for ESMPy to write outputs to file or can it pass results back as numpy arrays?

Currently it has to write outputs to file. I don't know the ESMPy internals, so I need to ask @bekozi.

@JiaweiZhuang
Owner Author

JiaweiZhuang commented Nov 30, 2017

According to the timing tests, scipy.sparse.coo_matrix.dot should be the most efficient method for applying weights.
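A hedged timing sketch for reproducing that kind of comparison with timeit (illustrative sizes and random weights, not the actual (600, 400, 10, 50) benchmark from the notebooks):

```python
import timeit
import numpy as np
import scipy.sparse as sp

# Hypothetical weight matrix and input block, sized for a quick local test.
W = sp.random(2000, 3000, density=0.005, format='coo', random_state=0).tocsr()
x = np.random.default_rng(0).random((3000, 32))

# Time repeated weight application (the SMM step only, no I/O).
t = timeit.timeit(lambda: W.dot(x), number=100)
print(f"100 SMM calls: {t:.3f} s")
```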

nawendt pushed a commit to nawendt/esmpy-tutorial that referenced this issue Dec 7, 2017
This commit addresses a couple of issues. First, cartopy has been used to plot the data on maps in lieu of basemap. This should be more relevant to people going forward. I will keep around a branch that contains the basemap version of the notebook, but no updates will occur on that branch barring any unforeseen errors.

Secondly, per #4, it was pointed out that passing Fortran memory order
arrays instead of C memory order arrays is faster, see the issue from
JiaweiZhuang/xESMF#6 for further details. Care was taken to make sure
code logic is still intuitive.

Closes #3
Closes #4