-
I have not done a good job researching this question. Why is Anserini so much faster than sparse matrix multiplication? BT-SPLADE with Anserini has an MS MARCO query latency of ~10 ms. The retrieval task can be formulated as a sparse matrix-vector product of size (9 million, 30 thousand) x (30 thousand, 1). The snippet below extrapolates to a latency orders of magnitude longer. How is Anserini so fast?

```python
import scipy.sparse as sparse
import time

actual_n = int(9e6)   # ~9 million documents in the full corpus
sample_n = int(1e4)   # benchmark on a 10k-document sample
dimension = 30000     # vocabulary size
sparsity = 0.005
A = sparse.random(sample_n, dimension, sparsity, format='csr')  # document matrix
b = sparse.random(dimension, 1, sparsity, format='csr')         # query vector
start = time.time()
A.dot(b)
# Extrapolate the sample timing to the full corpus, in milliseconds.
print((time.time() - start) * 1000 * actual_n / sample_n)
```
-
Basically, Anserini uses Lucene, which implements a dynamic-pruning technique called Block-Max WAND: https://www.elastic.co/blog/faster-retrieval-of-top-hits-in-elasticsearch-with-block-max-wand. There is quite a bit of research on how to search inverted indexes more efficiently. Note that the ~10 ms numbers for efficient SPLADE are with PISA, not Anserini; the Anserini numbers are in Figure 7 of the appendix and are around 40 ms.
-
Also a note on the sparsity there: 0.005 is way too high for queries. With Efficient-SPLADE small it was around 6 unique tokens per query from what I remember, which gives a sparsity closer to 6/30000 ≈ 0.0002.
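To see how much the query density alone matters, the original benchmark can be rerun with a ~6-nonzero query (density 6/30000 ≈ 0.0002). This is only illustrative: `scipy.sparse.random` draws uniformly random nonzeros, which is not how real SPLADE weights are distributed, and the corpus size here is an assumption.

```python
import time
import scipy.sparse as sparse

n_docs = int(9e6)    # assumed full-corpus size (~9 million documents)
sample_n = int(1e4)  # benchmark on a 10k-document sample
dimension = 30000

A = sparse.random(sample_n, dimension, 0.005, format='csr')          # documents
q_dense = sparse.random(dimension, 1, 0.005, format='csr')           # ~150 nonzeros
q_sparse = sparse.random(dimension, 1, 6 / dimension, format='csr')  # ~6 nonzeros

for name, q in [("density 0.005", q_dense), ("~6-term query", q_sparse)]:
    start = time.time()
    A.dot(q)
    ms = (time.time() - start) * 1000 * n_docs / sample_n
    print(f"{name}: ~{ms:.0f} ms extrapolated to {n_docs} docs")
```

Even with the realistic query density, the matrix product still touches every document's row, so dynamic pruning remains the larger factor.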