-
I have not done a good job researching this question. Why is Anserini so much faster than sparse matrix multiplication? BT-SPLADE with Anserini has an MS MARCO query latency of ~10 ms. The retrieval task can be formulated as a sparse matrix-vector product of size (9 million, 30 thousand) x (30 thousand, 1). The snippet below extrapolates to a latency orders of magnitude longer. How is Anserini so fast?

```python
import scipy.sparse as sparse
import time

actual_n = int(9e6)   # ~9 million documents in the full corpus
sample_n = int(1e4)   # benchmark on a 10k-document sample
dimension = 30000     # vocabulary size
sparsity = 0.005
A = sparse.random(sample_n, dimension, sparsity, format='csr')  # document matrix
b = sparse.random(dimension, 1, sparsity, format='csr')         # query vector
start = time.time()
A.dot(b)
# Extrapolate the sample timing to the full corpus, in milliseconds.
print((time.time() - start) * 1000 * actual_n / sample_n)
```
-
Basically, Anserini uses Lucene, which implements a dynamic-pruning technique called Block-Max WAND: https://www.elastic.co/blog/faster-retrieval-of-top-hits-in-elasticsearch-with-block-max-wand. There is quite a bit of research on how to search inverted indexes more efficiently. Note that the ~10 ms numbers for efficient SPLADE are with PISA, not Anserini; the Anserini numbers are in Figure 7 of the appendix and are around 40 ms.
-
Also a note on the sparsity there: 0.005 is way too high for queries. With Efficient-SPLADE small it was around 6 unique tokens per query from what I remember, which gives a sparsity closer to 6/30000 ≈ 0.0002.
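To see how much the query density alone matters, the original benchmark can be rerun with a ~6-nonzero query (density 6/30000 ≈ 0.0002). This is only illustrative: `scipy.sparse.random` draws uniformly random nonzeros, which is not how real SPLADE weights are distributed, and the corpus size here is an assumption.

```python
import time
import scipy.sparse as sparse

n_docs = int(9e6)    # assumed full-corpus size (~9 million documents)
sample_n = int(1e4)  # benchmark on a 10k-document sample
dimension = 30000

A = sparse.random(sample_n, dimension, 0.005, format='csr')          # documents
q_dense = sparse.random(dimension, 1, 0.005, format='csr')           # ~150 nonzeros
q_sparse = sparse.random(dimension, 1, 6 / dimension, format='csr')  # ~6 nonzeros

for name, q in [("density 0.005", q_dense), ("~6-term query", q_sparse)]:
    start = time.time()
    A.dot(q)
    ms = (time.time() - start) * 1000 * n_docs / sample_n
    print(f"{name}: ~{ms:.0f} ms extrapolated to {n_docs} docs")
```

Even with the realistic query density, the matrix product still touches every document's row, so dynamic pruning remains the larger factor.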