PDX is a vertical data layout for vectors that stores the dimensions of different vectors together. In a nutshell, PDX is PAX for vector similarity search.
- ✂️ Efficient pruning of dimensions with partial distance calculations.
- ⚡ Up to 7x faster IVF queries than FAISS+AVX512.
- ⚡ Up to 13x faster exhaustive search thanks to pruning.
- ⚡ Distance kernels in PDX are up to 1.6x faster than the `float32` kernels in USearch and FAISS.
  - Why? Distance kernels in PDX are free of dependencies and have fewer LOAD/STORE operations.
- Distance kernels can auto-vectorize efficiently without explicit SIMD for `float32`.
- Distance kernels on small vectors (`d < 16`) are up to 8x faster than SIMD kernels in SimSIMD.
- More efficient compressed representations of vectors (WIP).
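The layout difference can be sketched with a toy transposition (illustrative only; the actual PDX implementation blocks vectors per IVF partition and is written in C++):

```python
import numpy as np

# Toy collection: 4 vectors of dimensionality 3 (values are made up).
vectors = np.array([[1, 2, 3],
                    [4, 5, 6],
                    [7, 8, 9],
                    [10, 11, 12]], dtype=np.float32)

# Horizontal (row-major) layout: all dimensions of one vector are contiguous.
horizontal = vectors.ravel()   # [1 2 3 | 4 5 6 | 7 8 9 | 10 11 12]

# PDX-style vertical layout: the same dimension of *different* vectors
# is stored contiguously, like PAX does for the columns of a table.
vertical = vectors.T.ravel()   # [1 4 7 10 | 2 5 8 11 | 3 6 9 12]
```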
Pruning means avoiding checking all the dimensions of a vector to determine whether it will make it into the KNN of a query. Pruning speeds up vector similarity search because (i) less data must be fetched and (ii) fewer computations must be done.
However, pruning methods that do partial distance calculations have a hard time being on par with SIMD-optimized kernels like the ones in FAISS and SimSIMD.
Thanks to the PDX layout, pruning methods outperform SIMD-optimized kernels on all CPU microarchitectures (Zen, Intel, Graviton). This is because, in PDX, distance calculations are efficient even on small vector segments and on vectors of low dimensionality. Also, the evaluation of the pruning predicate is not interleaved with distance calculations.
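The pruning idea can be illustrated with a small sketch: accumulate the squared L2 distance a few dimensions at a time, and discard a vector as soon as its partial distance already exceeds the current k-th best distance. This is illustrative only; the real PDX kernels scan one dimension across a whole block of vectors at a time, and the function name here is hypothetical:

```python
import numpy as np

def prune_scan(query, vectors, threshold, step=4):
    """Return indices of vectors whose full distance can still beat
    `threshold`, computing distances only partially where possible."""
    n, d = vectors.shape
    partial = np.zeros(n, dtype=np.float32)
    alive = np.arange(n)
    for start in range(0, d, step):
        diff = vectors[alive, start:start + step] - query[start:start + step]
        partial[alive] += np.sum(diff * diff, axis=1)
        # A partial squared distance only grows, so once it exceeds the
        # threshold the vector can never enter the top-k: prune it.
        alive = alive[partial[alive] <= threshold]
        if alive.size == 0:
            break
    return alive
```

In a horizontal layout this filtering interleaves branches with the distance computation; in PDX, distance accumulation and the pruning test each run over whole blocks, which keeps the kernel SIMD-friendly.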
Benchmarks of PDX vs FAISS are shown below. The complete benchmarks are available in our publication.
Pruning algorithms are especially effective when:
- Vectors are of high dimensionality (`d > 512`)
- High recalls are needed (`> 0.90`)
- Exact results are needed
- `k` is relatively low (`k = 10, 20`)
The latest pruning algorithms based on partial distance calculations are ADSampling, DDC (previously named BSA), and DADE. All of these rely on rotating the vector collection to prune effectively. PDX+ADSampling works great in most scenarios.
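The rotation step matters because an orthogonal transform preserves L2 distances while spreading the distance mass evenly across dimensions, so a partial distance over the first few rotated dimensions becomes a reliable estimator of the full distance. A minimal sketch of that property (not the actual ADSampling implementation, which additionally uses a statistical test to decide when to stop scanning):

```python
import numpy as np

rng = np.random.default_rng(42)
d = 64

# Random orthogonal matrix via QR decomposition of a Gaussian matrix.
Q, _ = np.linalg.qr(rng.standard_normal((d, d)))

x = rng.standard_normal(d)
q = rng.standard_normal(d)

# Rotation preserves the squared L2 distance exactly ...
full = np.sum((x - q) ** 2)
rotated = np.sum((Q @ x - Q @ q) ** 2)

# ... so a scaled partial distance over the first m rotated dimensions
# estimates the full distance and supports early termination.
m = 16
estimate = np.sum(((Q @ x)[:m] - (Q @ q)[:m]) ** 2) * d / m
```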
Alongside PDX, we also introduce PDX-BOND, a simpler and exact pruning algorithm that does not need to rotate the vectors (based on BOND).
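An exact pruning criterion needs no rotation if a bound can be derived from the data itself. As an illustration of the general idea (a simplified sketch, not necessarily the exact bound BOND or PDX-BOND uses): with per-dimension minima and maxima of the collection, the contribution of the dimensions not yet scanned can be bounded below for every vector at once, and a vector whose partial distance plus this bound exceeds the k-th best distance can be discarded without any recall loss.

```python
import numpy as np

def remaining_lower_bound(query_rest, dim_min, dim_max):
    """Lower bound on the squared L2 contribution of the dimensions not
    yet scanned, valid for EVERY vector in the collection: in dimension
    j, any stored value lies in [dim_min[j], dim_max[j]], so its gap to
    the query value is at least the query's gap to that range."""
    below = np.clip(dim_min - query_rest, 0.0, None)  # query below the range
    above = np.clip(query_rest - dim_max, 0.0, None)  # query above the range
    return float(np.sum(np.maximum(below, above) ** 2))
```

A vector with partial distance `p` over the scanned dimensions can then be pruned exactly whenever `p + remaining_lower_bound(...)` already exceeds the current k-th best distance.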
You can quickly try PDX with your data using our Python bindings and examples. We have implemented PDX on flat IVF indexes and in exhaustive search settings.
- Python 3.11 or higher
- FAISS with Python Bindings
- Clang++17 or higher
- CMake 3.26 or higher
- Clone the repository and init submodules (`Eigen` for efficient matrix operations and `pybind11`)
git clone https://github.com/cwida/PDX
git submodule init
git submodule update
- Install Python dependencies and the `pdxearch` Python bindings.
export CXX="/usr/bin/clang++-18" # Set proper CXX first
pip install -r requirements.txt
python setup.py clean --all
python -m pip install .
- Run the examples under `/examples`
# Creates an IVF index with FAISS on random data
# Then, it compares the search performance of PDXearch and FAISS
python ./examples/pdxearch_simple.py
For more details on the available examples and how to use your own data, refer to /examples/README.md.
Note
We heavily rely on FAISS to create the underlying IVF indexes.
We present some single-threaded benchmarks from our examples against FAISS for some use cases in which PDX shines. We use `r7iz.2xlarge` (Intel SPR) and `r8g.2xlarge` (Graviton 4) AWS instances.
PDX paired with the pruning algorithm ADSampling on IVF indexes works great in most scenarios with less than 0.001 recall loss. The higher the dimensionality, the higher the gains from pruning. The following benchmarks are from /examples/pdxearch_ivf.py.
Avg. query time [Intel SPR · r7iz.2x] | FAISS AVX512 (R@10: 0.99 · 0.95 · 0.90) | PDXearch | Improvement |
---|---|---|---|
DBPedia · d=1536 · 1M | 53.6 · 18.0 · 7.7 ms | 7.4 · 3.7 · 2.4 ms | 7.2x · 4.9x · 3.2x |
arXiv · d=768 · 2.25M | 25.9 · 7.7 · 3.5 | 6.2 · 2.8 · 1.7 | 4.2x · 2.8x · 2.1x |
SIFT · d=128 · 1M | 1.7 · 0.6 · 0.4 | 1.0 · 0.4 · 0.3 | 1.7x · 1.5x · 1.3x |
Avg. query time [Graviton 4 · r8g.2x] | FAISS SVE (R@10: 0.99 · 0.95 · 0.90) | PDXearch | Improvement |
---|---|---|---|
DBPedia · d=1536 · 1M | 47.1 · 18.4 · 6.7 ms | 7.1 · 4.1 · 2.5 ms | 6.6x · 4.5x · 2.7x |
arXiv · d=768 · 2.25M | 25.3 · 7.0 · 3.2 | 5.9 · 2.7 · 1.7 | 4.3x · 2.6x · 1.9x |
SIFT · d=128 · 1M | 1.1 · 0.5 · 0.3 | 0.7 · 0.4 · 0.2 | 1.6x · 1.3x · 1.3x |
Note
On these benchmarks: (i) Both FAISS and PDXearch are scanning exactly the same vectors. (ii) The recall loss of ADSampling is always less than 0.001.
In PDX, building an IVF index can significantly improve exact search speed (thanks to the reliable pruning), even if the clustering is not super optimized. The following benchmarks are from /examples/pdxearch_ivf_exhaustive.py.
Avg. query time [Intel SPR · r7iz.2x] | FAISS AVX512 | PDXearch | Improvement |
---|---|---|---|
DBPedia · d=1536 · 1M | 411.7 ms | 34.9 ms | 11.8x |
GIST · d=960 · 1M | 252.9 | 27.6 | 9.1x |
arXiv · d=768 · 2.25M | 454.5 | 41.4 | 11.0x |
SIFT · d=128 · 1M | 34.4 | 6.5 | 5.3x |
Avg. query time [Graviton 4 · r8g.2x] | FAISS SVE | PDXearch | Improvement |
---|---|---|---|
DBPedia · d=1536 · 1M | 277.2 ms | 21.7 ms | 12.8x |
GIST · d=960 · 1M | 170.9 | 19.4 | 8.8x |
arXiv · d=768 · 2.25M | 306.0 | 27.2 | 11.3x |
SIFT · d=128 · 1M | 21.3 | 4.7 | 4.5x |
Use PDX-BOND, our pruning algorithm. Here, vectors are not transformed, and we do not use any additional index. Gains vary depending on the distribution of values across dimensions. The following benchmarks are from /examples/pdxearch_exact_bond.py.
Avg. query time [Intel SPR · r7iz.2x] | FAISS AVX512 | PDXearch | Improvement |
---|---|---|---|
DBPedia · d=1536 · 1M | 374 ms | 216 ms | 1.7x |
arXiv · d=768 · 2.25M | 422 | 212 | 2.0x |
MSong · d=420 · 1M | 117 | 15 | 7.8x |
SIFT · d=128 · 1M | 31 | 10 | 3.1x |
Avg. query time [Graviton 4 · r8g.2x] | FAISS SVE | PDXearch | Improvement |
---|---|---|---|
DBPedia · d=1536 · 1M | 278 ms | 139 ms | 2.0x |
arXiv · d=768 · 2.25M | 305 | 155 | 2.0x |
MSong · d=420 · 1M | 70 | 15 | 4.7x |
SIFT · d=128 · 1M | 22 | 11 | 2.0x |
PDX distance kernels are also faster than the state-of-the-art SIMD kernels in all major architectures, relying only on auto-vectorization (for `float32`). For a detailed explanation of how these distance kernels work, check Figure 3 of our publication. The following benchmarks are from /examples/pdx_brute.py.
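The access pattern behind these kernels can be sketched in numpy (the real kernels are C++ loops that the compiler auto-vectorizes; the function name here is illustrative): distances for a whole block of vectors are accumulated one dimension at a time, so every update touches n independent accumulators and there is no cross-lane reduction until the very end.

```python
import numpy as np

def vertical_l2(query, block_t):
    """block_t has shape (d, n): dimension-major (PDX-style) storage.
    Each iteration updates n independent running distances - a
    dependency-free pattern that maps directly onto SIMD lanes."""
    d, n = block_t.shape
    acc = np.zeros(n, dtype=np.float32)
    for j in range(d):
        diff = block_t[j] - query[j]
        acc += diff * diff   # n independent accumulators, no reduction
    return acc
```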
Avg. query time [Intel SPR · r7iz.2x] | USearch | FAISS AVX512 | PDXearch | Improvement |
---|---|---|---|---|
GIST · d=960 · n=1M | 260 ms | 247 ms | 208 ms | 1.3x · 1.2x |
STL · d=9216 · n=90K | 223 | 220 | 176 | 1.3x · 1.3x |
Avg. query time [Graviton 4 · r8g.2x] | USearch | FAISS SVE | PDXearch | Improvement |
---|---|---|---|---|
GIST · d=960 · n=1M | 203 ms | 172 ms | 124 ms | 1.6x · 1.4x |
STL · d=9216 · n=90K | 175 | 160 | 109 | 1.6x · 1.5x |
Note
Refer to our publication for the complete benchmarks of PDXearch done in 4 microarchitectures: Intel SPR, Zen 4, Zen 3, and Graviton 4. There, you will also find a comparison against ADSampling in the horizontal layout (`.fvecs`). Furthermore, you will find details on how we adapted pruning algorithms to work in PDX.
- Compression: The vertical layout opens opportunities for compression as indexing algorithms group together vectors that share some numerical similarity within their dimensions. The next step on PDX is to apply our scalar quantization algorithm LEP, which uses database compression techniques (ALP) to compress vectors with higher compression ratios and less information loss.
- More data types: For compressed vectors, we need to implement vertical distance kernels on vectors of variable bit size.
- PDX in HNSW: For this, we need a layout similar to the ones proposed in Starling or AiSAQ, in which neighborhoods of the graph are stored and fetched in blocks.
- Improve code readability and usability.
- Add a testing framework.
- Add the DDC or DADE algorithms to the Python bindings.
- Implement multi-threading capabilities.
Important
PDX is an ongoing research project. In its current state, it is not production-ready code.
To run our benchmark suite in C++, refer to BENCHMARKING.md.
The code used for the experiments presented at SIGMOD'25 can be found in the `sigmod` branch.