
SIMD optimization #171

Open
1 of 2 tasks
arkpar opened this issue Jan 23, 2023 · 2 comments
Labels
enhancement New feature or request

Comments

arkpar commented Jan 23, 2023

IndexTable::find_entry could benefit from SIMD optimizations for parallel entry search.

@arkpar arkpar added the enhancement New feature or request label Jan 23, 2023
Tpt commented Jan 30, 2023

I just started to investigate it quickly. The code generated by rustc -O is already quite good. The inner loop of the function mostly does a bounds check on the chunk to extract the entry, a shift on the entry value to get the key, and a basic comparison to check that the entry has not already been found. I have pushed #175 to move the bounds check out of the loop.
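For illustration, a minimal scalar sketch of the kind of loop described above. The function name, chunk size, and entry layout (partial key in the upper bits of each 64-bit entry) are assumptions for the example, not parity-db's actual code:

```rust
// Hypothetical sketch: scan 64-bit entries in a chunk for one whose
// partial-key bits (assumed to live in the upper bits) match the target.
const CHUNK_ENTRIES: usize = 64;

fn find_entry(chunk: &[u64; CHUNK_ENTRIES], partial_key: u64, key_bits: u32) -> Option<usize> {
    for (i, &entry) in chunk.iter().enumerate() {
        if entry == 0 {
            continue; // empty slot
        }
        // Shift out the non-key bits, then compare the partial key.
        if entry >> (64 - key_bits) == partial_key {
            return Some(i);
        }
    }
    None
}

fn main() {
    let mut chunk = [0u64; CHUNK_ENTRIES];
    // Store an entry whose top 34 bits are the partial key 0x2_2222_2222.
    chunk[5] = (0x2_2222_2222u64 << 30) | 0x1234;
    assert_eq!(find_entry(&chunk, 0x2_2222_2222, 34), Some(5));
    assert_eq!(find_entry(&chunk, 0x3_3333_3333, 34), None);
    println!("ok");
}
```

The per-iteration work is just a shift and a compare, which is why a bounds check inside the loop is a noticeable fraction of the cost.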

About SIMD, the AVX instructions for unsigned 64-bit integers (the _mm_XXX_epu64 intrinsics) are still "experimental" in Rust, so there seems to be no easy way to implement this with unsigned instructions on stable Rust at the moment. Something may be doable with assembly, or by using masks with the signed instructions (I am not sure; I am not very familiar with SIMD).

arkpar commented Jan 30, 2023

I think we can get away with 32-bit instructions. We only really need to check the first 32-bit word of each 64-bit entry for the partial key. 16-bit index entries contain 34 bits of key and 17-bit entries have 33, but I think we can simply ignore the last two bits and do a full key comparison in case the first 32 bits match. I don't have that much experience with SIMD myself, but I suppose it is possible to load and pack a register with every second word. Or run the search on all 32-bit words and then filter out every odd result.

hashbrown uses _mm_cmpeq_epi8, so I guess we could also use the signed _mm_cmpeq_epi32. Signedness should not matter for equality comparison.
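The signedness point can be checked directly: equality on bit patterns is independent of how the bits are interpreted, so two 32-bit values compare equal as i32 exactly when they compare equal as u32. A quick demonstration:

```rust
// Equality is signedness-agnostic: reinterpreting the same bit patterns
// as signed integers never changes the result of an == comparison,
// which is why a signed _mm_cmpeq_epi32 works for unsigned keys.
fn main() {
    let pairs: [(u32, u32); 3] = [
        (0xFFFF_FFFF, 0xFFFF_FFFF), // both reinterpret as -1i32
        (0x8000_0000, 0x8000_0000), // both reinterpret as i32::MIN
        (0x8000_0000, 0x7FFF_FFFF), // unequal under either view
    ];
    for (a, b) in pairs {
        assert_eq!(a == b, (a as i32) == (b as i32));
    }
    println!("ok");
}
```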

So the whole thing should be:

  1. Build target with _mm_set1_epi32

Then, for each 256 bits:

  1. Load every second 32-bit word into a register
  2. Zero out the n lower bits that should not be compared (n = index_bits - 18, if I'm not mistaken). Alternatively, right-shift by n
  3. Compare with target using _mm_cmpeq_epi32 to build a result bitmap
  4. Return the leftmost bit in the result, if any.
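The steps above can be sketched with stable SSE2 intrinsics (a 128-bit stand-in for the 256-bit AVX version, processing four entries per iteration). The function name is hypothetical, and the layout assumptions are: little-endian, partial key in the upper 32 bits of each 64-bit entry, and `n` low bits of that word to discard before comparing. A real implementation would also need to handle empty (zero) entries:

```rust
// Hypothetical sketch of the SIMD inner step: compare four entries'
// upper 32-bit words against a broadcast target at once.
#[cfg(target_arch = "x86_64")]
fn find_in_four(entries: &[u64; 4], target: u32, n: i32) -> Option<usize> {
    use std::arch::x86_64::*;
    unsafe {
        // Load four 64-bit entries as two 128-bit vectors.
        let lo = _mm_loadu_si128(entries.as_ptr() as *const __m128i);
        let hi = _mm_loadu_si128(entries.as_ptr().add(2) as *const __m128i);
        // Keep every second 32-bit word (the upper half of each entry).
        let lo_hi = _mm_shuffle_epi32(lo, 0b11_01_11_01);
        let hi_hi = _mm_shuffle_epi32(hi, 0b11_01_11_01);
        let packed = _mm_unpacklo_epi64(lo_hi, hi_hi);
        // Right-shift all lanes to drop the bits that should not be compared.
        let packed = _mm_srl_epi32(packed, _mm_cvtsi32_si128(n));
        // Broadcast the target and compare all four lanes at once.
        let t = _mm_set1_epi32(target as i32);
        let cmp = _mm_cmpeq_epi32(packed, t);
        // One bit per 32-bit lane; lowest set bit = leftmost matching entry.
        let mask = _mm_movemask_ps(_mm_castsi128_ps(cmp));
        if mask != 0 { Some(mask.trailing_zeros() as usize) } else { None }
    }
}

// Scalar fallback for non-x86_64 targets, same semantics.
#[cfg(not(target_arch = "x86_64"))]
fn find_in_four(entries: &[u64; 4], target: u32, n: i32) -> Option<usize> {
    entries.iter().position(|&e| ((e >> 32) as u32) >> n == target)
}

fn main() {
    let entries = [0u64, 0xDEAD_BEEF_0000_0000, 0, 0];
    // Upper word of entries[1] is 0xDEAD_BEEF; ignore its low 2 bits.
    assert_eq!(find_in_four(&entries, 0xDEAD_BEEF >> 2, 2), Some(1));
    assert_eq!(find_in_four(&entries, 0x1234_5678 >> 2, 2), None);
    println!("ok");
}
```

Here the `target` is passed pre-shifted, so step 2 only has to shift the packed lanes; `_mm_movemask_ps` on the comparison result gives the bitmap of step 3, and `trailing_zeros` picks the leftmost match for step 4.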
