-
See #957 (comment) (and follow-up comments)
-
Thank you for the quick reply. I'm not doing string comparison, but computing the intersection of two sorted 16-bit arrays using SSE4.1 intrinsics (see code below). As far as I know, the proposed Sse2.CompareEqual / _mm256_cmpeq_epi16 allows only a pairwise comparison of the elements of vector A and vector B (8 comparisons). Any suggestions?
Source: Roaring Bitmaps: Implementation of an Optimized Software Library, Daniel Lemire https://arxiv.org/pdf/1709.07821.pdf
-
That's an interesting use case for

runtime/src/native/external/rapidjson/reader.h, lines 296 to 305 in 0e499ac

In cases like that (a small number of characters to match), similar performance can be achieved by substituting multiple compares against constant vectors, each with one character broadcast to all elements, with even better performance when extended to 256-bit vectors with AVX2.

For a full intersection of 8x8 values, there won't be an approach that can equal what

The only real hold-up to getting those missing SSE4.2 instructions into
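As a rough sketch of the broadcast-and-compare idea, here it is in C with plain SSE2 intrinsics, applied to 16-bit lanes rather than characters. The helper name `match_any_epi16` and its signature are made up for illustration and are not taken from the runtime, rapidjson, or the Roaring paper. It broadcasts each candidate value to all lanes, compares, and ORs the results together, so matching against `n` values costs `n` compares. That is cheap for a handful of values, but for a full 8x8 check it takes eight compares where the string-compare instruction needs one.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical helper (name and signature are assumptions for this sketch).
   Returns an 8-bit mask: bit i is set when 16-bit lane i of `a` equals
   any of the first `n` values in `b`. */
static unsigned match_any_epi16(__m128i a, const uint16_t *b, int n)
{
    __m128i hits = _mm_setzero_si128();
    for (int i = 0; i < n; i++) {
        /* Broadcast one candidate value to all eight lanes. */
        __m128i needle = _mm_set1_epi16((short)b[i]);
        /* Lanes that match become 0xFFFF; accumulate with OR. */
        hits = _mm_or_si128(hits, _mm_cmpeq_epi16(a, needle));
    }
    /* Narrow each 16-bit lane to one byte (0xFFFF -> 0xFF, 0 -> 0),
       then collect the top bit of the low eight bytes. */
    __m128i packed = _mm_packs_epi16(hits, _mm_setzero_si128());
    return (unsigned)_mm_movemask_epi8(packed) & 0xFFu;
}
```

For example, with `a = {1, 3, 5, 7, 9, 11, 13, 15}` loaded via `_mm_loadu_si128` and `b = {2, 3, 4, 7, 8, 13, 14, 20}`, lanes 1, 3 and 6 of `a` match, so the returned mask is `0x4A`. The same loop extends to 256-bit vectors with the AVX2 equivalents (`_mm256_set1_epi16`, `_mm256_cmpeq_epi16`), though the mask extraction at the end needs adjusting there.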
-
I am looking for the _mm_cmpistrm instruction, available on SSE4.2 processors.
( PCMPISTRM Packed Compare Implicit Length Strings, Return Mask https://www.felixcloutier.com/x86/pcmpistrm )
Can't find it on the Sse42 object...