Ok I'm having a little trouble following the assembly code (I'll need some time to digest it 😅), but by caching do you mean something like a ring buffer?
No, it's not a ring buffer. There are 32 Z registers (Z0-Z31) in SVE, so some of them can be used to hold the values of the secret array.
The logic is that the secret array is shared between accumulating and scrambling. Memory accesses are expensive with SVE, so if the secret array is loaded only once, performance should improve. The benchmark results confirm this.
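To make the idea concrete, here is a minimal scalar sketch in portable C, not the SVE assembly from the branch. It is deliberately simplified: it assumes a single fixed 64-byte secret window for both steps, whereas real XXH3 slides the window per stripe and scrambles with the last 64 bytes of the secret. The names `hash_block_mem` and `hash_block_cached` are hypothetical. The local `key[]` array plays the role of the Z registers that hold the secret.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define LANES 8                 /* 8 x 64-bit accumulators = one 64-byte stripe */
#define PRIME32_1 0x9E3779B1U   /* 32-bit prime used by the scramble step */

/* Read one 64-bit word from possibly unaligned memory (host byte order). */
static uint64_t read64(const uint8_t *p) {
    uint64_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Baseline: accumulate and scramble each re-read the secret from memory. */
static void hash_block_mem(uint64_t acc[LANES], const uint8_t *in,
                           size_t nb_stripes, const uint8_t *secret) {
    assert(in != NULL && secret != NULL);
    for (size_t s = 0; s < nb_stripes; s++) {
        for (size_t i = 0; i < LANES; i++) {
            uint64_t d = read64(in + 64 * s + 8 * i);
            uint64_t k = d ^ read64(secret + 8 * i);   /* secret load per stripe */
            acc[i ^ 1] += d;
            acc[i] += (k & 0xFFFFFFFFU) * (k >> 32);
        }
    }
    for (size_t i = 0; i < LANES; i++) {
        uint64_t a = acc[i];
        a ^= a >> 47;
        a ^= read64(secret + 8 * i);                   /* secret loaded again */
        a *= PRIME32_1;
        acc[i] = a;
    }
}

/* Cached: the secret is loaded once and reused by both steps. */
static void hash_block_cached(uint64_t acc[LANES], const uint8_t *in,
                              size_t nb_stripes, const uint8_t *secret) {
    assert(in != NULL && secret != NULL);
    uint64_t key[LANES];                               /* stands in for Z registers */
    for (size_t i = 0; i < LANES; i++)
        key[i] = read64(secret + 8 * i);               /* the only secret loads */
    for (size_t s = 0; s < nb_stripes; s++) {
        for (size_t i = 0; i < LANES; i++) {
            uint64_t d = read64(in + 64 * s + 8 * i);
            uint64_t k = d ^ key[i];
            acc[i ^ 1] += d;
            acc[i] += (k & 0xFFFFFFFFU) * (k >> 32);
        }
    }
    for (size_t i = 0; i < LANES; i++) {
        uint64_t a = acc[i];
        a ^= a >> 47;
        a ^= key[i];
        a *= PRIME32_1;
        acc[i] = a;
    }
}
```

Both functions compute the same accumulators; the only difference is how often the secret is read from memory. Whether a compiler actually keeps `key[]` in registers is not guaranteed, which is presumably why the branch does it by hand in assembly.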
@Cyan4973 @easyaspi314
I tried caching the whole secret array in registers in the assembly code, and the performance is better. The peak result in the benchmark reaches 18xxx. What do you think about this idea? (https://github.com/hzhuang1/xxHash/tree/dirty_v0.2.1)
Of course, the current code is still ugly. Let's discuss the idea first.
=== benchmarking 4 hash functions ===
benchmarking large inputs : from 512 bytes (log9) to 128 MB (log27)
xxh3 , 3516, 6130, 9465, 12771, 15576, 17283, 18393, 14563, 13763, 13596, 13689, 13672, 13775, 12043, 5684, 4912, 4834, 4923, 4874
XXH32 , 1338, 1436, 1486, 1521, 1534, 1542, 1546, 1536, 1505, 1507, 1507, 1507, 1508, 1458, 1264, 1206, 1213, 1212, 1208
XXH64 , 2509, 2799, 2973, 3066, 3122, 3102, 3154, 3136, 3057, 3063, 3066, 3066, 3064, 2899, 2197, 2005, 2009, 2007, 1993
XXH128 , 3392, 5864, 9421, 12495, 15354, 17187, 18306, 14565, 13733, 13822, 14105, 14172, 14208, 12229, 5779, 5016, 4898, 4898, 4851
Throughput small inputs of fixed size (from 2 to 2 bytes):
xxh3 , 72821732