Add NEON encode and check #56

Merged (9 commits) on Sep 27, 2024

@Lynnesbian (Contributor) commented Sep 23, 2024

Add implementations of hex_encode and hex_check using ARM's NEON (also known as AdvSIMD) SIMD instruction set. These implementations are based on the existing SSE4.2 ones; they're more or less direct translations.
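To illustrate what a NEON encoder can look like, here is a minimal sketch. It is not the PR's code: it assumes lowercase output and uses a common nibble table-lookup approach, which may differ from how the SSE4.2 code was translated. `hex_encode_sketch`, `hex_encode_neon`, `hex_encode_scalar`, `encode_dispatch` and `HEX_TABLE` are hypothetical names. Each 16-byte chunk is split into high and low nibbles, mapped to ASCII with `vqtbl1q_u8`, and interleaved with `vzip1q_u8`/`vzip2q_u8`. Leftover bytes go through a scalar loop.

```rust
/// Hypothetical lookup table: lowercase hex digits indexed by nibble value.
const HEX_TABLE: &[u8; 16] = b"0123456789abcdef";

/// Scalar path: writes two ASCII hex digits per input byte.
fn hex_encode_scalar(src: &[u8], dst: &mut [u8]) {
    for (i, &byte) in src.iter().enumerate() {
        dst[2 * i] = HEX_TABLE[(byte >> 4) as usize];
        dst[2 * i + 1] = HEX_TABLE[(byte & 0x0f) as usize];
    }
}

/// NEON path: encodes 16 input bytes (32 output bytes) per iteration.
#[cfg(target_arch = "aarch64")]
#[target_feature(enable = "neon")]
unsafe fn hex_encode_neon(src: &[u8], dst: &mut [u8]) {
    use core::arch::aarch64::*;

    let chunks = src.len() / 16;
    unsafe {
        let table = vld1q_u8(HEX_TABLE.as_ptr());
        let low_mask = vdupq_n_u8(0x0f);
        for i in 0..chunks {
            let input = vld1q_u8(src.as_ptr().add(i * 16));
            // Split every byte into its high and low nibble (each 0..=15).
            let hi = vshrq_n_u8::<4>(input);
            let lo = vandq_u8(input, low_mask);
            // Table lookup maps each nibble to its ASCII hex digit.
            let hi_ascii = vqtbl1q_u8(table, hi);
            let lo_ascii = vqtbl1q_u8(table, lo);
            // Interleave so input byte k becomes [hi digit k, lo digit k].
            let out = dst.as_mut_ptr().add(i * 32);
            vst1q_u8(out, vzip1q_u8(hi_ascii, lo_ascii));
            vst1q_u8(out.add(16), vzip2q_u8(hi_ascii, lo_ascii));
        }
    }
    // Fewer than 16 bytes remain; finish them with the scalar loop.
    let done = chunks * 16;
    hex_encode_scalar(&src[done..], &mut dst[done * 2..]);
}

#[cfg(target_arch = "aarch64")]
fn encode_dispatch(src: &[u8], dst: &mut [u8]) {
    // SAFETY: NEON is part of the baseline feature set of standard aarch64
    // targets, and the caller has checked dst.len() == 2 * src.len().
    unsafe { hex_encode_neon(src, dst) }
}

#[cfg(not(target_arch = "aarch64"))]
fn encode_dispatch(src: &[u8], dst: &mut [u8]) {
    hex_encode_scalar(src, dst)
}

/// Hypothetical entry point: `dst` must be exactly twice as long as `src`.
pub fn hex_encode_sketch(src: &[u8], dst: &mut [u8]) {
    assert_eq!(dst.len(), src.len() * 2, "dst must be twice as long as src");
    encode_dispatch(src, dst);
}
```

Off aarch64 only the scalar path is compiled, so the sketch builds on any target. On aarch64, inputs whose length is not a multiple of 16 still work because the tail loop handles the last `src.len() % 16` bytes.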

These implementations are only active on aarch64 targets, not on 32-bit ARM targets (armv7, etc.), because Rust's NEON intrinsics for 32-bit ARM are still unstable.
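The check side can be sketched the same way, gated on `target_arch = "aarch64"` with a scalar fallback for every other target. This is an illustration, not the PR's code, and it assumes hex_check accepts both lowercase and uppercase digits (the same set as `u8::is_ascii_hexdigit`). `hex_check_sketch`, `hex_check_neon` and `hex_check_scalar` are hypothetical names. Range comparisons yield 0xFF/0x00 lane masks, and `vminvq_u8`, an AArch64-only horizontal minimum, reports whether any lane failed.

```rust
/// Scalar check: every byte must be an ASCII hex digit (0-9, a-f, A-F).
fn hex_check_scalar(src: &[u8]) -> bool {
    src.iter().all(|b| b.is_ascii_hexdigit())
}

/// NEON check: validates 16 bytes per iteration.
#[cfg(target_arch = "aarch64")]
#[target_feature(enable = "neon")]
unsafe fn hex_check_neon(src: &[u8]) -> bool {
    use core::arch::aarch64::*;

    let chunks = src.len() / 16;
    unsafe {
        for i in 0..chunks {
            let v = vld1q_u8(src.as_ptr().add(i * 16));
            // Each comparison sets a lane to 0xFF when it holds, 0x00 otherwise.
            let digit = vandq_u8(vcgeq_u8(v, vdupq_n_u8(b'0')), vcleq_u8(v, vdupq_n_u8(b'9')));
            let lower = vandq_u8(vcgeq_u8(v, vdupq_n_u8(b'a')), vcleq_u8(v, vdupq_n_u8(b'f')));
            let upper = vandq_u8(vcgeq_u8(v, vdupq_n_u8(b'A')), vcleq_u8(v, vdupq_n_u8(b'F')));
            let valid = vorrq_u8(digit, vorrq_u8(lower, upper));
            // The horizontal minimum is 0xFF only if every lane is valid.
            if vminvq_u8(valid) != 0xff {
                return false;
            }
        }
    }
    // Check the remaining (< 16) bytes with the scalar code.
    hex_check_scalar(&src[chunks * 16..])
}

#[cfg(target_arch = "aarch64")]
pub fn hex_check_sketch(src: &[u8]) -> bool {
    // SAFETY: NEON is part of the baseline feature set of standard aarch64 targets.
    unsafe { hex_check_neon(src) }
}

#[cfg(not(target_arch = "aarch64"))]
pub fn hex_check_sketch(src: &[u8]) -> bool {
    hex_check_scalar(src)
}
```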


Unfortunately, checking for NEON support at runtime is a difficult problem to solve. My current implementation is less than ideal:

https://github.com/Lynnesbian/faster-hex/blob/859221bbcfd2256047b5bf6d334f30beb906ee3f/src/lib.rs#L159-L171

I've found a variety of differing ways to get this information on Aarch64 platforms:

There's no nice, cross-platform, no-std method to do this like there is with x86's cpuid. Worse, many of these methods only work on Aarch64, not on 32-bit ARM platforms.

I decided against including all of these methods in the vectorization_support function. They'd necessitate bringing in multiple new dependencies, and would make testing much more complicated.
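One dependency-free option can be sketched as follows, under stated assumptions. When `std` is available, Rust's `std::arch::is_aarch64_feature_detected!` macro performs the runtime check and handles the OS-specific details internally. In a `no_std` build, the sketch falls back to the compile-time `cfg!(target_feature = "neon")`, which reflects the target features the crate was built with. The `neon_available` function and the `std` Cargo feature gate are hypothetical. This is not necessarily what the PR's vectorization_support does.

```rust
/// Hypothetical helper: may the NEON code path be used on this machine?
/// With the (assumed) `std` Cargo feature, ask at runtime.
#[cfg(all(target_arch = "aarch64", feature = "std"))]
pub fn neon_available() -> bool {
    std::arch::is_aarch64_feature_detected!("neon")
}

/// Without `std`, rely on the target features chosen at compile time;
/// NEON is enabled by default on standard aarch64 targets.
#[cfg(all(target_arch = "aarch64", not(feature = "std")))]
pub fn neon_available() -> bool {
    cfg!(target_feature = "neon")
}

/// NEON paths are only compiled for aarch64 (32-bit ARM intrinsics are unstable).
#[cfg(not(target_arch = "aarch64"))]
pub fn neon_available() -> bool {
    false
}
```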

@Lynnesbian (Contributor, Author)

Unfortunately, the only Aarch64 device I have access to for benchmarking is my phone (a Samsung Galaxy A73). My Raspberry Pi 3B seems to have died since I last used it years ago 😢

Here are the relevant benchmark results from running cargo bench under Termux (which is, of course, far from an ideal benchmarking environment):

| Benchmark | Time |
|---|---|
| bench_faster_hex_encode | 91.559 ns |
| bench_faster_hex_encode_fallback | 140.00 ns |
You can view the full output from cargo bench here.
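As a worked reading of the two numbers, assuming bench_faster_hex_encode dispatched to the NEON path on this device while bench_faster_hex_encode_fallback measures the scalar fallback, the ratio and relative time saved are:

$$
\frac{t_{\text{fallback}}}{t_{\text{NEON}}} = \frac{140.00\ \text{ns}}{91.559\ \text{ns}} \approx 1.53,
\qquad
1 - \frac{91.559\ \text{ns}}{140.00\ \text{ns}} \approx 0.346
$$

That is roughly a 1.5x speedup, or about 35% less time per iteration, from a single run under Termux, so it should be treated as indicative only.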

@eval-exec (Collaborator)

Thank you! Is it possible to run the benchmarks in the CI workflow?

@Lynnesbian (Contributor, Author)

Sure, how would I do that? Would I need to add benchmarks to the rust.yml workflow file?

@eval-exec (Collaborator) commented Sep 23, 2024

Sure.

@Lynnesbian (Contributor, Author)

Unfortunately, it seems GitHub doesn't currently offer any Aarch64 runners, though they're aiming to make them available by the end of the year.

This means there's currently no way to run the CI on an Aarch64 runner, unless you want to set up self-hosted runners.

Review thread on src/lib.rs: resolved.
@quake merged commit 4acf38e into nervosnetwork:master on Sep 27, 2024
5 checks passed