
Provide hardware-accelerated assembly versions #4

Open
joshtriplett opened this issue Sep 20, 2016 · 2 comments

Comments

@joshtriplett

Various architectures provide native instructions that compute this operation, or parts of it. For instance, there are instructions that convert or look up many bytes/words/dwords at once as a single vectorized operation. http://stackoverflow.com/questions/746171/best-algorithm-for-bit-reversal-from-msb-lsb-to-lsb-msb-in-c#24058332 describes one approach, which uses a vector register directly as a lookup table for nibbles.
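To make the nibble-lookup idea concrete, here is a minimal scalar sketch in Rust. It is not the linked answer's code and not this library's implementation; the names `REV_NIBBLE`, `reverse_byte`, and `reverse_bytes_in_place` are hypothetical and chosen only for illustration. A vectorized version would presumably hold the same 16-entry table in a vector register and perform many of these lookups with one byte-shuffle instruction.

```rust
/// Bit-reversed value of every 4-bit nibble, indexed by the nibble itself.
/// Example: REV_NIBBLE[0b0001] == 0b1000.
/// (Hypothetical name; this table is what a SIMD version would keep in a register.)
const REV_NIBBLE: [u8; 16] = [
    0x0, 0x8, 0x4, 0xC, 0x2, 0xA, 0x6, 0xE,
    0x1, 0x9, 0x5, 0xD, 0x3, 0xB, 0x7, 0xF,
];

/// Reverse the bits of one byte using two table lookups.
fn reverse_byte(b: u8) -> u8 {
    // The reversed low nibble becomes the new high nibble, and vice versa.
    let lo = REV_NIBBLE[(b & 0x0F) as usize];
    let hi = REV_NIBBLE[(b >> 4) as usize];
    (lo << 4) | hi
}

/// Reverse the bits of every byte in a buffer, one byte at a time.
/// A SIMD implementation would process a whole register of bytes per step.
fn reverse_bytes_in_place(buf: &mut [u8]) {
    for b in buf.iter_mut() {
        *b = reverse_byte(*b);
    }
}
```

Calling `reverse_byte(0b0000_0001)` yields `0b1000_0000`. Reversing a wider integer with this approach would additionally require swapping the byte order.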

@EugeneGonzalez
Owner

I'm glad you posted this issue. This is in the future plans for the library, once I finish more basic issues. Interestingly, I don't know of any ISA that has a bit-reversal instruction; if you could point me towards one, that would be great.

The main problem with implementing the StackOverflow algorithm is the general lack of stable SIMD support in Rust. There is a SIMD crate, but it doesn't support all the SIMD architectures and hasn't seen progress in a while. There is an LLVM intrinsics crate, but that is diving off the deep end in terms of compatibility and maintainability. Lastly, there is a current discussion on SIMD in Rust/rfcs that might provide more context on the problem.

So while I could drop down to ASM to implement the vectorized algorithm, I would rather wait until Rust itself supports it. That said, this is still a running goal, and it will be addressed when the opportunity arises. If you submit a PR for this, I would be glad to accept it.

@joshtriplett
Author

I don't know of an architecture that directly has such an instruction, just architectures that can do it in a small handful of instructions.

I agree that ideally Rust should handle this without requiring inline assembly or LLVM intrinsics. I don't know how much it would take to make that work.
