-
Notifications
You must be signed in to change notification settings - Fork 118
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement an AVX2 accelerated Engine #170
base: master
Are you sure you want to change the base?
Conversation
This sort of thing is exactly what I made I'm terribly busy so who knows when I'll have time to give this a close look, but it's definitely something I want to support. |
Oh, and this might be the sort of thing that would be worth doing #98 for. |
So, with 7039695 I have a first version I'd be okay with being merged¹. I finished implementing the url-safe alphabet, and I am reasonable certain that the code is correct in the sense that it's well behaved to memory, in the sense that it produces correct input data and output data, and in the sense that it's decent enough looking and well behaved enough in company that I wouldn't be ashamed to be seen at it's side on a party. Additionally I skewed the ratio of lines-of-indecipherable-simd-intrinsics to comments-explaining-the-vodoo somewhat in my favour. That is, future me's favour, for when I'm sitting in front of it again in 5 years time and have to figure our what it's doing and why it has stopped working. ¹: I'd sincerely prefer to first squash the history into something more reasonable though. |
Thanks. It'll take me at least a couple weeks to get to this but I will get there! |
No worries, the world has made it this far without AVX2 accelerated base64 in Rust, I think it'll limp along at least a few more weeks :) I'm also still working on this PR here and there. The comments aren't up to the level of neither quality nor quantity I'd like them to be on, and the entire thing is plain and simple not fast enough yet. So no worries and take your time. Neither the code nor me is going anywhere fast and the worst that could happen is that you'll find a more polished, more refined PR that what's here now. ^^ |
The paper this algorithm is taken from can be found via DOI 10.1145/3132709 arXiv:1704.00605 or via arXiv.org: https://arxiv.org/abs/1704.00605
This adds the AVX2Encoder engine to the existing benchmarks and tests and introduces three new fuzzing targets: - roundtrip_avx - decode_random_avx which behave very similar to the non-avx versions, and - roundtrip_avx_equivalent Which checks that the avx2 engine and the DEFAULT_ENGINE produce the same data both ways.
Hello! I recently saw that you were preparing for a new release of base64 and that got me motivated to clean up the PR a bit. No worries if you don't have time to review the PR just yet, I haven't found the time to really sit down and tune the engine either. EDIT: Okay it's |
If the CircleCI runner allows us to use AVX2 extensions this will use them readily, if it doesn't the fuzz_target! will compile but do absolutely nothing.
Thanks for maintaining this. I will get to it eventually... tough to find a big block of uninterrupted, unallocated time. |
Hey no worries! ^^ Not having AVX2-accelerated base64 in Rust isn't going to make the world go up in flames. |
I wanted to wait discussing that until later, but I think it's a good idea to finalize this PR into an integrated package instead of a more fleshed out PoC, so: How do we want to handle engine selection?I think an user enabling an However, that only works for users that also compile locally and with Rust brings the macro I would like to see runtime detection for the shorthand methods like If you think that would be a sensible addition I can either add it to this PR or open a new one only adding the detection system that could be more easily merged before an upcoming |
To make it easier to review in my fragments of OSS time, let's do the feature detection separately. I'm of two minds about the nature of Along those lines, what if we expressly catered to the two (or more) priorities by steering users accordingly? Maybe we should have a |
For that matter I don't like the engine name |
Sounds good. Do you want to have that PR available now or should I wait working on that after we have merged this one?
I agree. Rust caters to many different groups, especially to both performance-sensitive and safety-sensitive at the same time. And I think this crate is fundamental enough that we should keep all of those groups in our heads.
I think we want to explicitly use a slow engine so at least the "Fast" in Especially with portable-simd the "fastest engine" could be written without any *Terms and conditions apply :P |
Let's hold off on feature detection for now so I don't build up too embarrassingly large a PR backlog. I think a "portable engine" and "fastest available" engine makes sense, though it does have a wrinkle: unlike the portable engine, which is a known type and could be used directly by the developer, the fastest engine would have to be constructed via an intervening function to handle resolving the correct CPU features, etc. To make constructing an unknown engine viable, should we force engines to be constructable from an alphabet and a config? If so, then the config type would need to be shared across engines. That's simpler for users, but perhaps harder for engines. Did supporting optional padding or lenient encoding feel like an undue burden? |
@@ -4,6 +4,13 @@ use crate::{alphabet, DecodeError}; | |||
|
|||
pub mod fast_portable; | |||
|
|||
#[cfg(all( | |||
any(target_arch = "x86", target_arch = "x86_64"), | |||
target_feature = "avx2", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Specifying a strict target_feature
here might prevent library users from doing dynamic cpu feature detection.
It would probably be better to tell the compiler that it's OK to generate avx2 instructions here using:
#[target_feature(enable = "avx2")]
What is this about?
This PR adds a new
Engine
implementation based on the Wojciech Muła & Daniel Lemire 2017, that makes use of the AVX2 extensions on modern x86 and AMD64/x86_64 CPUs.It's currently on the order of 200-300% faster than the
fast_portable
Engine on Intel Skylake and AMD Zen 2. However I've not done any tuning on the engine yet so this is most likely only the lower bound.What's the catch?
STANDARD
. (URL_SAFE
is also possible but I haven't yet added support on the decoding side).unsafe
. Because it also has to offset pointers into potentially unallocated space the code is violentlyunsafe
.Since this crate is marked with
#[forbid(unsafe_code)]
by default and I found it not justifiable to add new unsafe code to a fundamental crate like this all unsafe code is explicit opt-in. The engine is only enabled with the featureavx2
, which in turn requires the featureunsafe
(which allows unsafe code).So, what now?
Right now the code is not in a stage where I'd consider it mergeable, there's a number of TODO's left in the code, at least one FIXME and I'd like to prove the soundness of some parts first. It does however work and passes all of the tests that
DEFAULT_ENGINE
passes.But, I'd like to get the discussion with you started on this PR sooner rather than later, especially given that it adds unsafe code and plays with pointers. I would like to see this crate implement intrinsics-accelerated and more generally SIMD enabled Engines but I can also understand why'd you would like to not have any unsafe code at all.
Featurewise there's two more straightforward additions I'm planning for the PR should we decide on adding this engine into this repo at all:
In general it should also be possible to get a similar kind of improvement for other platforms, using e.g. ARM Neon, IBM VSX or the WebAssembly SIMD extension, but I haven't looked into any of those yet so I can't say for certain how much transfers.
Okay, one last question: … But why?
I couldn't sleep Sunday to Monday and writing code that makes CPUs go very fast, hot, and bothered is apparently my way of dealing with that :)