-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Portable packed SIMD vector types #2948
base: master
Are you sure you want to change the base?
Conversation
I have so many, many thoughts, the first and most important of which is: Not a single SIMD crate has a good and complete version of portable SIMD, and we don't even have full Neon support in Nightly, so this is probably far too early to RFC. |
The key thing that makes the Once this design that conceptually matches what compiler back ends is available in the standard library, more abstractions can be built on top of it in the crate ecosystem.
Nothing ever lands if it's waiting for completeness. As acknowledged by the second paragraph of the RFC, it's OK to add more feature later, but let's get what already exists and works into a form that can be used outside nightly.
This RFC does not need be blocked on the completion or even stabilization of |
So if your position is "here's a junk drawer of stuff that you maybe could use" rather than "here's a complete and ready solution", well that's not wrong I suppose. I think that if you want to expose LLVM's intrinsic concepts for normal Rust users to build with then that's super cool, but we should do that as directly as possible if that is the goal. Currently the Speaking personally, when I tried to understand I'm not at all saying that Rust is fine without portable SIMD. That's important to have in the long term. I am saying that |
I think that's an inappropriate characterization. Rather, my position is that
I want to expose the concept of portable SIMD that LLVM has without exposing LLVM-specific idiosyncrasies but making the concept work well on the Rust level. That's what
I understand the desire for ideal design, but real artists ship, and actually being able to use stuff is an important feature. The application programmer should not look at what
In general, it's not useful to pay attention to
Implementing portable SIMD is a lot of work that has already been done. I'm a user of portable SIMD, so I care, I'm not going have the bandwidth to implement portable SIMD in a different way. I think we should take the work that has been done instead of blocking it by saying "not like that". |
The other big question I'm not seeing a clear answer to in the RFC is: Why is it important to get this into There are a few entries in the FAQ section that touch on this, but they mostly left me confused. For example, this one basically says that
But I don't get how promoting |
From the view of a consumer, who wants to use SIMD in rust, today barriers exist - you choose nightly and packed simd (which from my view, was a great user experience honestly), or you need to use stable with std::arch - reading std::arch's SIMD intrinsics was really daunting and I don't think I'd be able to be effective in using them. I don't think that packed_simd was confusing to use at all, a good hour with the docs and I had working examples that were able to improve real workloads I have. Like anything it's a slider - do you use the "lowest level" and get all the control? Or do you use a "high level" implementation (ie the faster crate) but lose some of the control? Or worse, rely on autovectorisation and have no control? I think that packed simd appears a good middle ground on this scale. It's low enough to expose the operations that exist on hardware, to solve real problems, but without being so low level that I'm required to read an intel architecture manual.
Waiting for Rust to be "perfect" before making a change like this, seems like the wrong answer. This is why tools like macro backtrace, and ASAN remain locked behind nightly, despite being hugely useful tools. If the insistence is that shuffle! needs const generics, then why not move shuffle behind a feature flag for nightly, hide the Simd implementation so that it's completely internal to the library, and then allow something useful to be accessible to consumers. When const generics happens, it can be changed internally in the library, and shuffle exposed to users at that point. From my view, I support this PR and seeing packed_simd become part of std. |
That's good reason to get more SIMD operations stabilized for other architectures, so that the packed_simd crate can run on stable. I could be mistaken, but it seems like with the new inline assembly set to be stabilized, it might not even be necessary to get every SIMD function stabilized: only the basic register types for each architecture? There's still no good motivation presented for why the portable-SIMD API needs to be in |
My motivation is that Firefox has been shipping first with The Rust community reacts with shock every time someone new discovers how this is accomplished by working around Rust's stability story (and, too often, with hostility, too).
The compilation of
As noted, if you are OK with compiling with nightly features enabled,
I guess both reasons apply.
It would resolve it to the same extent as providing
They are on the same level of abstraction as
The optimizer has less visibility into inline asm than into ISA-specific intrinsics. Both are ISA-spefic and, therefore, don't remove the need for being able to write portable code for SIMD like we get to write portable code for scalars.
It needs to be in That is, it needs to be in |
FWIW, the plan was always to put a platform independent SIMD API into std. It cannot live outside of std because its implementation relies on LLVM specific bits. This is not a simple matter of "let's stabilize more intrinsics for other architectures so that people can write things like packed_simd in crates on stable." (I haven't had a chance to read the RFC yet, but wanted to chime in on at least that point now. I hope to comment more later.) |
Not guaranteeing mask type width would be a problem for in terms of zero-cost bitcasts to integer vectors and zero-cost transmute to
@gnzlbg did the hard work. I'm tying to promote the work and added the FAQ. |
I agree that would be the case. But as proposed, on systems that support EVEX encoding your mask types are non-zero-cost for masking, which I would consider a more import consideration. In the current draft, how would the |
avx512 is quite simply not a portable CPU feature set. |
128-bit wide SIMD vector type. How these bits are interpreted depends on the intrinsic | ||
being used. For example, let's sum 8 `f32`s values using the SSE4.1 facilities | ||
in the `std::arch` module. This is one way to do it | ||
([playground](https://play.rust-lang.org/?gist=165e2886b4883ec98d4e8bb4d6a32e22&version=nightly)): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The link is not runnable. Try a new link: https://play.rust-lang.org/?version=nightly&mode=release&edition=2018&gist=4242a33261a8b0e12894fc39ad140f21
is not portable. | ||
|
||
With portable packed vector types, we can do much better | ||
([playground](https://play.rust-lang.org/?gist=7fb4e3b6c711b5feb35533b50315a5fb&version=nightly)): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This link is not runnable. Maybe it should be packed_simd::f32x4
instead.
The `std::arch` module exposes architecture-specific SIMD types like `_m128` - a | ||
128-bit wide SIMD vector type. How these bits are interpreted depends on the intrinsic | ||
being used. For example, let's sum 8 `f32`s values using the SSE4.1 facilities |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is incorrect to the point of being misleading.
__m128
(note that it's two leading underscores) as well as the higher bit variants, is explicitly and specifically to be interpreted as f32
lanes.
Similarly, __m128d
(and wider variants) are specifically f64
in each lane.
Only the __m128i
type (and wider variants) holds integer bits where the exact nature of the integer type changes on an operation-by-operation basis.
I wanted to go meta on my previous objections to guaranteed mask layout. I feel like the responses got bogged down in the niche-ness of AVX512. I feel the standard library should be extremely conservative in the guarantees it makes. Another RFC can always add more guarantees at a later date if there proves a real need. The eventual RFC guaranteeing the size of To add a another data point (also niche) back to the original argument. The draft spec for the Risc-V packed SIMD has mask registers that don't follow the convention assumed in this RFC. They are the full width, but only use 1 bit per element. On the implementation side, if this library is a wrapper for LLVM intrinsics, both the To address responses from the RFC submitter:
|
Can you please explain why portable SIMD has to be tied to LLVM intrinsics and not implemented in terms of Also I think this RFC lacks explanation of how portable SIMD will work with runtime detection. This problem is not an easy one, especially if we want to compile several versions of an algorithm with different target features coming from a separate crate. Ideally I would like to see something like this, but even it does not solve the issue with a separate crate. |
The RFC doesn't lack an explanation of runtime detection, it explicitly rejects the idea: https://github.com/hsivonen/rfcs/blob/ppv/text/0000-ppv.md#shouldnt-the-portable-simd-implementation-detect-instruction-set-extensions-at-run-time-and-adapt |
The linked part talks about runtime detection at individual operation level, while my post is about algorithm level. I agree that that detection should not be performed for individual operations, since it can seriously degrade performance, but it's a very common expectations for SIMD accelerated code to be able to switch between different implementations at runtime depending on hardware capabilities. So I believe SIMD proposals should keep such use cases in mind.
|
Operations exposed by this low layer of portable SIMD are each too small to use runtime detection. This RFC is not generally proposing any operation big enough to warrant a feature check and then branch to one variant or the other. Even sin and cos don't really benefit enough from feature variations to offset the cost of the check for a higher feature level. |
This is covered in the FAQ. While there is no theoretical obstacle, there's a huge practical obstacle: Rewriting a large chunk of code that LLVM has (and, I believe, GCC has) and Cranelift either has or is going to have makes no sense as a matter of realistic allocation of development effort and its hard to motivate anyone to do the work given that the work could be avoided by delegating to an existing compiler back end. More bluntly: Those who suggest re-implementing the functionality on top of |
Basic operations like addition and insertion are currently implemented in corearch using the Preferably the portable packed SIMD vector types only expose operations that can be implemented using |
Actually, I have. I just stopped after |
This isn't really a constructive piece of input. @hsivonen's point still stands that an alternative doesn't exist especially if you have become bored with the continued development of your version. What is currently offered by packed simd not only exists today, but is in (probably production) use today as mentioned which goes a long way to proving it's value even if not "perfect". |
Rust is not generally in the business of stabilizing things into the standard library before they're ready. This is what keeps our standard library quite high quality. I stand by this policy. The biggest limit to Stable portable SIMD via explicit intrinsics, in terms of serving x86/x64 and ARM/Aarch64 (the majority of processors in use), is the fact that the ARM intrinsics aren't on Stable, and they aren't even complete on Nightly. They're like 11% complete on Nightly. It's a bit of a drag, but the answer here is to submit PRs to Also, |
I think what @Lokathor says makes a lot of sense:
And to these points, I'd like to add mine:
|
I'm trying to understand what the real blockers to moving forward with this RFC are. There's an implicit motivation of getting How relevant is the question of whether to build on LLVM intrinsics vs From a public API perspective it seems like that only becomes significant when we're suggesting building on The question then is why not this proposed API? There seems to be a general impression in the comments here that the API proposed is not ideal. What would a more ideal one look like? From what I can see, it looks like the RFC isn't proposing the vector types as an alias of some generic |
the current I'd like to note that all of this should hopefully go into |
AFAICT, the main blocker is that this lacks a champion within the libs team. (I don't have the bandwidth to become that person at this time.)
As noted earlier, I believe in terms of the amount of work and who is willing to do the work, the only realistic path forward is to use the LLVM (or in the future other back end) facilities as While building on |
I do not believe that it would tie rust particularly to LLVM, it would just tie that one crate using the LLVM intrinsics to LLVM until an alternate backend also had the same intrinsics. Which is arguably less disruptive than what happens with Incidentally |
It is very likely to have at least one crate in the dependency graph that needs a certain intrinsic, which is actually quite disruptive. For example the
All |
Ah gotcha 👍 I was just going off the APIs proposed in the RFC itself, which doesn’t mention the
I’d welcome a PR that introduced the API proposed in the RFC as unstable with the vector types expressed as opaque structs instead of type aliases to |
I could review such PRs. |
I’ve opened #2977 to establish a project group to work on Thanks everyone for all your effort so far! |
Update on the project groupThe Portable SIMD group (@rust-lang/project-portable-simd) has been established and very busy! We’re mostly working in the new https://github.com/rust-lang/stdsimd repository, scaffolding the We’ve also published a new patched version of [dependencies.packed_simd]
package = "packed_simd_2" |
This RFC extends Rust with portable packed SIMD vector types. The RFC is forked from the closed RFC 2366 by @gnzlbg.
Rendered