Which parts of std can stdsimd reuse? #188

Closed
gnzlbg opened this issue Nov 10, 2017 · 12 comments

@gnzlbg
Contributor

gnzlbg commented Nov 10, 2017

For ARM run-time detection support on Linux one needs:

  • std::fs::File::open to read /proc/cpuinfo and /proc/self/auxv
  • libc's getauxval (currently not provided by the libc crate)
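To make the Linux route a bit more concrete, here is a minimal sketch of the file-based variant: read the process's auxiliary vector from procfs (`/proc/self/auxv`) and look up the `AT_HWCAP` entry, which is the same value `getauxval(AT_HWCAP)` would return. The function names `find_auxv_entry` and `read_hwcap` are made up for illustration, and interpreting the individual hwcap bits is architecture-specific and not shown.

```rust
use std::fs;
use std::mem::size_of;

/// Key of the hardware-capabilities entry in the Linux auxiliary vector
/// (`AT_HWCAP` is defined as 16 in <elf.h>).
const AT_HWCAP: usize = 16;

/// Scans a raw auxiliary vector, which is a sequence of native-endian
/// (key, value) pairs of machine words terminated by an `AT_NULL` (0) key,
/// and returns the value stored under `key`, if present.
/// Hypothetical helper, not part of any library.
fn find_auxv_entry(raw: &[u8], key: usize) -> Option<usize> {
    const WORD: usize = size_of::<usize>();
    for pair in raw.chunks_exact(2 * WORD) {
        let mut k = [0u8; WORD];
        let mut v = [0u8; WORD];
        k.copy_from_slice(&pair[..WORD]);
        v.copy_from_slice(&pair[WORD..]);
        let (k, v) = (usize::from_ne_bytes(k), usize::from_ne_bytes(v));
        if k == 0 {
            // AT_NULL marks the end of the vector.
            return None;
        }
        if k == key {
            return Some(v);
        }
    }
    None
}

/// Reads `AT_HWCAP` for the current process.
/// Returns `None` if procfs is unavailable or the entry is missing.
fn read_hwcap() -> Option<usize> {
    let raw = fs::read("/proc/self/auxv").ok()?;
    find_auxv_entry(&raw, AT_HWCAP)
}
```

Keeping the parser separate from the file access means it can be exercised on synthetic bytes, and the same parser could sit behind a `getauxval`-based path once that binding is available.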

On other platforms (Windows, Android, ...) run-time feature detection for ARM is going to need similar functionality as well.

That is, while we can do run-time feature detection for x86 in core::, doing so for ARM is not possible in general (it can be done if you are in ring 0, but that's it).
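For contrast, the x86 side needs nothing beyond `core`. A minimal sketch, assuming an x86_64 target and using the `core::arch::x86_64::__cpuid` intrinsic as it exists in today's Rust (the function name `cpuid_has_sse2` is hypothetical):

```rust
/// Queries CPUID leaf 1 and reports whether the SSE2 bit (EDX bit 26) is set.
/// Returns `None` on targets other than x86_64, where this sketch does not
/// attempt any detection.
pub fn cpuid_has_sse2() -> Option<bool> {
    #[cfg(target_arch = "x86_64")]
    {
        use core::arch::x86_64::__cpuid;
        // The `cpuid` instruction is unprivileged and always present on
        // x86_64, so no operating-system support is needed here.
        #[allow(unused_unsafe)]
        let leaf1 = unsafe { __cpuid(1) };
        return Some((leaf1.edx & (1 << 26)) != 0);
    }
    #[cfg(not(target_arch = "x86_64"))]
    {
        None
    }
}
```

Nothing here touches the file system or libc, which is why this kind of check can live in a `no_std` crate.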

@parched

parched commented Nov 10, 2017

If stdsimd is going into core, the runtime feature detection will have to be split into a different crate I guess, because, as you mention, runtime feature detection in general depends on the OS. It probably should be separate anyway; it's a very useful thing in its own right. IMO people should be encouraged to write portable code using implicit SIMD with runtime feature detection and only fall back to explicit SIMD when the compiler is struggling to vectorize.

@BurntSushi
Member

IMO people should be encouraged to write portable code using implicit SIMD with runtime feature detection and only fall back to explicit SIMD when the compiler is struggling to vectorize.

This is a hard opinion to hold. :-) For example, literally every use case of SIMD (packed substring searching) I'm interested in requires explicit vectorization given the current state of compiler technology.

@gnzlbg
Contributor Author

gnzlbg commented Nov 10, 2017 via email

@alexcrichton
Member

My plan for integration was for the runtime detection to be part of std and all the intrinsics to be part of core. In that sense I think it can use all of std inside of runtime detection.
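A rough sketch of how that division looks from the user's side, written against what stable Rust offers today rather than the API that existed when this was discussed: the run-time check is the `std` macro `is_x86_feature_detected!`, and the feature-specific code is an ordinary function marked with `#[target_feature]`. The function names are hypothetical, and the AVX2 variant simply repeats the scalar loop so that the compiler is allowed, but not required, to vectorize it.

```rust
/// Portable fallback: wrapping sum of all elements.
fn sum_scalar(xs: &[u32]) -> u32 {
    xs.iter().fold(0u32, |acc, &x| acc.wrapping_add(x))
}

/// Same loop, compiled with AVX2 enabled for this function only.
/// Callers must make sure the CPU supports AVX2 before calling it.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
#[target_feature(enable = "avx2")]
unsafe fn sum_avx2(xs: &[u32]) -> u32 {
    xs.iter().fold(0u32, |acc, &x| acc.wrapping_add(x))
}

/// Picks an implementation at run time.
pub fn sum(xs: &[u32]) -> u32 {
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        // The detection macro lives in `std`, not `core`.
        if is_x86_feature_detected!("avx2") {
            // SAFETY: AVX2 support was confirmed on the line above.
            return unsafe { sum_avx2(xs) };
        }
    }
    sum_scalar(xs)
}
```

Both paths compute the same result, so callers only ever see `sum`.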

@AdamNiederer
Contributor

AdamNiederer commented Nov 13, 2017

IMO people should be encouraged to write portable code using implicit SIMD with runtime feature detection and only fall back to explicit SIMD when the compiler is struggling to vectorize.

I've found that rustc, even with all of the optimization switches and knobs turned up to 11, has real problems vectorizing anything but a[i] = b[i] + c[i] and memcpy. Even something as simple as [3f32;128].iter().map(|e| e.sqrt().abs() - 2.0).collect::<_>() uses the scalar sqrtss and subss instructions if you add too many operations into the body.
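For anyone who wants to inspect the generated code themselves, here is the expression above wrapped in a function so that it compiles on its own. The function name is hypothetical, the constant is written as the float literal `2.0` so the subtraction type-checks against `f32`, and the snippet makes no claim about what any particular compiler version emits for it.

```rust
/// Builds 128 copies of 3.0, then computes sqrt(x).abs() - 2.0 for each.
pub fn sqrt_abs_minus_two() -> Vec<f32> {
    [3f32; 128]
        .iter()
        .map(|e| e.sqrt().abs() - 2.0)
        .collect::<Vec<_>>()
}
```

Compiling this with optimizations and asking rustc for assembly output (for example with `--emit asm`) is one way to check which instructions the loop body ends up using.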

I think explicit SIMD just needs a little bit of static sugar atop it to be both portable and understandable; I'm making that happen at the moment with faster.

@parched

parched commented Nov 13, 2017

For example, literally every use case of SIMD (packed substring searching) I'm interested in requires explicit vectorization given the current state of compiler technology.

I've found that rustc, even with all of the optimization switches and knobs turned up to 11, has real problems vectorizing anything but a[i] = b[i] + c[i] and memcpy.

Sadly, you guys aren't wrong :/. Hopefully in the future it will be better. But, yes, a portable SIMD abstraction that hides the vector width and allows it to be a runtime constant will be the best and most future-proof approach. Faster looks very promising, nice work Adam.

@gnzlbg
Contributor Author

gnzlbg commented Nov 13, 2017

@alexcrichton I think it would be better to split the run-time detection into two parts, a core part, and a std part, more or less like this:

  • core::simd: vector types,
  • core::vendor: vendor intrinsics,
  • core::vendor::detection (strawman name): run-time detection for core users
  • std::vendor::detection (strawman name): run-time detection for std users. For x86 it just re-exports core_detection, but for ARM it does something platform specific: different implementations on macOS, Linux, Windows, iOS, Android, ...
  • std::vendor::core_detection: re-exports core::vendor::detection as is.
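The layout in the list can be modeled in a few lines. This is only a sketch of the re-export idea: `core_detection` and `std_detection` are stand-in modules inside a single file, and the `Source` enum and `source` function are invented for illustration so that the difference between the two paths is observable.

```rust
/// Where a detection answer comes from (illustrative only).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Source {
    CpuInstruction,
    OperatingSystem,
}

/// Stand-in for the proposed `core::vendor::detection`:
/// asks the CPU directly and needs no operating system.
pub mod core_detection {
    use super::Source;

    pub fn source() -> Source {
        Source::CpuInstruction
    }
}

/// Stand-in for the proposed `std::vendor::detection`.
pub mod std_detection {
    // On x86 the core implementation already works from user space,
    // so it is re-exported unchanged.
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    pub use super::core_detection::source;

    // Elsewhere (for example ARM) a platform-specific implementation
    // that asks the operating system would go here.
    #[cfg(not(any(target_arch = "x86", target_arch = "x86_64")))]
    pub fn source() -> super::Source {
        super::Source::OperatingSystem
    }
}
```

Users of the std-level module call the same function on every target; only the implementation behind it changes.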

The main argument is that one can always do run-time feature detection without std, that is, when one is fully in control of the CPU.

The moment one has std, and a platform, one is typically not in control anymore (e.g. user space Linux applications run in Ring 3).

For x86 this doesn't really matter because cpuid works in ring 3 just fine, so we can use the exact same code for core::vendor::detection and std::vendor::detection and everything works out fine.

For ARM, however, core::vendor::detection only works in ring 0, and to obtain this information in ring 3 one needs to ask the underlying platform for it.

The ARM run-time feature detection currently only works with std because that is what I have implemented (I thought this would have the biggest impact due to Android and iOS), but I always envisioned adding support for ring 0 detection at some point.

So maybe we should split runtime feature detection into two crates within stdsimd, a core crate and a std crate. That way we can make sure that stdsimd remains #![no_std] (we only need feature detection for testing), and also make sure that the core component of run-time feature detection remains #![no_std].


@AdamNiederer maybe you should open a different issue here to discuss faster, I don't want to divert this discussion with it much.

@AdamNiederer
Contributor

@AdamNiederer maybe you should open a different issue here to discuss faster, I don't want to divert this discussion with it much.

Sorry, didn't mean to steal the discussion. I don't know whether this issue tracker is a good place for discussing a wrapper library, anyway. :)

Is there any reason we'd want both core_detection and detection to be public in std? It sounds like we could keep std::core_detection private, unless there are plans to offer less functionality in std::detection than std::core_detection on e.g. ARM.

@gnzlbg
Contributor Author

gnzlbg commented Nov 13, 2017

Is there any reason we'd want both core_detection and detection to be public in std?

No reason, I just didn't think about privacy here. Whether there is a platform / CPU combo where it might make sense to expose both... I really hope not.

@gnzlbg
Contributor Author

gnzlbg commented Nov 17, 2017

So I've just rebased the ARM run-time detection PR on the latest changes, and we currently have a temporary solution that isn't too bad.

The library now has a std feature that we can use to conditionally enable ARM run-time support and the PR now does just that.
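A minimal sketch of what gating on such a feature usually looks like, assuming a Cargo feature literally named `std`. The function name is hypothetical and this is not the code from the PR.

```rust
// In the crate root one would typically also write
//     #![cfg_attr(not(feature = "std"), no_std)]
// so that the crate only links `std` when the feature is on. It is left
// as a comment here so the snippet compiles as an ordinary program.

/// Reports whether the OS-backed detection path was compiled in.
#[cfg(feature = "std")]
pub fn os_detection_compiled_in() -> bool {
    // With `std` available, code in this configuration may read files
    // or call into libc to query the platform.
    true
}

/// Without the `std` feature only the `core`-level code is built.
#[cfg(not(feature = "std"))]
pub fn os_detection_compiled_in() -> bool {
    false
}
```

Exactly one of the two definitions exists in any given build, so callers do not need their own `cfg` checks.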

I wonder if we could use that to avoid having to make any kind of split.

@alexcrichton
Member

@gnzlbg the split you mentioned above sounds great to me!

@gnzlbg
Contributor Author

gnzlbg commented Nov 22, 2017

This is closed by the coresimd and stdsimd split.

gnzlbg closed this as completed Nov 22, 2017