Which parts of std can stdsimd reuse? #188

gnzlbg · 2017-11-10T12:21:06Z

For ARM run-time detection support on Linux one needs:

std::fs::File::open to read /proc/cpuinfo and /proc/self/auxv
libc's getauxval (currently not provided by the libc crate)

On other platforms (windows, android, ...) run-time feature detection for arm is going to need similar functionality as well.

That is, while we can do run-time feature detection for x86 in core::, doing so for ARM is not possible in general (it can be done if you are in ring 0, but that's it).
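As a rough illustration of the /proc/cpuinfo route mentioned above, here is a minimal sketch. The function names `cpuinfo_has_feature` and `detect_from_proc_cpuinfo` are hypothetical and not part of stdsimd; the sketch assumes the Linux convention of a `Features` line listing whitespace-separated feature names.

```rust
use std::fs;

/// Returns true if `feature` is listed as a whole token on a "Features"
/// line of /proc/cpuinfo-style text. (Hypothetical helper for illustration.)
pub fn cpuinfo_has_feature(cpuinfo: &str, feature: &str) -> bool {
    cpuinfo
        .lines()
        // Each line looks like "key<TAB>: value"; skip lines without a colon.
        .filter_map(|line| line.split_once(':'))
        // Only the "Features" key carries the feature list on ARM Linux.
        .filter(|(key, _)| key.trim() == "Features")
        .any(|(_, value)| value.split_whitespace().any(|f| f == feature))
}

/// Reads /proc/cpuinfo and looks for `feature`.
/// Returns None if the file cannot be read (e.g. not on Linux).
pub fn detect_from_proc_cpuinfo(feature: &str) -> Option<bool> {
    let text = fs::read_to_string("/proc/cpuinfo").ok()?;
    Some(cpuinfo_has_feature(&text, feature))
}
```

Because this needs file I/O, it can only live in a crate that depends on std, which is the constraint this issue is about.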


parched · 2017-11-10T18:39:07Z

If stdsimd is going into core, the runtime feature detection will have to be split into a different crate, I guess, because, as you mention, runtime feature detection in general depends on the OS. It should probably be separate anyway; it's a very useful thing in its own right. IMO people should be encouraged to write portable code using implicit SIMD with runtime feature detection and only fall back to explicit SIMD when the compiler is struggling to vectorize.

BurntSushi · 2017-11-10T18:56:36Z

> IMO people should be encouraged to write portable code using implicit SIMD with runtime feature detection and only fall back to explicit SIMD when the compiler is struggling to vectorize.

This is a hard opinion to hold. :-) For example, literally every use case of SIMD (packed substring searching) I'm interested in requires explicit vectorization given the current state of compiler technology.

gnzlbg · 2017-11-10T22:43:45Z

I don't know what the right approach is yet. I think that everything that requires std should be part of another library component (e.g. ARM run-time detection using /proc/cpuinfo), but everything that can be done in core should be in core (e.g. ARM run-time detection in ring 0). I think that for the time being it is OK to have everything here, since our test harness is tightly coupled with run-time feature detection, but we should start discussing what makes sense to have where.


alexcrichton · 2017-11-11T22:36:09Z

My plan for integration was for the runtime detection to be part of std and all the intrinsics to be part of core. In that sense I think it can use all of std inside of runtime detection.

AdamNiederer · 2017-11-13T01:30:55Z

> IMO people should be encouraged to write portable code using implicit SIMD with runtime feature detection and only fall back to explicit SIMD when the compiler is struggling to vectorize.

I've found that rustc, even with all of the optimization switches and knobs turned up to 11, has real problems vectorizing anything but a[i] = b[i] + c[i] and memcpy. Even something as simple as [3f32; 128].iter().map(|e| e.sqrt().abs() - 2.0).collect::<Vec<_>>() uses sqrtss and subss if you add too many operations into the body.

I think explicit SIMD just needs a little bit of static sugar atop it to be both portable and understandable; I'm making that happen at the moment with faster.

parched · 2017-11-13T08:29:05Z

> For example, literally every use case of SIMD (packed substring searching) I'm interested in requires explicit vectorization given the current state of compiler technology.

> I've found that rustc, even with all of the optimization switches and knobs turned up to 11, has real problems vectorizing anything but a[i] = b[i] + c[i] and memcpy.

Sadly, you guys aren't wrong :/. Hopefully it will be better in the future. But yes, a portable SIMD layer that abstracts away the vector width and allows it to be a run-time constant will be the best and most future-proof option. Faster looks very promising; nice work, Adam.

gnzlbg · 2017-11-13T09:13:37Z

@alexcrichton I think it would be better to split the run-time detection into two parts, a core part, and a std part, more or less like this:

core::simd: vector types,
core::vendor: vendor intrinsics,
core::vendor::detection (strawman name): run-time detection for core users
std::vendor::detection (strawman name): run-time detection for std users. For x86 it just re-exports core_detection, but for ARM it does something platform specific: different implementations on MacOS, Linux, Windows, iOS, Android, ...
std::vendor::core_detection: re-exports core::vendor::detection as is.

The main argument is that one can always do run-time feature detection without std, that is, when one is fully in control of the CPU.

The moment one has std, and a platform, one is typically not in control anymore (e.g. user space Linux applications run in Ring 3).

For x86 this doesn't really matter because cpuid works in ring 3 just fine, so we can use the exact same code for core::vendor::detection and std::vendor::detection and everything works out fine.

For ARM, however, core::vendor::detection only works in ring 0, and to obtain this information in ring 3 one needs to ask the underlying platform about it.

The ARM run-time feature detection currently only works with std because that is what I have implemented (I thought this would have the biggest impact due to Android and iOS), but I always envisioned adding support for ring 0 detection at some point.

So maybe we should split runtime feature detection into two crates within stdsimd, a core crate, and a std crate. That way we can make sure that stdsimd remains #[no_std] (we only need feature detection for testing), and also make sure that the core component of run-time feature detection also remains #[no_std].
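To make the two-layer idea concrete, here is a minimal sketch under stated assumptions. The module names `core_detection` and `detection` only mirror the strawman names above, and `has_sse42` and `has_neon` are hypothetical helpers. The x86 part uses the `__cpuid` intrinsic from `core::arch`; the ARM part leans on the `is_aarch64_feature_detected!` macro that std ships today, which is not necessarily how the proposal would have been implemented at the time.

```rust
/// Stand-in for the core-only layer: uses nothing but `core`,
/// so it does not need an operating system.
pub mod core_detection {
    /// SSE4.2 is reported in CPUID leaf 1, ECX bit 20.
    /// `cpuid` can be executed from user space, so no OS help is needed.
    #[cfg(target_arch = "x86_64")]
    #[allow(unused_unsafe)] // `__cpuid` is a safe fn on newer toolchains
    pub fn has_sse42() -> Option<bool> {
        let leaf1 = unsafe { core::arch::x86_64::__cpuid(1) };
        Some((leaf1.ecx >> 20) & 1 == 1)
    }

    /// On other architectures this sketch has no core-only answer.
    #[cfg(not(target_arch = "x86_64"))]
    pub fn has_sse42() -> Option<bool> {
        None
    }
}

/// Stand-in for the std layer.
pub mod detection {
    /// For x86, simply re-export the core implementation.
    pub use super::core_detection::has_sse42;

    /// For ARM, defer to the platform-specific detection built into std.
    #[cfg(target_arch = "aarch64")]
    pub fn has_neon() -> bool {
        std::arch::is_aarch64_feature_detected!("neon")
    }

    /// Not an ARM target: NEON is never available.
    #[cfg(not(target_arch = "aarch64"))]
    pub fn has_neon() -> bool {
        false
    }
}
```

A caller with std would only ever use `detection`, while a `#[no_std]` user would call `core_detection` directly.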

@AdamNiederer maybe you should open a different issue here to discuss faster, I don't want to divert this discussion with it much.

AdamNiederer · 2017-11-13T13:20:58Z

> @AdamNiederer maybe you should open a different issue here to discuss faster, I don't want to divert this discussion with it much.

Sorry, didn't mean to steal the discussion. I don't know whether this issue tracker is a good place for discussing a wrapper library, anyway. :)

Is there any reason we'd want both core_detection and detection to be public in std? It sounds like we could keep std::core_detection private, unless there are plans to offer less functionality in std::detection than std::core_detection on e.g. ARM.

gnzlbg · 2017-11-13T13:32:27Z

> Is there any reason we'd want both core_detection and detection to be public in std?

No reason, I just didn't think about privacy here. Whether there is a platform / CPU combo where it might make sense to expose both... I really hope not.

gnzlbg · 2017-11-17T16:25:05Z

So I've just rebased the ARM run-time detection PR on the last changes, and we currently have a temporary solution that isn't too bad.

The library now has a std feature that we can use to conditionally enable ARM run-time support and the PR now does just that.
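A feature-gated function of this kind might look roughly like the sketch below. It assumes a Cargo feature named `std` declared in the crate's Cargo.toml; the function `arm_feature_from_os` is hypothetical and is not taken from the PR.

```rust
/// With the (assumed) `std` Cargo feature enabled on Linux, ask the OS
/// by scanning the "Features" lines of /proc/cpuinfo.
#[cfg(all(feature = "std", target_os = "linux"))]
pub fn arm_feature_from_os(feature: &str) -> Option<bool> {
    let text = std::fs::read_to_string("/proc/cpuinfo").ok()?;
    Some(text.lines().any(|line| {
        line.starts_with("Features") && line.split_whitespace().any(|f| f == feature)
    }))
}

/// Without the `std` feature (or off Linux) there is no OS to ask,
/// so the caller gets "unknown" and has to fall back to something else.
#[cfg(not(all(feature = "std", target_os = "linux")))]
pub fn arm_feature_from_os(_feature: &str) -> Option<bool> {
    None
}
```

The same function name exists in both configurations, so callers do not need their own `cfg` checks.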

I wonder if we could use that to avoid having to make any kind of split.

alexcrichton · 2017-11-19T16:34:52Z

@gnzlbg the split you mentioned above sounds great to me!

gnzlbg · 2017-11-22T13:03:46Z

This is closed by the coresimd and stdsimd split.

gnzlbg mentioned this issue Nov 16, 2017

Implement run-time feature detection for arm #120

Closed

gnzlbg closed this as completed Nov 22, 2017
