
Implement run-time feature detection for arm #120

Closed
gnzlbg opened this issue Oct 14, 2017 · 12 comments

Comments

@gnzlbg
Contributor

gnzlbg commented Oct 14, 2017

Topic sums it up.

We need to differentiate between querying run-time features in privileged and unprivileged modes.

In privileged mode we should do something similar to what we already do on x86.

In unprivileged mode, what everybody else seems to be doing is querying /proc/cpuinfo on Linux, and something similar on Windows, macOS, Android, and iOS.
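For illustration, here is a minimal Rust sketch of the /proc/cpuinfo approach, assuming the usual Linux layout where a `Features` line lists space-separated flags. The function name `cpuinfo_has_feature` and the sample text are hypothetical, not code from this project.

```rust
/// Returns true if `feature` appears on a "Features" line of the given
/// /proc/cpuinfo-style text. (Hypothetical helper, for illustration only.)
fn cpuinfo_has_feature(cpuinfo: &str, feature: &str) -> bool {
    cpuinfo
        .lines()
        // Each line looks like "key<TAB>: value"; skip lines without a colon.
        .filter_map(|line| line.split_once(':'))
        .filter(|(key, _)| key.trim() == "Features")
        .any(|(_, value)| value.split_whitespace().any(|f| f == feature))
}

fn main() {
    // Made-up excerpt in the style of a 32-bit ARM /proc/cpuinfo.
    let sample = "processor\t: 0\nFeatures\t: half thumb fastmult vfp edsp neon\n";
    println!("sample has neon: {}", cpuinfo_has_feature(sample, "neon"));

    // On a real Linux system the text would come from the file itself.
    if let Ok(text) = std::fs::read_to_string("/proc/cpuinfo") {
        println!("host has neon: {}", cpuinfo_has_feature(&text, "neon"));
    }
}
```

Matching whole whitespace-separated tokens avoids false positives from flags that merely contain the requested name as a substring.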

@gnzlbg
Contributor Author

gnzlbg commented Oct 18, 2017

@gnzlbg
Contributor Author

gnzlbg commented Oct 26, 2017

@alexcrichton I need your help with this. I have a prototype of this running, but qemu-user doesn't emulate /proc/cpuinfo so I think we need to switch to qemu-kvm.

By "doesn't emulate" I mean that the run-time feature detection code reads the /proc/cpuinfo of the host, which has an Intel CPU, so nothing works (no run-time feature detection, no simd_test, etc.).

Doing this for the 3 arm builds would be enough for now.

Ideally we would add an ARM Android build as well, since on Android we can do something better than reading /proc/cpuinfo. There is also /proc/self/auxv, which we would need to test as well.

If we are going to have at least 3-5 build bots using qemu-kvm, we might just as well enable it for the x86 builds as well.

Thoughts? ARM run-time feature detection is blocked on this being solved (otherwise committing it would break the build).

@jrmuizel
Contributor

You can also get the arm cpu detection information from /proc/self/auxv https://cgit.freedesktop.org/pixman/tree/pixman/pixman-arm.c?h=0.34#n143
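As a rough sketch of what reading /proc/self/auxv involves: on Linux the file is a sequence of native-endian, word-sized (key, value) pairs terminated by an `AT_NULL` entry. The helper names below are hypothetical; `AT_HWCAP` (16) and `HWCAP_NEON` (bit 12) are the values used on 32-bit ARM Linux, and the buffer in `main` is synthetic.

```rust
use std::mem::size_of;

const AT_NULL: usize = 0; // terminates the auxiliary vector
const AT_HWCAP: usize = 16; // key of the hardware capability bitmask
const HWCAP_NEON: usize = 1 << 12; // NEON bit on 32-bit ARM Linux

/// Decodes one native-endian machine word. `bytes` must be word-sized.
fn read_word(bytes: &[u8]) -> usize {
    let mut raw = [0u8; size_of::<usize>()];
    raw.copy_from_slice(bytes);
    usize::from_ne_bytes(raw)
}

/// Looks up `key` in a raw auxv buffer (hypothetical helper).
/// Stops at AT_NULL; a trailing partial pair is ignored.
fn auxv_lookup(buf: &[u8], key: usize) -> Option<usize> {
    let word = size_of::<usize>();
    for pair in buf.chunks_exact(2 * word) {
        let (k, v) = pair.split_at(word);
        let k = read_word(k);
        if k == AT_NULL {
            return None;
        }
        if k == key {
            return Some(read_word(v));
        }
    }
    None
}

fn main() {
    // Synthetic buffer standing in for the contents of /proc/self/auxv.
    let mut buf = Vec::new();
    for word in &[AT_HWCAP, HWCAP_NEON, AT_NULL, 0] {
        buf.extend_from_slice(&word.to_ne_bytes());
    }
    let hwcap = auxv_lookup(&buf, AT_HWCAP).unwrap_or(0);
    println!("neon bit set: {}", hwcap & HWCAP_NEON != 0);

    // On Linux the real bytes would come from std::fs::read("/proc/self/auxv").
    if let Ok(real) = std::fs::read("/proc/self/auxv") {
        println!("host AT_HWCAP: {:?}", auxv_lookup(&real, AT_HWCAP));
    }
}
```

The bit positions are architecture specific, so the `HWCAP_NEON` check is only meaningful when the bytes come from a 32-bit ARM process.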

Firefox uses /proc/cpuinfo on Linux and Android http://searchfox.org/mozilla-central/rev/40e8eb46609dcb8780764774ec550afff1eed3a5/mozglue/build/arm.cpp#36

@gnzlbg
Contributor Author

gnzlbg commented Oct 26, 2017

@jrmuizel on Linux-like systems the plan is to use getauxval, falling back first to /proc/self/auxv and, if that also fails, to /proc/cpuinfo.
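A minimal sketch of that fallback order in Rust, assuming each source can be modeled as a probe that returns `None` when it is unavailable. The `Source` type, `first_available`, and the three stub probes are hypothetical stand-ins; real probes would call getauxval or read the two files.

```rust
/// One detection strategy: a label plus a probe that returns `None`
/// when the source is unavailable on this system. (Hypothetical type.)
struct Source {
    name: &'static str,
    probe: fn() -> Option<bool>,
}

/// Tries the sources in order and returns the first one that answers,
/// together with its answer.
fn first_available(sources: &[Source]) -> Option<(&'static str, bool)> {
    sources
        .iter()
        .find_map(|s| (s.probe)().map(|has| (s.name, has)))
}

// Stub probes: pretend getauxval is missing from libc, the auxv file
// reports the feature, and cpuinfo is never consulted.
fn getauxval_stub() -> Option<bool> {
    None
}
fn auxv_file_stub() -> Option<bool> {
    Some(true)
}
fn cpuinfo_stub() -> Option<bool> {
    Some(false)
}

fn main() {
    let sources = [
        Source { name: "getauxval", probe: getauxval_stub },
        Source { name: "/proc/self/auxv", probe: auxv_file_stub },
        Source { name: "/proc/cpuinfo", probe: cpuinfo_stub },
    ];
    match first_available(&sources) {
        Some((name, has)) => println!("answered by {}: feature = {}", name, has),
        None => println!("no source available"),
    }
}
```

Keeping "source unavailable" (`None`) distinct from "feature absent" (`Some(false)`) is what lets the chain stop at the first source that actually answers.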

Once that is done, we should think about ring-0 for those running on bare metal, iOS, and Windows, but we are going to need ways to test those.

Thanks for the Firefox links! I'll check them out!

@parched
Copy link

parched commented Oct 26, 2017

@gnzlbg when I was planning to do this here, I was going to use the auxv crate. Just using getauxval is indeed the nicest solution, but not every libc implements it.

@jrmuizel
Contributor

Pixman uses structured exceptions on Windows. Rust doesn't support them (rust-lang/rust#38963) so detection there would require a .c file or perhaps there's some other method that works.

    __try
    {
        pixman_msvc_try_arm_simd_op ();
        features |= ARM_V6;
    }
    __except (GetExceptionCode () == EXCEPTION_ILLEGAL_INSTRUCTION)
    {
    }

@gnzlbg
Contributor Author

gnzlbg commented Oct 26, 2017

@parched Chrome includes getauxval using weak linkage and tests whether the function pointer is NULL to detect it. I think we can do the same thing here, so that users linking against a libc that supports it benefit automatically.

@jrmuizel thanks again for that! In third-party crates, having a .c file is fine. This is the code pixman uses to trigger the exceptions:

    area pixman_msvc, code, readonly

    export  pixman_msvc_try_arm_simd_op

pixman_msvc_try_arm_simd_op
    ;; I don't think the msvc arm asm knows how to do SIMD insns
    ;; uqadd8 r3,r3,r3
    dcd 0xe6633f93
    mov pc,lr
    endp

    export  pixman_msvc_try_arm_neon_op

pixman_msvc_try_arm_neon_op
    ;; I don't think the msvc arm asm knows how to do NEON insns
    ;; veor d0,d0,d0
    dcd 0xf3000110
    mov pc,lr
    endp

    end

I think we can do something like that.

FWIW this is the current implementation: https://github.com/gnzlbg/stdsimd/tree/arm_rt

It is very rough and incomplete, but the cpuinfo parts do work on x86 at least. Any help is welcome :)

@alexcrichton
Member

@gnzlbg I think the only "answer" here would be to run full emulation in QEMU instead of just user-mode emulation. The libc repo has a few examples of this, but unfortunately it's not an easy thing to do. Something like rust-lang/libc#820 may help to have a script for full emulation?

@gnzlbg
Contributor Author

gnzlbg commented Oct 27, 2017

@alexcrichton thanks for the link! that really helps! I'll see if I can get it done tomorrow.

@gnzlbg
Contributor Author

gnzlbg commented Nov 16, 2017

Status update: the first round of ARM support is in #175.

The next steps after that are:

Once all of that is done we will reach the first milestone: minimal, sub-optimal, but tested, run-time feature detection on all ARM targets that are currently in ci.

At that point I'd like to split run-time feature detection into two components, a core crate and a std crate, as explained in #188. The current x86 code will remain in the core crate, but the current ARM code will be moved to the std crate because it relies on libc, being able to read files, etc.

The rest of the work can more or less happen concurrently (and some of it can already happen, so if you are interested in helping with it, just give me a shout).


To bring ARM run-time detection to a good state on non-Windows platforms that use std we will need to:

  • add ci build-bots for android
  • add ci build-bots for ios
  • prefer /proc/self/auxv to /proc/cpuinfo when available (thanks @lu-zero !)
  • prefer libc's getauxval to /proc/self/auxv and /proc/cpuinfo when available. This is a bit tricky because getauxval is not exposed by the libc crate, so we somehow need to detect whether the current binary has it linked, and if so, use it. This is the best way to do run-time feature detection on these ARM platforms.

To bring ARM run-time detection to a good state on core we will need to:

  • add ci build-bots for this (cross has some)
  • implement reading features from the cpuid registers (e.g. see here).

To bring ARM run-time detection to a good state on Windows I am not sure yet.

@jrmuizel suggested that we use structured exceptions to "poke" the CPU, that is, we try to execute an instruction and if that does not raise an illegal-instruction exception, then the CPU supports it.

While that might work in practice, it makes me a bit uncomfortable. First, it is undefined behavior. Second, there are some ARM CPUs out there with broken support for NEON that we won't be able to easily detect with this approach.

So we can start by doing it like that, but ideally we would have at least a better solution in sight.


All of this is a lot of work, but I think it is important to get the skeleton right. As we have seen with x86, once everything is properly set up, adding new intrinsics is more or less painless.

But maybe we have set the bar too high? I don't know of a single C or C++ compiler that:

  • tests the assembly generated by all of their intrinsics
  • tests that the intrinsics produce the correct results at run-time
  • tests run-time feature detection for all their intrinsics, that is, if an intrinsic is detected as available, tests that it can in fact be used and produces correct results

For example, clang just calls the LLVM builtins and at most checks the LLVM IR. LLVM checks the assembly of these builtins in some cases, but AFAIK LLVM doesn't execute this assembly at run-time. Also, none of this is integrated with run-time feature detection.

@alexcrichton
Member

This all sounds great to me, thanks for writing all of it down @gnzlbg!

I think it's OK to relax some of the requirements here as you mentioned; if we don't test literally everything, we're probably OK for now. That being said, I'm always a fan of getting this all hooked up!

@alexcrichton
Member

I believe this is done now with cpuinfo and auxval parsing, so closing
