Skip to content
This repository has been archived by the owner on Jan 23, 2023. It is now read-only.

Detect VFP Size at compile time #27890

Closed

Conversation

franksinankaya
Copy link

@franksinankaya franksinankaya commented Nov 18, 2019

Based on the table here: http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0774g/chr1383660321827.html

This change enables armv7 targets with d16 VFP support.

@franksinankaya
Copy link
Author

@BruceForstall
Copy link
Member

Seems like a small change, but I presume you haven't checked that this is all that is necessary to enable vfp3-d16.

Is __ARM_FP also set equivalently for clang?

I wonder if we shouldn't depend on these defines in the source code, and should define our own in the configurecompiler.cmake file, for instance (maybe isolating a check for __ARM_FP in one place).

@franksinankaya
Copy link
Author

No, I have not tested the build or clang.
I just did a compile test with GCC only.
The preprocessor definition is coming from ARM. I'm hoping that clang honors that too.
Looking forward to test results from the issue from people with real-hardware in hand.

We can certainly make it better if it produces successful test result.
From what I read in your email, having VFP3-D16 is enough for JIT.
Hopefully, there is no other reason why we should be saving and restoring registers between D16-D32.

Fix preprocessor

d16

Apply increment effect

asm
@franksinankaya
Copy link
Author

Let's see if we can start the conversation here. Got some hello world success in the attached issue. Waiting for more results.

What is the best way to integrate this change?
I know I'll move this to runtime repository but looking for opinions.

@jkotas @janvorli @BruceForstall

@jkotas
Copy link
Member

jkotas commented Nov 26, 2019

How common are the armv7d16 devices? What are they usually used for?

@franksinankaya
Copy link
Author

#17043 has some device mentions.

@franksinankaya
Copy link
Author

According to what I read, VFP style is optional in ARMv7. You might have a very powerful CPU but only 16 VFP registers for instance.

You might even have neon with 16 VFP registers.

There are some classes of devices that are being left out because of the compiler choices made.
Even JIT doesn't require 32 VFP registers.

@franksinankaya
Copy link
Author

Maybe going to vfp-d16 by default is the answer.

@franksinankaya
Copy link
Author

It has been confirmed that vfp3-d16 build is alive.

I think we should do one of these:

  1. Allow people to do a one-off builld and let them choose whether it is good enough or not. We can’t predict goodness criteria, let alone the performance of an application based on vfp type
  2. Set the default to vfp16. This might become a good performance bump since we save 16 less registers on context switch. Need more data here.

We can do # 1 short term and put # 2 under investigation by a performance team for feasibility analysis.

Any other opinions?

@jkotas @janvorli @BruceForstall

@jkotas
Copy link
Member

jkotas commented Dec 2, 2019

I would be fine with having support for one-off vfp16 build, but I do not think it is worth it to promote it to a complete new BuildArch.

@franksinankaya
Copy link
Author

Okay. What’s a good balance?

Rely on some environment variable to determine vfp type?

@maryamariyan
Copy link
Member

Thank you for your contribution. As announced in #27549 the dotnet/runtime repository will be used going forward for changes to this code base. Closing this PR as no more changes will be accepted into master for this repository. If you’d like to continue working on this change please move it to dotnet/runtime.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants