
The test for checking FMA_NATIVE is faulty. #33011

Closed
KristofferC opened this issue Aug 21, 2019 · 10 comments
Labels
compiler:codegen Generation of LLVM IR and native code maths Mathematical functions performance Must go faster

Comments

@KristofferC
Member

We currently check for support for FMA (fused multiply add) by doing

const FMA_NATIVE = muladd(nextfloat(1.0),nextfloat(1.0),-nextfloat(1.0,2)) != 0
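
For context, the arithmetic this check relies on: nextfloat(1.0) is 1 + eps(1.0), so its exact square is 1 + 2eps + eps^2. A fused multiply-add preserves the eps^2 term, so subtracting nextfloat(1.0, 2) = 1 + 2eps leaves a nonzero residual, while an unfused multiply rounds the product to 1 + 2eps first and the difference is exactly zero. A minimal REPL sketch of the two outcomes (illustration only, not part of the original report):

    fma(nextfloat(1.0), nextfloat(1.0), -nextfloat(1.0, 2)) == eps(1.0)^2   # true: the fused result keeps the eps^2 = 2^-104 term
    nextfloat(1.0) * nextfloat(1.0) - nextfloat(1.0, 2) == 0.0              # true: the unfused product rounds to 1 + 2eps first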

From #33010 (comment) @yuyichao said:

This test is wrong. LLVM has all the freedom it wants to return either value for muladd, regardless of whether FMA is available. FWIW, ARM / AArch64 even have a fast muladd instruction with intermediate rounding.

@KristofferC KristofferC added the performance Must go faster label Aug 21, 2019
@mbauman
Member

mbauman commented Aug 21, 2019

More details and a breadcrumb for exposing LLVM support here: #9855.

@yuyichao
Contributor

yuyichao commented Aug 21, 2019

We don't really need LLVM support since we have all the information in src/processor* now.

It's hard to expose a user-facing, generic, flexible, and stable API, but for now exposing a temporary internal API should do no worse than what we have. (Also note that the check cannot be fixed as is, since at most it reflects the status of the machine Julia was compiled on.)

We could simply add a function jl_is_fma_native to processor_*.cpp, implemented as jl_test_cpu_feature(JL_X86_fma) || jl_test_cpu_feature(JL_X86_fma4) for x86, jl_test_cpu_feature(JL_AArch32_vfp4) for ARM, 1 for AArch64, and 0 as the fallback. Note that this reflects the native features of the running machine and does not take any target specification into account.

@yuyichao
Contributor

FWIW, given the target machines people are likely to run Julia on, I feel like setting it to true for x64 should be fine... assuming the user of it doesn't directly emit an fma instruction, of course.

@JeffBezanson JeffBezanson added compiler:codegen Generation of LLVM IR and native code maths Mathematical functions labels Nov 5, 2019
@chriselrod
Contributor

The Travis Mac doesn't have fma.

Despite compiling locally on a machine where it should be true, it's somehow set to false (although running the definition in the REPL returns true). I guess that's because

LLVM has all the freedom it wants to return either value for muladd, regardless of whether FMA is available.

@JeffreySarnoff
Contributor

JeffreySarnoff commented Oct 5, 2020

What if we changed
const FMA_NATIVE = muladd(nextfloat(1.0),nextfloat(1.0),-nextfloat(1.0,2)) != 0
to
const FMA_NATIVE = fma(nextfloat(1.0),nextfloat(1.0),-nextfloat(1.0,2)) != 0
At worst it would be correct much more often. [Currently, AFAIK, everyone gets the slower version of log(::Float64).]
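
For reference, the semantic difference between the two calls: muladd is allowed to return either the fused or the unfused result, whichever is faster on the target, whereas fma always returns the correctly rounded fused result, falling back to a slow software implementation when the hardware lacks FMA. A small sketch, assuming a machine without hardware FMA:

    x = nextfloat(1.0); y = nextfloat(1.0, 2)
    muladd(x, x, -y)   # unspecified: may be 0.0 or 2^-104, depending on what LLVM emits
    fma(x, x, -y)      # always 2^-104 (~4.93e-32), via the software fallback if necessary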

@oscardssmith
Member

Wait, that's our check? That's straight up broken. We absolutely should fix that.

@JeffreySarnoff
Contributor

JeffreySarnoff commented Oct 5, 2020

Now with PR #37886.

@simonbyrne
Contributor

simonbyrne commented Nov 18, 2021

To follow up on @yuyichao's comment, we can now do:

import Base.BinaryPlatforms.CPUID

function has_fma()
    CPUID.test_cpu_feature(CPUID.JL_X86_fma) ||
    CPUID.test_cpu_feature(CPUID.JL_X86_fma4) ||
    CPUID.test_cpu_feature(CPUID.JL_AArch32_vfp4) ||
    CPUID.normalize_arch(String(Sys.ARCH)) == "aarch64"
end

This has two problems:

  1. It doesn't match the CPU target used by Julia, i.e. starting julia -C "nehalem" still gives has_fma() == true on my (Skylake) machine.
  2. It won't be constant-propagated, so it would need to be used with @static, i.e.:

     @static if has_fma()
        ...
     end

In other words, we wouldn't be able to use this for the standard library, as it would simply be reflecting the architecture of the buildbot.

@simonbyrne
Contributor

Is there a way we can query the cpu_target from within Julia?

@oscardssmith
Member

Theoretically #43085 should take care of this for us.
