math: FMA is slower than non-FMA calculation #36196
Which CPU is used for the benchmark? Please benchmark with GODEBUG=cpu.fma=off and see if this changes anything. If FMA instructions are used, I would expect a change in the FMA benchmark numbers. Please also write the benchmark as a Go benchmark, then run the test with -count=20 and store the results in a file. Use a quiet machine, e.g. with no browser or videos running. Afterwards use https://godoc.org/golang.org/x/tools/cmd/benchcmp to compare; this gives better information about how consistent the results are between runs.
Intel Core i5 4460

package fma_test

import (
	"math"
	"math/rand"
	"testing"
)

// Package-level variables keep the results live so the compiler cannot
// optimize the benchmarked work away.
var va, vb, vc, vd float64

func pure_fma_func() {
	vd = math.FMA(va, vb, vc) // fused multiply-add: no rounding of the intermediate product
}

func non_fma_func() {
	vd = va*vb + vc // separate multiply and add: the product is rounded to float64 first
}

func BenchmarkFMA(b *testing.B) {
	va = rand.Float64()
	vb = rand.Float64()
	vc = rand.Float64()
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		pure_fma_func()
	}
}

func BenchmarkNonFMA(b *testing.B) {
	va = rand.Float64()
	vb = rand.Float64()
	vc = rand.Float64()
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		non_fma_func()
	}
}
(old, new, and benchstat output attached)

Added (old, new, and benchstat output attached)
I missed that you are using Windows, on which Go does not support using GODEBUG to disable CPU features used by Go. I should consider printing a warning. Disassembly for me shows:
(listing omitted)
which looks wrong, since it is missing the VFMADD231SD instruction. Using
(details omitted)
Running with
(details omitted)
but both are slower than without FMA. Note that the FMA version has more precision, as it does not do the 64-bit rounding between the steps.
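To make the precision point concrete, here is a small standalone example (not from the thread; the constants are chosen purely for illustration) where the fused and unfused results differ:

package main

import (
	"fmt"
	"math"
)

func main() {
	// a*a is exactly 1 + 2^-26 + 2^-54, which needs more than 53 significand
	// bits. The plain multiply rounds the 2^-54 term away before the add,
	// so a*a + c collapses to 0, while math.FMA keeps it.
	a := 1.0 + 0x1p-27
	c := -(1.0 + 0x1p-26)

	fmt.Println(a*a + c)           // 0
	fmt.Println(math.FMA(a, a, c)) // 5.551115123125783e-17 (i.e. 2^-54)
}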
VFMADD231SD has a 5-cycle latency on Haswell and two can be executed in parallel; the same holds for MULSD. The added check and jump, as well as other factors, can indeed make the FMA slower. This seems to be 1 or 2 cycles here. This might be working as intended (WAI) due to a slight overhead for runtime dispatch within the loop.
To verify that this is the runtime dispatch overhead of determining whether the CPU supports FMA, I changed the compiler in
(details omitted)
As noted, even if equally fast, FMA has the advantage of not rounding the intermediate step. As long as the built Go binary needs to support both FMA-capable and non-FMA-capable CPUs, there will be some overhead. Ideally that check could be moved outside the loop, but we do not have that currently. For the latter I thought we already had a general bug to move the checks.
The closest bug I could find related to this is #34950, which is intended to mark ops requiring feature detection so that the check isn't optimized away. However, there likely needs to be a separate issue to track the hoisting optimization for loop invariants. This will likely become even more important if vector intrinsics land in the stdlib someday.
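To illustrate what hoisting the feature check would mean, here is a rough user-level sketch (an analogy only, with hypothetical names like dotPerCall and dotHoisted: in real code the check is emitted by the compiler inside every math.FMA call and is invisible in Go source, and the runtime's actual non-FMA fallback is a software FMA that keeps full precision, not the plain multiply-add shown in the second loop):

package main

import (
	"fmt"
	"math"

	"golang.org/x/sys/cpu" // used here only to read a feature bit once
)

// hasFMA is read once; the point of the sketch is that the branch happens
// outside the loop instead of on every iteration.
var hasFMA = cpu.X86.HasFMA

// Today (conceptually): each math.FMA call re-checks the feature bit inside the loop.
func dotPerCall(a, b []float64) float64 {
	s := 0.0
	for i := range a {
		s = math.FMA(a[i], b[i], s)
	}
	return s
}

// Hoisted (conceptually): check once, then run one of two specialized loops.
func dotHoisted(a, b []float64) float64 {
	if hasFMA {
		s := 0.0
		for i := range a {
			s = math.FMA(a[i], b[i], s)
		}
		return s
	}
	s := 0.0
	for i := range a {
		s += a[i] * b[i] // illustration only; differs in rounding from a true FMA
	}
	return s
}

func main() {
	a := []float64{1, 2, 3}
	b := []float64{4, 5, 6}
	fmt.Println(dotPerCall(a, b), dotHoisted(a, b))
}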
Change https://golang.org/cl/212360 mentions this issue.
Running @josharian's CL (212360) gives a ~16% improvement on my Windows laptop, which had the same overhead mentioned above.
(old.txt, new.txt, and benchstat old.txt new.txt output attached)
Are you sure the compiler isn't optimising your code away?
@davecheney Just checked, and it is not. Disassembling the binary with
(listing omitted)
although objdump doesn't know about the VFMA* instructions yet, so that roughly translates to (in Intel syntax):
(listing omitted)
Have a look at what the benchmark loop is compiling to.
Ahh, ignore me, I missed that va, vb, etc. were package-level declarations.
/cc @griesemer @rsc
As investigated above, the slowdown is caused by dynamically checking on every iteration that the FMA CPU capability is present. math.FMA is still useful even if slower, as it has more precision than doing the computation with explicit intermediate results. What could be improved when FMA operations are executed in a loop is hoisting the CPU feature check and/or load out of the loop (up to even creating two loops) if the loop body is small. I would suggest we create a new generic CPU feature detection issue for that and close this issue.
Created the general CPU feature detection issue. Closing this one.
Before using some CPU instructions, we must check for their presence. We use global variables in the runtime package to record features. Prior to this CL, we issued a regular memory load for these features. The downside to this is that, because it is a regular memory load, it cannot be hoisted out of loops or otherwise reordered with other loads.

This CL introduces a new intrinsic just for checking cpu features. It still ends up resulting in a memory load, but that memory load can now be floated to the entry block and rematerialized as needed.

One downside is that the regular load could be combined with the comparison into a CMPBconstload+NE. This new intrinsic cannot; it generates MOVB+TESTB+NE. (It is possible that MOVBQZX+TESTQ+NE would be better.)

This CL does only amd64. It is easy to extend to other architectures.

For the benchmark in #36196, on my machine, this offers a mild speedup.

name      old time/op  new time/op  delta
FMA-8     1.39ns ± 6%  1.29ns ± 9%  -7.19%  (p=0.000 n=97+96)
NonFMA-8  2.03ns ±11%  2.04ns ±12%    ~     (p=0.618 n=99+98)

Updates #15808
Updates #36196

Change-Id: I75e2fcfcf5a6df1bdb80657a7143bed69fca6deb
Reviewed-on: https://go-review.googlesource.com/c/go/+/212360
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Giovanni Bajo <rasky@develer.com>
Change https://golang.org/cl/227238 mentions this issue.
In the commit message of CL 212360, I wrote:

> This new intrinsic ... generates MOVB+TESTB+NE.
> (It is possible that MOVBQZX+TESTQ+NE would be better.)

I should have tested. MOVBQZX+TESTQ+NE does in fact appear to be better.

For the benchmark in #36196, on my machine:

name      old time/op  new time/op  delta
FMA-8     0.86ns ± 6%  0.70ns ± 5%  -18.79%  (p=0.000 n=98+97)
NonFMA-8  0.61ns ± 5%  0.60ns ± 4%   -0.74%  (p=0.001 n=100+97)

Interestingly, these are both considerably faster than the measurements I took a couple of months ago (1.4ns/2ns). It appears that CL 219131 (clearing VZEROUPPER in asyncPreempt) helped a lot. And FMA is now once again slower than NonFMA, although this change helps it regain some ground.

Updates #15808
Updates #36351
Updates #36196

Change-Id: I8a326289a963b1939aaa7eaa2fab2ec536467c7d
Reviewed-on: https://go-review.googlesource.com/c/go/+/227238
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
What version of Go are you using (go version)?
(output omitted)

Does this issue reproduce with the latest release?
Yes

What operating system and processor architecture are you using (go env)?
(go env output omitted)

What did you do?
(code omitted) And go run.

What did you expect to see?
math.FMA is faster than non-FMA code.

What did you see instead?
I confirmed my CPU has SIMD FMA. Is this the overhead of a function call?
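To check the function-call question empirically, one could add inline variants to the benchmark file above (a sketch, not part of the original report; since the wrappers are tiny, the compiler will most likely inline them anyway, so the numbers should barely move):

func BenchmarkFMAInline(b *testing.B) {
	va, vb, vc = rand.Float64(), rand.Float64(), rand.Float64()
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		vd = math.FMA(va, vb, vc) // same work as pure_fma_func, without the wrapper call
	}
}

func BenchmarkNonFMAInline(b *testing.B) {
	va, vb, vc = rand.Float64(), rand.Float64(), rand.Float64()
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		vd = va*vb + vc // same work as non_fma_func, without the wrapper call
	}
}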