cmd/compile: OR into memory is cheaper than MOV/BTSL/MOV on x86 #61694
We use
We could convert back to
I believe saving 2 bytes is not worth the reduced throughput.
Could you post some full benchmark code? My simple benchmark isn't showing any difference between the two.
Enabling or disabling those rules I quoted (which have the effect of switching from
@randall77 Your benchmark does not work because it has a carried dependency, i.e. the operation in each iteration depends on the result of the previous one, so the out-of-order engine cannot take effect: all operations are sequential. As a result, your benchmark only measures the latency of both instructions, which is 1 cycle for each, hence the similar result. To measure the throughput of the instructions, you need to interleave multiple independent operations. I made a quickbench and the results show that the
Thanks.
Here's an improved benchmark:
Run with
I'm still not seeing much difference. A bit better with the all-in-register version, but load/BTS/store is just as fast as OR to memory.
For memory operands, the bottleneck is the store-forwarding latency, which has a whopping value of around 5 cycles; this overwhelms any other computation, since 4 My point is that transforming an
Also, the OP executed their benchmark on a Zen 4 CPU, which has exceptionally bad
But
I was wrong, my machine is actually Zen 2. Here are my results:
100% agree. I think the question here boils down to: should we fix this by adding BTSconst-to-memory instructions, or should we fix this by getting rid of BTSconst altogether?
UPDATE: The size comparison below is not accurate, as BTS takes
New (correct) size note:
From a size point of view, I played around with https://defuse.ca/online-x86-assembler.htm#disassembly a little bit:
Indeed OR is bigger by one byte for bits 0..7, and bigger by two bytes for bits 8..31 (I assume bigger by four bytes for bits 32..61, but my Lua REPL didn't make this easy).
go/src/cmd/compile/internal/ssa/_gen/AMD64.rules Lines 649 to 658 in 68a32ce
uint64(c) >= 128.
Use a larger constant. 0x20 fits in a single byte, whereas 0x80 (which is 1<<7) does not. The larger OR is needed for constants of 128 and larger, which is bit index 7 and larger.
Sorry about that, I corrected it above. Indeed it's a 1-byte disadvantage for OR for bits 0..7, and a 2-byte disadvantage for bits 8..31. It's hard for me to tell how significant that could be. In a binary I have lying around here:
So we'd add (61242*2)/1954399656 = 0.0062% to the binary. That seems more or less alright, but it's also a sample size of one. Also I'd note that perhaps removing the rewrite rules would eliminate the MOV/BTS/MOV transformation, thus making this a net gain. I've also observed some C++-generated instructions while scanning this binary (it's a mixed binary). LLVM does generate larger ORs.
For memory operations, GCC wisely always uses
@randall77 I believe you can't do
Now there are 4 cases:
As a result, the final value of x can only be one of 1, 0x102, or 0x103. However, if you implement
@merykitty I don't think the issue you raised is a showstopper, at least not directly. That code has a race in it and we're allowed to do anything we want when there is a race.
@randall77 The Go memory model mandates that:
So it is not right that we can do anything we want in the presence of a data race.
I see, so you are claiming that we get a word-tearing violation if we use sub-word OR-with-memory instructions.
Ok, I think I agree with @merykitty that we can't use thinner ORs, it leads to word tearing which we promise not to do. Proposal: use ORconst for anything which can be done with a single instruction. So with both registers and memory targets. That should tend to choose performance over code size. Although I think the performance gains are not terribly large, neither is the code size penalty.
Separately, we should then allow BTSQ to operate directly on memory. |
Change https://go.dev/cl/515755 mentions this issue: |
GCC does use

```c
void f(unsigned long long *p) {
    *p |= 0x800000000ULL;
}
```
Ah yes, when I said |
What version of Go are you using (`go version`)?

Does this issue reproduce with the latest release?

Yes.

What operating system and processor architecture are you using (`go env`)?

What did you do?
We have a code generator that generates a struct with setters. To track whether set has been called for a given field, we flip the bit in a bitmap. The code looks like this:
What did you expect to see?

I expected similar instructions (with different operands) being generated for both setters.

What did you see instead?

SetU is ~30% slower than SetV, as measured in local benchmarks (on a Zen 4 machine). The relevant difference is (godbolt):

It seems like OR into memory does better than MOV/BTS/MOV.
.According to https://www.uops.info/table.html, for skylake-x and zen4, it seems the OR family is pound-for-pound (slightly) better than the BTS family:
I didn't look up what those MOV instructions cost, but it's difficult to predict costs from individual operations in the complex processors of today.

Things I didn't test (because the Go compiler doesn't generate/inline them):

Some of the speedup may be due to the shorter instruction sequence, too.