strange performance regression: lookup vs rsh #48278
Comments
And actually, there's one more thing we've been scratching our heads over. If we make the switch a bit more complex, adding a method which won't be used in the benchmark:

```go
const (
	set2BitsMask = uint16(0b1100_0000_0000_0000)
	set3BitsMask = uint16(0b1110_0000_0000_0000)
	set4BitsMask = uint16(0b1111_0000_0000_0000)
	set5BitsMask = uint16(0b1111_1000_0000_0000)
	set6BitsMask = uint16(0b1111_1100_0000_0000)
	set7BitsMask = uint16(0b1111_1110_0000_0000)
)

// setN writes the run of high bits encoded in flag into the bitvec,
// starting at bit position pos.
func (bits bitvec) setN(flag uint16, pos uint64) {
	a := flag >> (pos % 8)
	bits[pos/8] |= byte(a >> 8)
	if b := byte(a); b != 0 {
		// If the bit-setting affects the neighbouring byte, we can assign - no need to OR it,
		// since it's the first write to that byte
		bits[pos/8+1] = b
	}
}
```

but which adds more cases to the switch:

```go
func codeBitmapInternal(code []byte, bits bitvec) bitvec {
	for pc := uint64(0); pc < uint64(len(code)); {
		op := code[pc]
		pc++
		if op < 0x60 || op > 0x7f {
			continue
		}
		numbits := op - 0x60 + 1
		switch numbits {
		case 1:
			bits.set1(pc)
			pc += 1
		case 2:
			bits.setN(set2BitsMask, pc)
			pc += 2
		case 3:
			bits.setN(set3BitsMask, pc)
			pc += 3
		case 4:
			bits.setN(set4BitsMask, pc)
			pc += 4
		case 5:
			bits.setN(set5BitsMask, pc)
			pc += 5
		case 6:
			bits.setN(set6BitsMask, pc)
			pc += 6
		case 7:
			bits.setN(set7BitsMask, pc)
			pc += 7
		}
	}
	return bits
}
```

Then, suddenly, the usage of lookup versus right-shift starts to affect the performance.
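For intuition, here is a small self-contained trace of the bit arithmetic in setN; the example values are ours, not from the original issue:

```go
package main

import "fmt"

type bitvec []byte

const set3BitsMask = uint16(0b1110_0000_0000_0000)

func (bits bitvec) setN(flag uint16, pos uint64) {
	a := flag >> (pos % 8)      // shift the mask down to the bit offset within the byte
	bits[pos/8] |= byte(a >> 8) // the high byte ORs into the current byte
	if b := byte(a); b != 0 {
		bits[pos/8+1] = b // any spill-over bits land in the next byte
	}
}

func main() {
	bits := make(bitvec, 2)
	// Set 3 bits starting at bit position 6: two bits land in bits[0],
	// the third spills into bits[1].
	bits.setN(set3BitsMask, 6)
	fmt.Printf("%08b %08b\n", bits[0], bits[1]) // prints: 00000011 10000000
}
```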
I don't see anything obvious from the generated assembly.
Possibly that latter
Ok, I made the change by just replacing the types of the lookup and the bitvec but otherwise not changing anything about the code.
I'm out of ideas.
Gotcha, thanks for looking into it!
What version of Go are you using (go version)?

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

go env Output

What did you do?
In go-ethereum, we have an algorithm that iterates over a slice of bytes and fills a bitmap depending on the values of those bytes. Certain values set a bit to 1, others leave it at 0.

While optimizing it, we found a behaviour that was very unintuitive and that we could not explain: by using a small lookup table instead of a right-shift, performance increased by ~20%.
This PR attempts to remove the lookup, and notes the performance regression: ethereum/go-ethereum#23472.
I've made a PoC repro.

analysis.go:

analysis_test.go:
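The file contents were not captured here; a benchmark of roughly this shape would exercise the function (the synthetic code pattern and sizes below are assumptions, not the original test file):

```go
package analysis

import "testing"

func BenchmarkCodeBitmap(b *testing.B) {
	// Synthetic input: PUSH1 (0x60) followed by one data byte, repeated.
	code := make([]byte, 1024*1024)
	for i := 0; i < len(code); i += 2 {
		code[i] = 0x60
	}
	// Allocate the bitmap once, with slack for setN's neighbour-byte write.
	bits := make(bitvec, len(code)/8+1+4)
	b.SetBytes(int64(len(code)))
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		codeBitmapInternal(code, bits)
	}
}
```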
And a script to perform the benchmark and then replace the shift with a lookup, test.sh:
Running this yields:
In other words, the code using the lookup variant is a lot faster than the code using the right-shift variant.
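The two snippets were stripped from this copy of the issue; based on the surrounding code, the two set1 variants are plausibly of the following shape (a reconstruction, not the verbatim original). Only one of the two methods exists at a time; test.sh swaps between them:

```go
// Lookup variant: index into a small table of single-bit masks.
var lookup = [8]byte{0x80, 0x40, 0x20, 0x10, 0x8, 0x4, 0x2, 0x1}

func (bits bitvec) set1(pos uint64) {
	bits[pos/8] |= lookup[pos%8]
}

// Right-shift variant: compute the same mask arithmetically.
func (bits bitvec) set1(pos uint64) {
	bits[pos/8] |= 0x80 >> (pos % 8)
}
```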
What did you expect to see?

I expected the right-shift operation to be at least as fast as the lookup-based variant.

What did you see instead?

The lookup being faster.
So, this is maybe not a bug, but it's behaviour whose underlying reason I would be grateful to figure out. I guess there is also a slight chance that the compiler somehow misses an opportunity to optimize something here.
I have studied the gcflags="-m -m" output to see if the inlining decisions showed anything, but afaict the inlining behaviour is identical across both variants.
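For reference, an inlining report like the one mentioned above can be produced with something like this (assuming the repro package sits in the current directory):

```sh
# The compiler prints optimization decisions to stderr; filter for set1.
go build -gcflags="-m -m" . 2>&1 | grep set1
```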