cmd/compile: optimise ranges over string(byteSlice) #27148
Another similar optimization: https://go-review.googlesource.com/c/go/+/108985

Perhaps there's a better way to iterate over the utf8 runes in a byte slice that I'm missing. Hopefully not, as that would be a bit embarrassing :)
One reason this has not yet been optimized to avoid the allocation, like range []byte(string) was, is that it is hard to prove in general that in "for i, c := range string(s)" the s is not changed within the for loop. A single byte pointer changed anywhere within computations inside the loop (even in other packages or through interfaces) can change s and thereby change the result from the case where s is copied on loop entry. However, there might be cases like this one where the optimization could trigger with some very conservative analysis. Maybe we could solve this problem by having a range loop variant that iterates over the runes of a byte slice (taking some of the changes to the byte slice into account) and does not need to allocate in general, similar to how map iteration does not iterate over a snapshot of the map taken on loop entry. The other question is why BenchmarkManual is so much slower even though it does not need to allocate.
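To make the aliasing concern concrete, here is a small illustrative example of my own (not from the thread); the variable names are made up:

```go
package main

import "fmt"

func main() {
	s := []byte("héllo")
	alias := s // alias shares the same backing array as s

	// Per the spec, string(s) is evaluated once, as a copy of s at loop
	// entry, so the write through alias below must not affect the runes
	// the loop observes. Eliding that copy is only safe if the compiler
	// can prove no such write can happen anywhere in the loop body.
	for i, r := range string(s) {
		if i == 0 {
			alias[1] = 'X' // clobbers the first byte of the 'é' sequence
		}
		fmt.Printf("%d: %q\n", i, r)
	}
}
```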
Note also there is a slight difference between the two versions of the code given:
Conservative analysis sounds fine to me. I imagine that most of the use cases that matter would not involve pointers, reflection, or any other magic. The range loop variant seems interesting. Are there any downsides to this method? I imagine it would essentially be as if the compiler "expanded" the idiomatic code into the manual code.
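As a rough sketch of that idea (my own illustration, not a proposed design), the non-allocating "expansion" could look roughly like a utf8.DecodeRune loop over the original bytes, which also matches range-over-string semantics for invalid UTF-8 (a rune error, advancing by one byte):

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// rangeRunes is a hypothetical stand-in for what `for i, r := range string(b)`
// might be expanded into once the compiler has proven that b cannot be
// modified inside the loop body: no conversion, no copy, no allocation.
func rangeRunes(b []byte, yield func(i int, r rune)) {
	for i := 0; i < len(b); {
		r, size := utf8.DecodeRune(b[i:]) // RuneError with size 1 on bad input
		yield(i, r)
		i += size
	}
}

func main() {
	rangeRunes([]byte("héllo"), func(i int, r rune) {
		fmt.Printf("%d: %q\n", i, r)
	})
}
```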
Yes - but note that in
That came to mind, but I wasn't sure. How could I have written the idiomatic version to behave equally? At least
Note that it's not only "magic" that can't be used: no function (even ones inserted by the compiler) can be used that is not known at compile time, and no function that changes a byte pointer or byte slice, as we need to prove those cannot alias the slice we are iterating over. This will need per-function information (#25999).
Where exactly is the allocation? pprof should tell you.
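One quick way to confirm where the allocation happens, short of a full pprof session (just a suggestion, not what was done in the thread), is testing.AllocsPerRun, or rerunning the benchmarks with -benchmem:

```go
package main

import (
	"fmt"
	"testing"
)

var sink int // keeps the loop result live so it is not optimized away

func main() {
	input := []byte("an input comfortably longer than thirty-two bytes, to be safe")

	allocs := testing.AllocsPerRun(100, func() {
		total := 0
		for _, r := range string(input) { // the conversion is the suspect
			total += int(r)
		}
		sink = total
	})
	// Likely prints 1: the temporary string is heap-allocated for inputs
	// that do not fit the small stack buffer discussed below.
	fmt.Println("allocs per run:", allocs)
}
```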
And now a little experiment:

    - var Input = bytes.Repeat([]byte("abcde"), 500)
    + var Input = bytes.Repeat([]byte("abcde"), 5)
The problem is that the non-escaping tmp buffer is allocated on the stack. So if the string being converted does not fit that buffer, it will be heap-allocated:

    // The constant is known to the compiler.
    // There is no fundamental theory behind this number.
    const tmpStringBufSize = 32

    type tmpBuf [tmpStringBufSize]byte

Somewhere inside the conversion:

    // buf is tmpBuf
    if buf != nil && len(b) <= len(buf) {
        p = unsafe.Pointer(buf)
    } else {
        p = mallocgc(uintptr(len(b)), nil, false)
    }

I think there was an issue somewhere to create readonly
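If that 32-byte buffer is indeed the deciding factor, a pair of benchmarks along these lines (my own sketch, not code from the issue) should report 0 allocs/op for the short input and 1 alloc/op for the long one when run with go test -bench=. -benchmem:

```go
package conv

import (
	"bytes"
	"testing"
)

var (
	short = bytes.Repeat([]byte("abcde"), 5)   // 25 bytes: fits the stack tmpBuf
	long  = bytes.Repeat([]byte("abcde"), 500) // 2500 bytes: spills to the heap
	sink  int
)

func benchRange(b *testing.B, in []byte) {
	for i := 0; i < b.N; i++ {
		n := 0
		for _, r := range string(in) { // the conversion under test
			n += int(r)
		}
		sink = n // keep the loop from being optimized away
	}
}

func BenchmarkShort(b *testing.B) { benchRange(b, short) }
func BenchmarkLong(b *testing.B)  { benchRange(b, long) }
```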
Doesn't look like anyone plans on working on this, so I'm removing it from 1.12 for now. I think we should keep the issue open for a while, to see if there's any interest.
Poking around, I can't imagine a way to remove that bounds check, other than having the compiler treat this case specially. I still think there should be an easy and performant way to iterate over the runes in a []byte.
Take these two benchmarks: https://play.golang.org/p/QI4BxUq8MGp
The first code is cleaner, more idiomatic, and easier to write/maintain. The second is much trickier, and I'm not even sure I wrote it correctly.
Lucky for us, the first tends to perform about the same or slightly better in terms of time. However, as one can see, it still incurs an extra allocation. I'm not sure why that is - via go test -gcflags=-m, I can see:

    ./f_test.go:16:27: BenchmarkIdiomatic string(Input) does not escape
We have optimized other common string conversion patterns, such as switch string(byteSlice) in 566e3e0, and I believe someMap[string(byteSlice)] was optimized too. Would it be possible to get rid of this allocation somehow? My knowledge of the runtime and compiler is a bit limited, but I'd think that it is possible.
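For reference, the already-handled patterns look like this (a small illustration of my own, assuming the optimizations described above); neither conversion should allocate, because the compiler recognizes these forms:

```go
package main

import "fmt"

var seen = map[string]int{"hello": 2}

func classify(b []byte) string {
	// switch string(b) is recognized by the compiler (566e3e0), so the
	// comparison is done against b's bytes without a heap-allocated copy.
	switch string(b) {
	case "hello":
		return "greeting"
	case "bye":
		return "farewell"
	}

	// Map lookups keyed by string(b) are optimized similarly: the index
	// expression uses b directly instead of materializing a new string.
	if n, ok := seen[string(b)]; ok {
		return fmt.Sprintf("seen %d times", n)
	}
	return "unknown"
}

func main() {
	fmt.Println(classify([]byte("hello")))
}
```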
As for a real use case - there are a few occurrences in the standard library that could be greatly simplified. For example, in encoding/json we'd remove ten lines of tricky code. However, that currently means a bad regression in speed and allocations.
I haven't investigated why my microbenchmark is faster when simpler, while json gets so much slower when made simpler.
Any input appreciated. cc @josharian @martisch @randall77 @TocarIP