Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/compile: []byte conversion to string causes unnecessary alloc #31506

Closed
cybrcodr opened this issue Apr 16, 2019 · 5 comments
Closed

cmd/compile: []byte conversion to string causes unnecessary alloc #31506

cybrcodr opened this issue Apr 16, 2019 · 5 comments

Comments

@cybrcodr
Copy link
Contributor

What version of Go are you using (go version)?

$ go version
go version go1.12 darwin/amd64

What did you do?

The following code in our repo incurs an additional alloc when type converting []byte to string --
https://go.googlesource.com/protobuf/+/d3f8f2d4122f4b739c70822f3aae4f82bf28cac9/internal/encoding/json/string.go#68

What did you expect to see?

I was hoping that our case would fall under the improvements mentioned in https://twitter.com/davidcrawshaw/status/1069634054448996354

What did you see instead?

pprof view showing allocation --

github.com/golang/protobuf/v2/internal/encoding/json.(*Decoder).parseString
internal/encoding/json/string.go

 Total:   1011.06MB  1011.06MB (flat, cum) 99.68%
     63            .          .           	} 
     64            .          .           	if in[0] != '"' { 
     65            .          .           		return "", 0, d.newSyntaxError("invalid character %q at start of string", in[0]) 
     66            .          .           	} 
     67            .          .           	in = in[1:] 
     68     275.51MB   275.51MB           	i := indexNeedEscape(string(in)) 
     69            .          .           	in, out := in[i:], in[:i:i] // set cap to prevent mutations 
     70            .          .           	for len(in) > 0 { 
     71            .          .           		switch r, n := utf8.DecodeRune(in); { 
     72            .          .           		case r == utf8.RuneError && n == 1: 
     73            .          .           			nerr.AppendInvalidUTF8("") 
     74            .          .           			in, out = in[1:], append(out, in[0]) // preserve invalid byte 
     75            .          .           		case r < ' ': 
     76            .          .           			return "", 0, d.newSyntaxError("invalid character %q in string", r) 
     77            .          .           		case r == '"': 
     78            .          .           			in = in[1:] 
     79            .          .           			n := len(in0) - len(in) 
     80     283.01MB   283.01MB           			return string(out), n, nerr.E 
     81            .          .           		case r == '\\': 
     82            .          .           			if len(in) < 2 { 
     83            .          .           				return "", 0, io.ErrUnexpectedEOF 
     84            .          .           			} 
     85            .          .           			switch r := in[1]; r { 
     86            .          .           			case '"', '\\', '/': 
     87     452.53MB   452.53MB           				in, out = in[2:], append(out, r) 
     88            .          .           			case 'b': 
     89            .          .           				in, out = in[2:], append(out, '\b') 
     90            .          .           			case 'f': 
     91            .          .           				in, out = in[2:], append(out, '\f') 
     92            .          .           			case 'n': 

@mdempsky mdempsky changed the title []byte conversion to string causes unnecessary alloc cmd/compile: []byte conversion to string causes unnecessary alloc Apr 16, 2019
@mdempsky
Copy link
Contributor

Minimal repro:

$ cat x.go
package p

func f(s []byte) int { return g(string(s)) }
func g(s string) int { return len(s) }

$ go tool compile -m -S x.go | grep CALL.*slicebytetostring
	0x0044 00068 (x.go:3)	CALL	runtime.slicebytetostring(SB)

$ go tool compile -m -l -S x.go | grep CALL.*slicebytetostring
	0x0044 00068 (x.go:3)	CALL	runtime.slicebytetostring(SB)

In these cases, we could use slicebytetostringtmp and avoid the allocation. However, detecting when that's safe is rather tricky.

In particular, we have to make sure that as long as the converted string value is alive that nothing can modify the []byte value memory, and we don't currently do any analysis of that sort.

@dsnet
Copy link
Member

dsnet commented Apr 17, 2019

nothing can modify the []byte value memory, and we don't currently do any analysis of that sort.

If nothing in the function has synchronization, then it is already unsafe according to the Go memory model for any external code to mutate the []byte.

@mdempsky
Copy link
Contributor

If nothing in the function has synchronization, then it is already unsafe according to the Go memory model for any external code to mutate the []byte.

That's true, but we don't currently have any way to track internal code that might mutate the []byte while the string value is valid.

For example:

var x = make([]byte, 1)

func f(s string) byte {
    x[0] = 1
    return s[0]
}

func g() {
    println(f(string(x))) // must print 0
}

Even knowing that f doesn't have internal synchronization, we can't optimize f(slicebytetostring(s)) into f(slicebytetostringtmp(s)).

@bradfitz
Copy link
Contributor

Dup of #2205, no?

@dsnet
Copy link
Member

dsnet commented Apr 17, 2019

Dup. Correct. I thought an issue existed. Couldn't locate it.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

5 participants