Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

proposal: strings: a string factory to convert bytes to string #32593

Closed
junchih opened this issue Jun 13, 2019 · 8 comments
Closed

proposal: strings: a string factory to convert bytes to string #32593

junchih opened this issue Jun 13, 2019 · 8 comments

Comments

@junchih
Copy link

junchih commented Jun 13, 2019

Problem:

Each time converting a bytes to string involves a memory allocation, caused by the design immutable string and mutable bytes.

Investigation:

Logically, a constant string which embedded into binaries got limited usage. And the definition of string-processing functions normally got a type string as its argument type, rather than a bytes. The converting has widely occurred, and was driven by the widely usage of strings generated at program runtime.

For example, the benchmark at this Issue from other repository shows that json.Unmarshal are generating large number of memory fragments. Even the 3rd-part library didn't work well. Considering Golang is widely as a backend language, and those kind of calling has been put on the hot path.

Solution:

Simply add a string factory to package strings, exported one function New([]byte) string which converts bytes to string with the memory allocation optimised, and a concurrent-less struct Factory for the internal usage of Goroutines.

@gopherbot
Copy link
Contributor

Change https://golang.org/cl/182057 mentions this issue: strings: build a string factory for safely and malloc limited converting bytes to string

@martisch
Copy link
Contributor

martisch commented Jun 13, 2019

As bytes are mutable there needs to be at least one allocation (unless it memorizes and there are multiple identical strings/substrings) or pools memory for a general function to produce a string from bytes. There already is https://golang.org/pkg/strings/#Builder.

How is the new strings factory better than strings.Builder and what are the tradeoffs?

If its avoiding allocations by reserving a large memory pool than a single string can potentially pin a huge amount of memory which is likely not advisable for general usage.

@junchih
Copy link
Author

junchih commented Jun 13, 2019

As bytes are mutable there needs to be at least one allocation (unless it memorizes and there are multiple identical strings/substrings) or pools memory for a general function to produce a string from bytes. There already is https://golang.org/pkg/strings/#Builder.

How is the new strings factory better than strings.Builder and what are the tradeoffs?

Sorry for the wrong work stream, I put the PR #32594 to mean to be the code draft for the further talking.

If its avoiding allocations by reserving a large memory pool than a single string can potentially pin a huge amount of memory which is likely not advisable for general usage.

In #32594 , strings.Factory means to be a goroutine-locally object, please avoid the function strings.New, since I totally agree Keith Randall's consideration in this comment, and still trying to make an answer for that. If I can't fix that concern, I'll remove strings.New in the future before requiring merging.

And for your original question, I just gave out a function func NewFactoryWithPoolSize(size int) *Factory for the performance optimisation of our callers. Did that answer your concern? Since the caller should know which size could be the average size of the fragments on the hot path. And they could put fourfold size to the generator, such like NewFactoryWithPoolSize(4 * avgSize). I'll organize these answer into the description of this function before merging.

@josharian
Copy link
Contributor

I’m not sure I follow the request, but #5160 may be relevant. There’s also a trick that can be done in user code for the right kinds of setups: https://go-review.googlesource.com/c/164961

@rsc
Copy link
Contributor

rsc commented Jun 25, 2019

In the CL, strings.Factory.New is making a copy of the argument; so does string(x).
One possible advantage might be the lack of fragmentation for very small strings, but string(x) takes care of that too.
Another possible advantage is a lack of fragmentation at the end of larger strings.
But now keeping a reference to any one of these strings will keep a reference to the entire allocation,
which is often problematic (it's a memory leak!) and really not something that any standard library routines should do or encourage.

If you really need this one-big-allocation-backing-many-strings, it would be easier just to
allocate a big []byte b, fill it, do s := string(b) (copies b), and then reslice s.
Or use strings.Builder to fill it and then there's no copy during the string(b).

This does not fit well at all in the standard library. I suggest we decline this proposal.

@rsc
Copy link
Contributor

rsc commented Jun 25, 2019

I filed #32779 for the more limited JSON issue.

@rsc
Copy link
Contributor

rsc commented Aug 27, 2019

Suggested declining this on June 25 (#32593 (comment)) and there have been no comments since then. As such, this seems like a likely decline.

Leaving open for a week for final comments.

@ianlancetaylor
Copy link
Member

Marked this last week as likely decline w/ call for last comments (#32593 (comment)).
Declining now.

@golang golang locked and limited conversation to collaborators Sep 2, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

6 participants