-
Notifications
You must be signed in to change notification settings - Fork 18k
proposal: provide std lib pkg Clone function to copy a string efficiently without memory sharing #40200
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
Why do you want this? Are you worried about extracting a substring so the main string can be collected? Because if that's the use case, I don't think confusing the simple existing model ( |
Yes that is one use case as written in the proposal. Another application I have seen is to make a string that is safe to use going forward since the orginal string was created through an unsafe process of using a memory region e.g. mmaped that should be released again. I fully understand if the proposal is getting rejected on grounds that this is something that the garbage collector can solve in the future without any new API or that users can implement as a helper themselfs. I have seen the issue come up before but never a proposal. Even if rejected I think it will serve as a good reference point cite that this wont be implemtented/support or that these details are to low level to base an API on. It is also in context of me experimenting with the compiler to optimze some string->byte->string... conversion chains which could unless explicitly prevented make the 'strings([]byte(s))' version a noop that some Go programs may rely on to actually make a copy due to either memory concerns or having used unsafe to create the string in the first place. I understand that issue is not yet reality and can be defered until that is the case to offer an easy escape hatch to refactor then "broken" code. |
What about using the following code? func CopyString(s string) string {
return string(append([]byte(nil), s...))
} The compiler may optimize it to a single string copy and a single memory allocation. |
If the outcome is to change the compiler to recognise a pattern for low overhead copy I do not see the advantage of |
Since concatenation creates a new string, the simplest copy-this construct would be |
@networkimprov Right now the compiler can just turn @martisch described two use cases. The first is to avoid pinning a large string. As @robpike suggests, that is something we can look into implementing in the garbage collector. I don't know how hard that would be--probably pretty hard--but it does seem like that should be the first step. @martisch also suggests the case of a string defined in mmap'ed memory that needs to move to the regular heap. I'm probably missing something but I think getting a string in mmap'ed memory must involve some sort of unsafe operation, so I'm not sure how interesting this case is. I expect that we can move a string into the heap using an unsafe operation. So maybe we need to decide whether the garbage collector option is possible for unpinning larage strings, and maybe we need to decide whether there is an unsafe operation that will do the copy operation. E.g., see the discussion in #19367. |
Why not let us explicitly copy a string, by whatever API/syntax? Why complicate the runtime to fix a single use case of a general purpose stdlib or language feature? |
Avoiding the unnecessary pinning of large strings is a general problem that would be nice to solve for the general case. It would mean that some programs would get better memory usage without requiring people to understand this detailed issue and requiring them to know about a relatively obscure library function. And if we are able to fix that general problem, then we don't need the library function. This is just my opinion, of course, others may feel differently. |
It's easy to do yourself, although of course there is no guarantee the compiler won't optimize this away. It doesn't today. https://play.golang.org/p/gLXMWXD1SNL
|
Note that I agree that deciding to have the garbage collector not pin extra memory together with an unsafe operation to copy memory will make the string copy function uneccessary and are better more general solution if they are going to be choosen. |
Every GC'ed language has this problem. I don't think that we are going to solve it in the GC when none of the rest have. When Java was faced with this problem they made the (in my opinion incredibly unwise) decision to change the substring operation to start making copies. That is, Java x.substring(i, j) used to be like Go's x[i:j], sharing memory with the parent, but starting in Java 1.7 it makes a copy. This kind of change makes formerly linear algorithms go quadratic, so I'm stunned they thought this was a good idea. But if nothing else it shows how much people needed a solution to the unwanted sharing pinning data. Given the likely absence of a magic bullet for the foreseeable future, it does seem reasonable to have something like strings.Copy(string) string, with appropriate documentation. In the past I've written something that does s[:1] + s[1:] for non-empty strings and returns "" for empty strings. Lots of other ways to do it too (like Rob's), but I think it is telling that we all have our versions of this. |
To avoid some confusion of what |
@martisch, I don't think ( |
I think the problem there is using the
So I don't think “copying an external string into the Go heap” is an appropriate use-case for this function, because I don't agree that external strings should be represented as type |
@bcmills while I agree on the high level I would like to note that on an existing ecosystem level libraries that do e.g. proto handling that may want to use arenas for decoding dont use new types but use type string and they will be hard to change. |
Another use case for
|
#45038 includes a strings.Clone that should address this. |
Will the supposed implementation return the original s if string s occupies its whole hosting memory block? |
@go101 It might. |
If we are talking about the proposed |
strings.Clone must copy always. It might be passed a []byte that has been unsafely converted to string, intended to launder it into a real string. Given that, I think this issue can be closed as a duplicate of #45038. |
This proposal is a duplicate of a previously discussed proposal, as noted above, |
Change https://golang.org/cl/345849 mentions this issue: |
The new strings.Clone function copies the input string without the returned cloned string referencing the input strings memory. goarch: amd64 cpu: Intel(R) Core(TM) i5-1038NG7 CPU @ 2.00GHz name time/op Clone-8 24.2ns ± 2% name alloc/op Clone-8 48.0B ± 0% name allocs/op Clone-8 1.00 ± 0% Update #45038 Fixes #40200 Change-Id: Id9116c21c14328ec3931ef9a67a2e4f30ff301f9 Reviewed-on: https://go-review.googlesource.com/c/go/+/345849 Trust: Martin Möhrmann <martin@golang.org> Run-TryBot: Martin Möhrmann <martin@golang.org> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Joe Tsai <joetsai@digital-static.net> Reviewed-by: Ian Lance Taylor <iant@golang.org>
@martisch I suggested 2 alternative implementations not using unsafe to implement the required functionality as comments on https://golang.org/cl/345849 |
Updated 21th April 2021 to use Clone instead of Copy as name: see #45038 (comment)
Goal
Provide a fast, allocation efficient and guaranteed way to copy a string without the new string sharing memory with the old string, that does not rely on code assuming
string([]byte(s))
works for this.Proposal
Add a
Clone(s string) string
function (not a builtin) to a std lib package that is guaranteed to return a string with the same content as the input string, does not share memory (for the backing) array with the input string and minimises memory allocated to do so.Note that it this function does not need to be in the strings package but could be located in a package more fitting for its special use character.
Use cases
Existing alternatives:
string([]byte(s))
is currently not guaranteed by the language standard and thereby all conforming Go compilers to not just return the same backing array. While the current Go Gc compiler does not elide the double copy a future Go compiler could optimize this to a noop. If the compiler becomes smart enough to recognise the safe versions of string to byte to string conversions (cmd/compile: skip slicebytetostring when argument doesn't escape #6714) it would be fragile to try to forbid some of them again. Being explicit about the intent of the copy semantics seems cleaner and a better intent communication.(s + " ")[:len(s)]
is as far as I understand not guaranteed by the language standard and thereby all conforming Go compilers to not just return the same backing array.strings.Builder
can be used to only cause one allocation currently for the copy but has no guarantees to not be worse in allocations or even avoid copying if only one string is written and retrieved again.Implementation is simple:
https://go-review.googlesource.com/c/go/+/242259/1/src/strings/copy.go
The use of unsafe headers in the strings library is already present in the
strings.Builder
code.Downsides
strings.Clone
special to keep semantics intact.s1 = s2
) and application in contexts where it is not needed.For those reasons and others I expect this proposal will be easily rejected or voted down. That is fine as I think it is still useful to have some documented discussion about agreement whether such functionality or implementation details about strings are not ok to be exposed.
The text was updated successfully, but these errors were encountered: