-
Notifications
You must be signed in to change notification settings - Fork 17.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
unsafe: add StringData, String, SliceData #53003
Comments
Thanks, but I don't see an actual proposal here. If you want to discuss ideas, please use golang-nuts. The proposal process should be for a proposal. That is, suggest some specific functions that we should introduce. Thanks. |
Sure, good point. I've edited my comment with an actual proposal of |
One of the main use cases of |
CC @mdempsky |
This comment was marked as off-topic.
This comment was marked as off-topic.
This comment was marked as off-topic.
This comment was marked as off-topic.
Filed #53079. Perhaps we should undeprecate them for Go 1.19, since we don't have a replacement for all valid use cases yet. Thanks. |
This proposal has been added to the active column of the proposals project |
FWIW, the proposed functions closely parallel the That package also provides best-effort mutation detection, since Go |
I've seen that package, and the one thing I'd not do is the mutation detection (which you gate behind the I was thinking to link the package in my original writeup, since it is further evidence of the use case. |
At the very least, I think it's important for the documentation for the proposed functions to explicitly call out the lifetime issues, especially for |
To fully replace For reading, we can already extract all the elements of a slice, via For writing, we can create a new slice setting all the elements, via So to me that suggests, in package unsafe, // StringData returns a pointer to the bytes of a string.
// The bytes must not be modified; doing so can cause
// the program to crash or behave unpredictably.
func StringData(string) *byte
// String constructs a string value from a pointer and a length.
// The bytes passed to String must not be modified;
// doing so can cause the program to crash or behave unpredictably.
func String(*byte, int) string These functions should remain meaningful even if we somehow change the representation of slices or strings in the future. The restrictions on changing the bytes are unfortunate, but the fact is that people are doing these kinds of transformations today. Omitting these functions from the unsafe package doesn't mean that Go programs won't do them, it just means that they will do in ways that are sometimes even less safe. It should be feasible to add a dynamic detector for any modifications to these bytes. This could perhaps be enabled when using the race detector. This would not be perfect but would detect egregious misuses. The functions suggested above would then be written as func StringToBytes(s string) []byte {
return unsafe.Slice(unsafe.StringData(s), len(s))
}
func BytesToString(b []byte) string {
return unsafe.String(&b[0], len(b))
} (To be clear, the reverse is also possible: we can write |
It does seem like unsafe.String and unsafe.StringData match unsafe.Slice a bit better and are more fundamental operations |
I'd like to suggest we reconsider the original proposal from #19367: to add new Concretely, I propose adding:
and allowing conversions between Converting an invalid N.B., my original #19367 proposal also allowed conversions between |
My concern about What can we do with |
After thinking about it more, I'm inclined to agree that |
It sounds like we've converged on considering unsafe.StringData, unsafe.String, and unsafe.SliceData. |
Based on the discussion above, this proposal seems like a likely accept. |
No change in consensus, so accepted. 🎉 |
Change https://go.dev/cl/449537 mentions this issue: |
…Data For #53003. Change-Id: If5d76c7b8dfcbcab919cad9c333c0225fc155859 Reviewed-on: https://go-review.googlesource.com/c/go/+/449537 Reviewed-by: Matthew Dempsky <mdempsky@google.com> Reviewed-by: Ian Lance Taylor <iant@google.com> Run-TryBot: Robert Griesemer <gri@google.com> Auto-Submit: Robert Griesemer <gri@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Robert Griesemer <gri@google.com>
Both the spec and the unsafe package documentation are now done. Closing. |
|
I still miss that import "unsafe"
func String2ByteSlice(str string) []byte {
if str == "" {
return nil
}
return unsafe.Slice(unsafe.StringData(str), len(str))
}
func ByteSlice2String(bs []byte) string {
if len(bs) == 0 {
return ""
}
return unsafe.String(unsafe.SliceData(bs), len(bs))
} Those edge cases with empty strings/slices make current API very verbose as you have to check for those edge cases. Could those two functions be added to standard library? On a related note, should examples like |
Hi @mitar, FWIW, I don't know that the standard library would want to make that more convenient. Note that Go 1.22 will have some nice enhancements in this area, including recognizing in many cases when a []byte is not modified after a string->[]byte conversion to enable a safe zero-copy conversion (issue #2205). Here is an example (with comparison of 1.21 vs. tip via godbolt): func f() {
s := "hello"
x := []byte(s) // zero-copy string->[]byte conversion
_ = x
} |
@mitar I don't think these checks are needed.
So it's perfectly safe to feed the returned pointer directly to |
@database64128 package main
import "unsafe"
func main() {
{
var s = ""
println(s == "") // true
var x = unsafe.Slice(unsafe.StringData(s), len(s))
println(x == nil) // true
}
{
var s = "go"[:0]
println(s == "") // true
var x = unsafe.Slice(unsafe.StringData(s), len(s))
println(x == nil) // false
}
} It might satisfy the needs of some cases, but not all. The situations of converting from zero-length slices to strings are comparatively simpler. package main
import "unsafe"
func ByteSlice2String(bs []byte) string {
if len(bs) == 0 {
return ""
}
return unsafe.String(unsafe.SliceData(bs), len(bs))
}
func main() {
var x = make([]byte, 0, 1024)
var s = unsafe.String(unsafe.SliceData(x), len(x))
println(s == "") // true
println(*(*uintptr)(unsafe.Pointer(&s))) // <a non-zero addr>
var s2 = ByteSlice2String(x)
println(s2 == "") // true
println(*(*uintptr)(unsafe.Pointer(&s2))) // 0
} |
Why originally proposed functions StringToBytes and BytesToString are not added to unsafe package, so people stop writing various implementations with questionable correctness again and again is beyond my comprehension. |
Because |
If we are running with a PID namespace, we have a shim process by default. We have code that sets the name of this process to `sinit` by: * Overwriting the value of argv[0] * Using PR_SET_NAME Both are necessary as PR_SET_NAME only affects the value from PR_GET_NAME. In this code, unsafe.StringHeader is now deprecated. Rewrite using the pattern that takes an unsafe.Slice from an unsafe.StringData to get the raw bytes for the string os.Args[0]. See: golang/go#53003 (comment)
If we are running with a PID namespace, we have a shim process by default. We have code that sets the name of this process to `sinit` by: * Overwriting the value of argv[0] * Using PR_SET_NAME Both are necessary as PR_SET_NAME only affects the value from PR_GET_NAME. In this code, unsafe.StringHeader is now deprecated. Rewrite using the pattern that takes an unsafe.Slice from an unsafe.StringData to get the raw bytes for the string os.Args[0]. See: golang/go#53003 (comment)
If we are running with a PID namespace, we have a shim process by default. We have code that sets the name of this process to `sinit` by: * Overwriting the value of argv[0] * Using PR_SET_NAME Both are necessary as PR_SET_NAME only affects the value from PR_GET_NAME. In this code, unsafe.StringHeader is now deprecated. Rewrite using the pattern that takes an unsafe.Slice from an unsafe.StringData to get the raw bytes for the string os.Args[0]. See: golang/go#53003 (comment)
If we are running with a PID namespace, we have a shim process by default. We have code that sets the name of this process to `sinit` by: * Overwriting the value of argv[0] * Using PR_SET_NAME Both are necessary as PR_SET_NAME only affects the value from PR_GET_NAME. In this code, unsafe.StringHeader is now deprecated. Rewrite using the pattern that takes an unsafe.Slice from an unsafe.StringData to get the raw bytes for the string os.Args[0]. See: golang/go#53003 (comment) Signed-off-by: Dave Dykstra <2129743+DrDaveD@users.noreply.github.com>
If we are running with a PID namespace, we have a shim process by default. We have code that sets the name of this process to `sinit` by: * Overwriting the value of argv[0] * Using PR_SET_NAME Both are necessary as PR_SET_NAME only affects the value from PR_GET_NAME. In this code, unsafe.StringHeader is now deprecated. Rewrite using the pattern that takes an unsafe.Slice from an unsafe.StringData to get the raw bytes for the string os.Args[0]. See: golang/go#53003 (comment) Signed-off-by: Dave Dykstra <2129743+DrDaveD@users.noreply.github.com>
) Closes #1147. The Godoc of `reflect.StringHeader` says [1]: "the Data field is not sufficient to guarantee the data it references will not be garbage collected, so programs must keep a separate, correctly typed pointer to the underlying data." And the Godoc of `unsafe.Pointer` says [2]: "A uintptr is an integer, not a reference. ... Even if a uintptr holds the address of some object, the garbage collector will not update that uintptr's value if the object moves, nor will that uintptr keep the object from being reclaimed." The replacement `unsafe.StringData` is a more correct way to get the pointer to the string data. The original proposal can be seen in golang/go#53003 (comment). [1]: https://pkg.go.dev/reflect#StringHeader [2]: https://pkg.go.dev/unsafe#Pointer Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>
Now that
reflect.StringHeader
andreflect.SliceHeader
are officially deprecated, I think it's time to revisit adding function that satisfy the reason people have used these types. AFAICT, the reason for deprecation is thatreflect.SliceHeader
andreflect.StringHeader
are commonly misused. As well, the types have always been documented as unstable and not to be relied upon.We can see in Github code search that usage of these types is ubiquitous. The most common use cases I've seen are:
[]byte
tostring
string
to[]byte
Data
pointer field for ffi or some other niche useThe first use case can also commonly be seen as
*(*string)(unsafe.Pointer(&mySlice))
, which is never actually officially documented anywhere as something that can be relied upon. Under the hood, the shape of a string is less than a slice, so this seems valid per unsafe rule (1), but this is all relying on undocumented behavior. The second use case is commonly seen as*(*[]byte)(unsafe.Pointer(&string))
, which is by-default broken because the Cap field can be past the end of a page boundary (example here, in widely used code) -- this violates unsafe rule (1).Regardless of the thought that people should never rely upon these types, people do, and they do so all over. People also rely on invalid conversions because Go has never made this easy. Part of the discussion on #19367 was about all the ways that people misuse these types today. These conversions are small tricks that can alleviate memory pressure and improve latencies and CPU usage in real programs. The use cases are real, and Go provides just enough unsafe and buggy ways of working around these problems such that now there is a large ecosystem of technically invalid code that just so happens to work.
Rather than consistently saying "don't use this",
go vet
ing somewhat, and then ducking all responsibility for buggy programs, Go should provide actual safe(ish) APIs that people can rely on in perpetuity. New functions can live inunsafe
and have well documented rules around their use cases, and then Go can finally document what to do when people want this common escape hatch.Concrete proposal
The following APIs in the unsafe package:
eliminating, because realistically a person can just dofunc DataPointer[T ~string|~[]E, E any](t T) unsafe.Pointer
&slice[0]
(although a corresponding analogue does not exist for strings)I think
unsafe.Slice
covers the use case for converting between slices of different types, although I'm not 100% sure what the use case ofunsafe.Slice
is.The text was updated successfully, but these errors were encountered: