String should be officially immutable & "abc" === "abc" #22193
Comments
The way I want to implement this is vararg elements (similar to C variable length array elements). It can replace the special case for svec and string and also make strings more ABI compatible with C (the usefulness of that is a different question). With the ref def and ops in #21912 it should be possible to implement the surface API. |
You can still mutate strings on 0.6, just not by the previous
|
FWIW, doing this might actually make |
This confuses me a bit. Why should this work, but
not impact |
Yes, but that need not be the case in 1.0. Having genuinely immutable Strings has many advantages, and we're very well positioned to make such a change since it's already atypical to mutate them. There will always be some ways to mutate strings since you can always poke at memory, but as long as we make it sufficiently hard / put up dire enough warning signs, it's ok. |
Neither is defined behavior. |
@yuyichao ok, but what impact does it have? My interest is also security-related. What happens to that extra byte with respect to |
@yuyichao What command causes undefined behaviour here?
My point is, there is no |
There's no unsafe-stuff used so doing this shouldn't crash. |
Unrelated |
Are there any docs on what counts as undefined behaviour, then? Like modifying an array made from a string. |
can you elaborate? If
|
Resizing the vector forced a re-allocation:
|
@mbauman that's really interesting; thanks. This is too:
|
That's what I mean. The byte pushed (and also the new vector) is unrelated to the string. |
Fun with mutations:
|
More fun:
(don't do this if you want to keep your julia session / terminal running.) |
Er, except for the modification of the array. I think the gc can get confused when you do that and end up freeing the wrong memory. So, word to the wise: don't try to write to a Vector generated from a String (unless you know for certain the Vector contains the only reference to the string) |
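A minimal sketch of the safer pattern, assuming Julia 1.0 or later, where `Vector{UInt8}(s)` copies the bytes of a `String` and `codeunits(s)` returns a read-only wrapper. The variable names are illustrative only.

```julia
s = "abc"

# Vector{UInt8}(s) copies the string's bytes, so writes do not reach `s`.
bytes = Vector{UInt8}(s)
bytes[1] = UInt8('x')

# String(v) takes ownership of a Vector{UInt8} and empties it,
# so pass a copy if `bytes` is still needed afterwards.
t = String(copy(bytes))

# codeunits(s) exposes the bytes for reading without copying them.
units = codeunits(s)

println(s)         # the original string
println(t)         # the string built from the modified copy
println(units[1])  # first code unit of `s`
```

Working on a copy costs one allocation but keeps every write away from the string's own memory.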
I'm pretty sure the GC is fine with that. I think I've also checked that the realloc one won't cause early free either. Modifying |
I thought we decided the other way? Well that's good at least. |
We could also hide the |
|
treat String as immutable in `===`. part of #22193
If we're saying that This would also be consistent with
julia> f(collection) = for i in collection
           println(i)
       end
julia> f([1,2,3])
1
2
3
julia> f(1)
1
julia> f(["one", "two", "three"])
one
two
three
julia> f("one")
o
n
e
should be ... ?
|
That is highly related to the discussion on whether Strings are scalars in |
Also we now have |
This bit me in WebSockets :( where we apply a mask to the byte representation of a String. Thank you @samoconnor for finding this. I never would have found the problem myself. Shifu 🙏 |
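A hedged sketch of how such a mask could be applied without touching the string's own memory, assuming Julia 1.0 or later. `masked_copy` is a hypothetical helper, not the WebSockets.jl API; it XORs each byte with a 4-byte key that repeats over the payload, which is the general shape of WebSocket masking.

```julia
# Hypothetical helper: XOR-mask the bytes of `payload` with a repeating 4-byte key.
# It works on a fresh copy, so `payload` itself is never written to.
function masked_copy(payload::String, key::NTuple{4,UInt8})
    bytes = Vector{UInt8}(payload)  # copies the bytes in Julia 1.0+
    for i in eachindex(bytes)
        bytes[i] = xor(bytes[i], key[mod1(i, 4)])  # key repeats every 4 bytes
    end
    return bytes
end

msg = "Foo"
key = (0x01, 0x02, 0x03, 0x04)
masked = masked_copy(msg, key)

println(msg)     # still the original text
println(masked)  # masked bytes, independent of `msg`
```

Because XOR is its own inverse, applying the same key to the masked bytes recovers the original payload.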
Do you have a take-away recommendation from that experience? I believe that in 0.7 you cannot mutate the memory backing a string without at some point doing something explicitly unsafe. |
My takeaway is that I am looking forward to v0.7 😊 At least there would have been a warning. @samoconnor already has an unsafe version, HTTP.bytes("Foo"): https://github.com/JuliaWeb/HTTP.jl/blob/master/src/IOExtras.jl#L15-L32 We should encourage use of that (or similar). No recommended changes from this experience, except that I'm looking forward to v0.7 😊 That was painful though. I literally sank a few days into that, but it is all good now 👍 |
I can sympathize. One of the harder bugs I've ever tracked down involved a mutated string in Ruby where the mutator was deep in some obscure function call. |
Credit for HTTP.bytes goes to @quinnj. He came up with that as a workaround when 0.7 started warning about future copying in the Vector{UInt8} constructor. |
As of 0.6, you can no longer do `"abc".data` and access a mutable `Vector{UInt8}` of the underlying bytes for a string. In 1.0 it would be desirable to have `String` objects be truly immutable and have `"abc" === "abc"`.
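A minimal sketch of the requested behavior, assuming Julia 0.7 or later, where `===` treats `String` as immutable and compares contents. The names `a` and `b` are illustrative; both comparisons should print `true` on those versions.

```julia
a = "abc"
b = string("ab", "c")  # built at run time, so not the same literal as `a`

println(a == b)    # value equality
println(a === b)   # egality: compares string contents on 0.7 and later
println("abc" === "abd")  # different contents are never egal
```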