String Guide suggestions #15994
Anything about indexing should be minimal and should very strongly suggest that you shouldn't be doing this in the first place. Almost always, a string should be opaque data. Iteration is just about the only way you should ever do such things, and even iteration should very seldom be done. Alternatives that may serve certain purposes are

As for comparison, UTF-8 has the convenient property that bytewise comparison yields the same answers as code point comparison. Of course, there is still the question of composed versus decomposed characters and so on; the simple summary is that you really shouldn't be doing comparisons either. People want to do all these operations on strings, but it seems to me that the more experienced you get, the more you realise that these sorts of operations are all unsound and should never really be done.
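A minimal sketch of the points above, using only the standard library: iterating by `chars()` versus counting bytes, checking for one sample pair that UTF-8 byte order agrees with code point order, and showing that a precomposed "é" and its decomposed form compare unequal with plain `==`. The helper `byte_and_char_order` is hypothetical, written only for illustration.

```rust
use std::cmp::Ordering;

/// Hypothetical helper: orders two strings by raw UTF-8 bytes and by
/// Unicode scalar values, returning both so they can be compared.
fn byte_and_char_order(a: &str, b: &str) -> (Ordering, Ordering) {
    let by_bytes = a.as_bytes().cmp(b.as_bytes());
    let by_chars = a.chars().cmp(b.chars());
    (by_bytes, by_chars)
}

fn main() {
    // Iteration is the supported way to walk a string: by chars or by bytes.
    let s = "naïve";
    assert_eq!(s.chars().count(), 5);
    assert_eq!(s.len(), 6); // 'ï' takes two bytes in UTF-8

    // UTF-8 byte order agrees with code point order...
    let (bytes, chars) = byte_and_char_order("zebra", "é");
    assert_eq!(bytes, chars);

    // ...but composed and decomposed forms of the "same" text differ.
    let composed = "\u{e9}"; // é as a single code point
    let decomposed = "e\u{301}"; // 'e' followed by a combining acute accent
    assert_ne!(composed, decomposed);
    println!("{:?}", (bytes, chars));
}
```

Treating such strings as equal requires Unicode normalization, which lives outside `std` (typically in a crate), which supports the advice to avoid ad hoc comparisons.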
Agreed. This is a good place to explain that.
In #15997 I tried to rearrange the sections to introduce String before &str. That makes it easier to introduce &str as a view into String. (I agree that indexing should be discouraged, but indexing makes it easy to explain how &str is different from String.) This PR is mostly meant as an experiment. Feel free to close it if you are already editing the chapter or are heading in a different direction.
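A small sketch of "&str as a view into String", assuming current stable Rust. `first_word` is a hypothetical helper; the indexing lines show why byte-range slicing needs care with non-ASCII text.

```rust
/// Hypothetical helper: returns the first whitespace-separated word as a
/// borrowed slice of the input, with no new allocation.
fn first_word(text: &str) -> &str {
    text.split_whitespace().next().unwrap_or("")
}

fn main() {
    // A String owns a growable, heap-allocated UTF-8 buffer.
    let owned: String = String::from("héllo world");

    // A &str is a borrowed view into (part of) that buffer.
    let whole: &str = &owned;
    let word: &str = first_word(&owned);
    assert_eq!(word, "héllo");

    // Range indexing counts bytes, so the range must fall on char boundaries.
    assert_eq!(&owned[0..1], "h");
    // `get` returns None instead of panicking when a range splits a char.
    assert_eq!(owned.get(0..2), None); // byte 2 is in the middle of 'é'
    assert_eq!(owned.get(0..3), Some("hé"));
    println!("{whole} / {word}");
}
```

Indexing with a single integer (`owned[0]`) does not compile at all, which is one way the language nudges you toward iteration instead.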
The string guide should cover some common use cases and describe the Rust-ish (rusty? oxidi-shous?) way to handle them. My case is this: I want to take a string (e.g. a URL) and use it for something not too complex (e.g. authenticating to the AWS S3 API). This involves taking the URL, deciding from it which of the two available formats will be used, and returning the string from which the request signature is computed. That means some slicing and dicing.

Coming from Python/Ruby/Go/Clojure (even C), the easiest answer is to split the string and compare the pieces to known values (e.g. does the hostname part of the URL start with "s3.amazonaws.com"), which lends itself naturally to a match. The odd part is that I pass in one kind of string (a &str) and get another kind out (a String), and I need to be familiar with a whole different set of traits than for &str. My understanding is that I should prefer String types, and I can see this being a common idiom, so much so that there should probably be some agreement on how to make something like this more obvious:

```rust
/// Extracts the S3 bucket name from either a path-style path
/// ("s3.amazonaws.com/bucket/key") or a virtual-host-style one
/// ("bucket.s3.amazonaws.com/key").
fn bucket_name_from_path(path: &str) -> String {
    // `split` yields borrowed &str pieces; nothing is copied yet.
    let parts: Vec<&str> = path.split('/').collect();
    match parts[0] {
        // Path style: the bucket is the first path segment (empty if missing).
        "s3.amazonaws.com" => parts.get(1).copied().unwrap_or("").to_string(),
        // Virtual-host style: the bucket is encoded in the hostname.
        _ => name_from_vhost_style(parts[0]),
    }
}

/// Drops the trailing "s3", "amazonaws", "com" labels and rejoins the rest
/// with '.', since bucket names may themselves contain dots.
fn name_from_vhost_style(vhostname: &str) -> String {
    let hostname_parts: Vec<&str> = vhostname.split('.').collect();
    let keep = hostname_parts.len().saturating_sub(3);
    hostname_parts[..keep].join(".")
}
```

I would like to see documented where type conversions via to_string() and collect() etc. should conventionally go. It would be nice to be able to say, e.g., that I should just convert &str strings to String, to document which operations on a String correspond to common string operations in other languages (comparisons, splitting, joining, tokenizing, etc.), to explain the slice types, how to operate on them, and why they exist, and overall to make sure there is a clear path to doing common things the easy way.
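One possible mapping of the operations listed above (comparison, splitting, joining, tokenizing, conversion) onto standard `str`/`slice` methods, as a sketch assuming current stable Rust. The URL is a made-up sample value.

```rust
fn main() {
    // Sample input, invented for illustration.
    let url = "s3.amazonaws.com/my-bucket/some/key.txt";

    // Comparison and prefix tests work directly on &str.
    assert!(url.starts_with("s3.amazonaws.com"));
    assert_eq!("abc", String::from("abc")); // &str and String compare with ==

    // Splitting yields borrowed &str pieces; collect only if you need a Vec.
    let parts: Vec<&str> = url.split('/').collect();
    assert_eq!(parts, ["s3.amazonaws.com", "my-bucket", "some", "key.txt"]);

    // split_once stops at the first separator instead of splitting everything.
    assert_eq!(
        url.split_once('/'),
        Some(("s3.amazonaws.com", "my-bucket/some/key.txt"))
    );

    // Joining a slice of &str produces a new, owned String.
    let joined: String = parts[1..].join("/");
    assert_eq!(joined, "my-bucket/some/key.txt");

    // Tokenizing on runs of whitespace.
    let tokens: Vec<&str> = "  GET   /index.html HTTP/1.1 ".split_whitespace().collect();
    assert_eq!(tokens, ["GET", "/index.html", "HTTP/1.1"]);

    // Converting a borrowed &str into an owned String when you must keep it.
    let owned: String = parts[1].to_owned();
    assert_eq!(owned, "my-bucket");
}
```

A reasonable convention under this sketch is to keep working with borrowed `&str` pieces for as long as possible and convert to `String` only at the point where ownership is actually needed, such as a return value or a struct field.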
Would it be a good idea to discuss
@samdoshi it might. I know nothing about it.
How come the strings guide doesn't mention the char type (UTF-32) at all? ;)
Strings are UTF-8, not UTF-32.
I know, but that makes it even more confusing and in need of explanation. Why is a str a u8 slice rather than chars, and why IS there a char that's 32-bit but isn't what strings are made of, etc.? ;) I get it at a low level: char is a 32-bit value capable of representing all (most?) Unicode code points as a fixed-length binary value. But it's not clear why it's called char, why there's no "byte" type, or why a string is essentially a vector of bytes that has already been validated as Unicode (rather than using stronger typing and calling the element type char8, for example). The low-level stuff is understandable, but the high-level design and reasoning, and how to use char alongside all this, are not so clear.
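A short sketch of how `char` and `str` relate in current Rust: a `char` holds one Unicode scalar value (any code point except the surrogates) in 4 bytes, while a `str` stores UTF-8, where each `char` takes 1 to 4 bytes. Plain `u8` plays the role of a "byte" type. The sample string is arbitrary.

```rust
use std::mem::size_of;

fn main() {
    // A char is one Unicode scalar value, always 4 bytes in memory.
    assert_eq!(size_of::<char>(), 4);

    // A str is UTF-8: each char occupies 1 to 4 bytes.
    let s = "aé€😀";
    let widths: Vec<usize> = s.chars().map(|c| c.len_utf8()).collect();
    assert_eq!(widths, [1, 2, 3, 4]);
    assert_eq!(s.len(), 10); // length in bytes
    assert_eq!(s.chars().count(), 4); // number of chars

    // The raw bytes are plain u8 values.
    let bytes: &[u8] = s.as_bytes();
    assert_eq!(bytes[0], b'a');

    // Going from bytes back to str is checked: invalid UTF-8 is rejected.
    assert!(std::str::from_utf8(&[0xff]).is_err());
    assert_eq!(std::str::from_utf8(bytes), Ok(s));
}
```

The checked `from_utf8` step is what the "already converted" part amounts to: a `str` is a byte sequence that is guaranteed to be valid UTF-8, so iterating it as `char`s never fails.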
It might be a good idea to bring up Str, which makes it easier to write APIs that are agnostic to the type of string they receive.
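Assuming a current toolchain, where `AsRef<str>` is a common way to get this kind of flexibility, a minimal sketch of a string-type-agnostic function. `word_count` is a hypothetical example function.

```rust
/// Hypothetical example: counts whitespace-separated words in any
/// string-like value. `impl AsRef<str>` accepts &str, String, or &String.
fn word_count(text: impl AsRef<str>) -> usize {
    text.as_ref().split_whitespace().count()
}

fn main() {
    let owned = String::from("one two three");
    assert_eq!(word_count("a b"), 2); // &'static str
    assert_eq!(word_count(&owned), 3); // &String
    assert_eq!(word_count(owned), 3); // String, moved in
}
```

For read-only access a plain `&str` parameter is often enough on its own, because `&String` coerces to `&str` at the call site; the generic form mainly helps when callers might hand over owned values.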
From the current state of the guide:
Insight into examples of both would be helpful.
I'd like to know what you'd think about this language:
Just below that it says:
That is a bit confusing to me. If I understand it correctly, would this preserve the meaning and provide more clarity?
Yes, that means the same thing. I feel they're about equally clear, but if you feel that it's more...
Maybe there's a better phrasing? I feel that, from the perspective of the uninitiated reader, the extra information helps by describing the mechanism and the context.
Adding a section on c_str and FFI would be good as well. |
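In current Rust, the standard library's FFI string types are `std::ffi::CString` (owned, NUL-terminated) and `std::ffi::CStr` (borrowed). A minimal sketch of the round trip; `c_style_len` is a hypothetical stand-in for a C function, so no actual C code is called.

```rust
use std::ffi::{CStr, CString};
use std::os::raw::c_char;

/// Hypothetical stand-in for a C function that takes a NUL-terminated string.
/// A real binding would instead be declared in an `extern "C"` block.
///
/// # Safety
/// `ptr` must point to a valid NUL-terminated string that outlives the call.
unsafe fn c_style_len(ptr: *const c_char) -> usize {
    unsafe { CStr::from_ptr(ptr) }.to_bytes().len()
}

fn main() {
    // Rust strings are not NUL-terminated and may contain interior NULs,
    // so handing one to C requires a checked, owned CString.
    let c_owned = CString::new("hello").expect("no interior NUL");
    let len = unsafe { c_style_len(c_owned.as_ptr()) };
    assert_eq!(len, 5);

    // An interior NUL byte cannot be represented in a C string.
    assert!(CString::new("he\0llo").is_err());

    // Coming back from C: borrow as &CStr, then validate as UTF-8.
    let borrowed: &CStr = c_owned.as_c_str();
    assert_eq!(borrowed.to_str(), Ok("hello"));
}
```

Keeping `c_owned` alive while its pointer is in use matters: `CString::new(...).unwrap().as_ptr()` in a single expression would leave a dangling pointer once the temporary is dropped.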
The guide should maybe add a note to highlight the
In general, the guide should encourage using traits as function parameter types instead of concrete types.
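A sketch of trait-based parameters for the case where a function needs ownership, assuming current stable Rust: `impl Into<String>` lets the caller decide whether to pass a `&str` (copied once) or a `String` (moved without copying). `Bucket` and `is_single_label` are hypothetical names used only for illustration.

```rust
/// Hypothetical type, used only for illustration.
struct Bucket {
    name: String,
}

impl Bucket {
    /// Accepts a &str (copied here) or a String (moved in without copying).
    fn new(name: impl Into<String>) -> Self {
        Bucket { name: name.into() }
    }
}

/// Hypothetical read-only check: a borrowed &str parameter is enough,
/// and &String coerces to it automatically.
fn is_single_label(name: &str) -> bool {
    !name.contains('.')
}

fn main() {
    let a = Bucket::new("logs");
    let b = Bucket::new(String::from("my.bucket"));
    assert!(is_single_label(&a.name));
    assert!(!is_single_label(&b.name));
}
```

The split follows a common rule of thumb: borrow (`&str`) when the function only reads, and take `impl Into<String>` when it has to store the value.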
…=alexcrichton A few steps toward #15994
I think most of this has been tackled. If there are specific improvements to make, please open new issues, one per improvement.
From Reddit: http://www.reddit.com/r/rust/comments/2bpenl/confused_by_the_purpose_of_str_and_string/cj7rt1u
I like all of these, and they should be in the guide.