Guide: strings #15593
details. Luckily, Rust has lots of tools to help us here.

A **string** is a sequence of unicode codepoints encoded as a stream of UTF-8
bytes. All safely-created strings are guaranteed to be validly encoded UTF-8
Technically incorrect. It's a sequence of unicode scalar values. Notably, U+D800 is a codepoint, but it's not a scalar value.
For reference, we already talk about unicode scalar values in the documentation for std::char.
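A minimal sketch of the code point versus scalar value distinction, assuming current stable Rust rather than the toolchain this PR was written against. `is_scalar_value` is a hypothetical helper, not a standard library function:

```rust
// Hypothetical helper: true if `cp` is a Unicode scalar value,
// i.e. a value that can be stored in a Rust `char`.
fn is_scalar_value(cp: u32) -> bool {
    char::from_u32(cp).is_some()
}

fn main() {
    // U+D800 is a (surrogate) code point, but not a scalar value.
    assert!(!is_scalar_value(0xD800));
    // U+00E9 is an ordinary scalar value.
    assert!(is_scalar_value(0xE9));
    // Values above U+10FFFF are not code points at all.
    assert!(!is_scalar_value(0x11_0000));
    // The UTF-8-style byte encoding of U+D800 is rejected as a string.
    assert!(std::str::from_utf8(&[0xED, 0xA0, 0x80]).is_err());
}
```

Because a `str` can only hold what `char` can hold, surrogates never appear in a safely created string.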
This needs to be changed in the std::str documentation, then.
Probably true. I haven't actually read that in a while. But let's focus on making a good strings guide, then we can go back and fix up the API documentation as appropriate.
Of course, I only make mention of it to document where I gained the wrong impression from, and to make sure that I change that as well.
@kud1ing There are many books on programming in various languages that cover topics one can get from the official docs, but people still buy the expensive books, because they're less dry. It's a good thing that @steveklabnik wants to make the Guides like a book; there's also the Manual for reference. Or would we prefer something like the following as our only point of reference:
Perhaps the chapter could focus more on helping the reader to choose the right string type in a given situation. That is the difficult part. (Perhaps add some text at the beginning of the chapter about slices.) It could be nice to have text about concatenating strings, splitting strings and formatting strings.
[Try in-browser](http://is.gd/orj55o)

Like vector slices, string slices are simply a pointer plus a length. This
Perhaps you should mention here that a slice is not growable.
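To illustrate the point about growability, here is a small sketch assuming current stable Rust. `first_word` is a hypothetical helper invented for this example:

```rust
// Hypothetical helper: returns a view of the text before the first space.
fn first_word(s: &str) -> &str {
    s.split(' ').next().unwrap_or("")
}

fn main() {
    let mut owned = String::from("hello world");

    // The slice is only a pointer plus a length into `owned`'s buffer.
    let view: &str = first_word(&owned);
    assert_eq!(view, "hello");
    assert_eq!(view.len(), 5);
    assert_eq!(view.as_ptr(), owned.as_ptr());

    // A `&str` has no `push_str`; growing requires the owning `String`.
    owned.push_str("!");
    assert_eq!(owned, "hello world!");
}
```

The slice must no longer be in use by the time `owned` is mutated, which the borrow checker enforces.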
@nielsle that's my intention for the 'best practices' section.
I've addressed the easy comments, gonna give some thought to the harder ones.
@MatejLach: I am a believer in "perfection is finally attained not when there is no longer anything to add, but when there is no longer anything to take away". To me this especially applies to documentation, when I am actually busy doing something other than reading documentation. I am not against a collegial writing style, but I always wonder "if this was missing, what would be the reasons to add it? Who would say: I want this because ... ?"
I've squashed all those commits together, and added a best practice about comparing strings. What do we think?

Like any Rust type, string slices have an associated lifetime. A string literal
is a `&'static str`. A string slice can be taken as an argument to a function,
in which case, it has the usual associated lifetime:
I'm still not happy with the phrasing here, "it has the usual associated lifetime". The goal of this sentence is to convey the idea that the lifetime can be omitted in a lot of cases, right? Omitting the lifetime is of course not limited to just function arguments, you can also say `let x: &str = "foo"`.
Is there some other guide that uses a similar phrasing here that you're trying to reference? If not, I'd suggest perhaps something like
> The `&str` type can be written without an explicit lifetime in many cases, such as in function arguments. In these cases the lifetime will be inferred:

although I feel like it would be good to explicitly correlate this with the inferred lifetimes of arbitrary `&T` refs.
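A short sketch of the omitted versus explicit lifetime forms discussed in this comment, assuming current stable Rust. The function names `trimmed` and `trimmed_explicit` are made up for illustration:

```rust
// Lifetime omitted: the compiler infers that the result borrows from `s`.
fn trimmed(s: &str) -> &str {
    s.trim()
}

// The same signature with the lifetime written out explicitly.
fn trimmed_explicit<'a>(s: &'a str) -> &'a str {
    s.trim()
}

fn main() {
    // A string literal is a `&'static str`...
    let literal: &'static str = "  foo  ";
    // ...but the lifetime can also be left off in a `let` binding.
    let x: &str = "foo";

    assert_eq!(trimmed(literal), x);
    assert_eq!(trimmed_explicit(literal), trimmed(literal));
}
```

The same inference applies to any `&T` reference, not only to `&str`.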
I like that wording. No connection to anything else, just my own words. I'll change it to that after i'm done rebuilding...
Looks good overall. I would not be unhappy if it were committed as-is (except for the one mistaken reference to
either kind of string into `foo` by using `.to_slice()` on any `String` you
need to pass in, so the `&str` version is more flexible.
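The flexibility argument can be sketched as follows. This assumes current stable Rust, where the `.to_slice()` method named in the text is not available on `String`; the sketch uses `as_str()` and deref coercion instead. `foo` mirrors the name in the guide text, but its body is invented here:

```rust
// `foo` takes a string slice, so it accepts both kinds of string.
// Its body (returning the byte length) is a placeholder for illustration.
fn foo(s: &str) -> usize {
    s.len()
}

fn main() {
    let literal: &str = "hello";
    let owned: String = String::from("hello");

    // A literal is already a `&str`.
    assert_eq!(foo(literal), 5);
    // A `String` can be viewed as a slice explicitly...
    assert_eq!(foo(owned.as_str()), 5);
    // ...or implicitly, since `&String` coerces to `&str`.
    assert_eq!(foo(&owned), 5);
}
```

A version of `foo` taking `String` would instead force callers holding a `&str` to allocate.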

### Comparions
s/Comparions/Comparisons/ ?
Thanks for the close review, @kballard . Fixed all of that, including the nits :)

Like vector slices, string slices are simply a pointer plus a length. This
means that they're a 'view' into an already-allocated string: either a
`&'static str` or a `String`.
I mentioned before that a `&str` could actually have been created from something that isn't a `&str` or a `String`, e.g. `str::from_utf8()` creates it from `&[u8]`. At the time I said I wasn't sure if it was worth mentioning.
Well, this still bugs me. But I think we can still avoid complicating it merely by changing the word "either" to "such as" (and replacing the colon with a comma). This reads
> This means that they're a 'view' into an already-allocated string, such as a `&'static str` or a `String`.
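As a sketch of the `&[u8]` case raised here, assuming current stable Rust, where `str::from_utf8` returns a `Result` (its return type at the time of this review may have differed). `view_bytes` is a hypothetical wrapper:

```rust
use std::str;

// Hypothetical wrapper: view a byte slice as a string slice if it is valid UTF-8.
fn view_bytes(bytes: &[u8]) -> Option<&str> {
    str::from_utf8(bytes).ok()
}

fn main() {
    let bytes: &[u8] = &[0x68, 0x69]; // "hi" in UTF-8

    // The resulting `&str` is a view into the byte slice, not a copy.
    let s = view_bytes(bytes).unwrap();
    assert_eq!(s, "hi");
    assert_eq!(s.as_ptr(), bytes.as_ptr());

    // Invalid UTF-8 is rejected instead of producing a broken string.
    assert_eq!(view_bytes(&[0xFF]), None);
}
```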
Ok, I think those last two nitpicks are it. r=me
@kballard both fixed.
I decided to change it up a little today and hack out the beginning of the String guide. Strings are different enough in Rust that I think they deserve a specific guide, especially for those who are used to managed languages. I decided to start with Strings because they get asked about a lot in IRC, and also based on discussions like this one on reddit: http://www.reddit.com/r/rust/comments/2ac390/generic_string_literals/ I blatantly stole bits from our other documentation on Strings. It's a little sparse at current, but I wanted to start somewhere. I am not exactly sure what should go in "Best Practices," and would like the feedback from the team on this. Specifically due to comments like this one: http://www.reddit.com/r/rust/comments/2ac390/generic_string_literals/citmxb5
UTF-8 bytes. All strings are guaranteed to be validly-encoded UTF-8 sequences.
Additionally, strings are not null-terminated and can contain null bytes.

Rust has two main types of strings: `&str` and `String`.
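The claim in the excerpt above that strings are not null-terminated and can contain null bytes can be checked with a small sketch, assuming current stable Rust. `has_interior_nul` is a hypothetical helper, and `CString` is used only to contrast with C-style strings:

```rust
use std::ffi::CString;

// Hypothetical helper: true if the string contains a NUL byte anywhere.
fn has_interior_nul(s: &str) -> bool {
    s.as_bytes().contains(&0)
}

fn main() {
    // A NUL byte is just another byte inside a Rust string.
    let s = "a\0b";
    assert_eq!(s.len(), 3);
    assert_eq!(s.as_bytes(), b"a\0b");
    assert!(has_interior_nul(s));

    // No hidden terminator is stored or counted.
    assert_eq!("abc".len(), 3);

    // C interop needs a separate type, which appends the terminator
    // and refuses interior NUL bytes.
    assert!(CString::new("a\0b").is_err());
    assert_eq!(CString::new("abc").unwrap().as_bytes_with_nul(), b"abc\0");
}
```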
I'd rather write "two string-related types" instead of "two main types of strings" because the way it is now suggests to me that one can use &str like any other "value type" (me with my C++ hat on). But maybe that's just me.
The guide should maybe add a note to highlight the
@l0kod could you please add this to the 'improving the strings guide' ticket? Thanks.
cc #15994