String Guide suggestions #15994

Closed
steveklabnik opened this issue Jul 26, 2014 · 17 comments

@steveklabnik
Member

From reddit: http://www.reddit.com/r/rust/comments/2bpenl/confused_by_the_purpose_of_str_and_string/cj7rt1u

  • add a section for indexing - i.e. how do I compare just the first 3 characters of two strings? Or fetch the character in position 3 in the string? Or iterate through the characters in the string?
  • comparing - i.e. how do I know if one string is greater than another - and on what basis is the ordering done (binary value of the strings, or based on character set)?
  • applying regular expressions to strings, is &str preferred over String for this?

I like all of these, and they should be in the guide.
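As a rough illustration of the indexing questions above, here is a minimal sketch in present-day Rust, not the 2014 API this issue was written against. The helper names `first_n_chars_equal` and `char_at` are made up for this example and are not standard library functions. Positions are counted in `char`s (Unicode scalar values), which do not always match what a reader perceives as one character.

```rust
/// Hypothetical helper: true if the first `n` chars of `a` and `b` match.
fn first_n_chars_equal(a: &str, b: &str, n: usize) -> bool {
    a.chars().take(n).eq(b.chars().take(n))
}

/// Hypothetical helper: the char at zero-based position `pos`, if any.
/// This walks the string from the start, so it is O(pos), not O(1).
fn char_at(s: &str, pos: usize) -> Option<char> {
    s.chars().nth(pos)
}

fn main() {
    // Compare just the first three chars of two strings.
    assert!(first_n_chars_equal("héllo", "hélium", 3));

    // Fetch the char in position 3.
    assert_eq!(char_at("héllo", 3), Some('l'));

    // Iterate through the chars together with their byte offsets.
    for (byte_offset, c) in "héllo".char_indices() {
        println!("{}: {}", byte_offset, c);
    }
}
```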

@huonw huonw changed the title from "String Guide suggetsions" to "String Guide suggestions" Jul 26, 2014
@chris-morgan
Member

Anything about indexing should be minimal and should be very strongly suggesting that you shouldn’t be doing this in the first place. Almost always, a string should be opaque data. Iteration is just about the only way you should ever do such things, and even iteration should very seldom be done.

Alternatives that may serve certain purposes are starts_with and ends_with, and graphemes() will need to be mentioned.
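For readers on current Rust, the prefix and suffix checks are spelled `starts_with` and `ends_with` on `str`. The sketch below shows them next to the reason raw slicing is discouraged: slice ranges are byte offsets, not character positions. `prefix_chars` is a hypothetical helper written for this example. Grapheme iteration is not shown because, as far as I know, it now lives outside the standard library (for example in the `unicode-segmentation` crate).

```rust
/// Hypothetical helper: the first `n` chars of `s`, as a borrowed slice.
fn prefix_chars(s: &str, n: usize) -> &str {
    match s.char_indices().nth(n) {
        // `byte_offset` is where char number `n` starts, so it is a valid cut.
        Some((byte_offset, _)) => &s[..byte_offset],
        // Fewer than `n + 1` chars: the whole string is the prefix.
        None => s,
    }
}

fn main() {
    let host = "s3.amazonaws.com";
    assert!(host.starts_with("s3."));
    assert!(host.ends_with(".com"));

    let s = "héllo";
    // `get` returns None where `&s[0..2]` would panic:
    // byte 2 falls in the middle of the two-byte 'é'.
    assert_eq!(s.get(0..2), None);
    assert_eq!(s.get(0..3), Some("hé"));

    assert_eq!(prefix_chars(s, 2), "hé");
}
```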


As for comparison, UTF-8 has the convenient property that bytewise comparison yields the same answers as codepoint comparison. Of course there is still the question of composed versus decomposed characters and so on and so forth… the simple summary is that you really shouldn’t be doing comparisons, either.
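A small sketch of both points, assuming current Rust: the ordering on `str` is a lexicographic comparison of the UTF-8 bytes, which agrees with code point order, and composed and decomposed forms of the same visible character do not compare equal. `same_order` is a hypothetical helper for this example. Normalizing the two forms before comparing would need something beyond the standard library, which is not shown here.

```rust
/// Hypothetical helper: checks that comparing the UTF-8 bytes gives the
/// same result as comparing the code points one by one.
fn same_order(a: &str, b: &str) -> bool {
    a.as_bytes().cmp(b.as_bytes()) == a.chars().cmp(b.chars())
}

fn main() {
    // 'é' is U+00E9 and 'z' is U+007A; both views agree on the order.
    assert!(same_order("é", "z"));

    // The ordering is by code point, not by alphabet or locale:
    // 'Z' (U+005A) sorts before 'a' (U+0061).
    assert!("Zebra" < "apple");

    // Composed versus decomposed: both render as "é".
    let composed = "\u{00E9}"; // single code point
    let decomposed = "e\u{0301}"; // 'e' followed by a combining acute accent
    assert_ne!(composed, decomposed);
    assert!(decomposed < composed); // 'e' (0x65) sorts before byte 0xC3
}
```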

People want to do all these operations on strings, but it seems to me that the more experienced you get, the more you realise that these sorts of operations are all unsound and should never really be done.

@steveklabnik
Member Author

Agreed. This is a good place to explain that.

@nielsle
Contributor

nielsle commented Jul 26, 2014

In #15997 I tried to rearrange the sections to introduce String before &str. That makes it easier to introduce &str as a view into String. (I agree that indexing should be discouraged, but indexing makes it easy to explain how &str is different from String)

This PR is mostly meant as an experiment. Feel free to close it if you are already editing the chapter or if you are heading in a different direction.

@pcn
Contributor

pcn commented Jul 27, 2014

The string guide should have some common use cases and the rust-ish (rusty? oxidi-shous?) way described. My case is this:

I want to take a string (e.g. a url) and use it to do something not too complex (e.g. authenticate to the AWS S3 api). This involves taking the url, and deciding based on the url which of the two available formats will be used, and returning the string that will be used to determine the signature of the request.

This means some slicing and dicing. Coming from python/ruby/go/clojure (even C) the easiest answer is to split the string and compare to known values (e.g. does the hostname bit of the URL start with "s3.amazonaws.com") which lends itself naturally to a match. The odd part is that I pass in one kind of string (an &str) , and get another kind out (a String) where I need to be familiar with a whole different set of traits vs. &str. My understanding is that I should prefer String types, and I can see this being a common idiom - so much so that there should probably be some agreement on how something like this could be made more obvious:

fn bucket_name_from_path <'a>(path: &'a str) -> String {
    let parts: Vec<&str> = path.split_str("/").collect();
    return match parts.get(0).slice_from(0) {
        "s3.amazonaws.com" =>  parts.get(1).to_string(),
        _ => name_from_vhost_style(*parts.get(0))
    }
}
fn name_from_vhost_style <'a>(vhostname: &'a str) -> String {
    let hostname_parts: Vec<&str> = vhostname.split_str(".").collect::<Vec<&str>>();
    let bucket_parts = hostname_parts.slice(0, hostname_parts.len() - 2);
    return bucket_parts.connect("");
}
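
The snippet above uses pre-1.0 APIs (`split_str`, `connect`, `slice_from`). For comparison, here is a hedged sketch of how the same idea might look in current Rust. It is not a port of the original: it assumes virtual-host names have the form `<bucket>.s3.amazonaws.com`, and it returns a borrowed `&str` rather than a `String`, so the caller decides whether to allocate. The constant `S3_HOST` is introduced only for this example.

```rust
/// Endpoint host, taken from the snippet above.
const S3_HOST: &str = "s3.amazonaws.com";

/// Returns the bucket name borrowed from `path`, or None if the host is
/// not recognised. No allocation happens here.
fn bucket_name_from_path(path: &str) -> Option<&str> {
    let mut parts = path.split('/');
    let host = parts.next()?;
    if host == S3_HOST {
        // Path style: s3.amazonaws.com/<bucket>/...
        parts.next().filter(|bucket| !bucket.is_empty())
    } else {
        // Virtual-host style (assumed form): <bucket>.s3.amazonaws.com/...
        host.strip_suffix(S3_HOST)?
            .strip_suffix('.')
            .filter(|bucket| !bucket.is_empty())
    }
}

fn main() {
    assert_eq!(
        bucket_name_from_path("s3.amazonaws.com/photos/cat.png"),
        Some("photos")
    );
    assert_eq!(
        bucket_name_from_path("photos.s3.amazonaws.com/cat.png"),
        Some("photos")
    );
    assert_eq!(bucket_name_from_path("example.com/cat.png"), None);

    // Convert to an owned String only where ownership is actually needed.
    let owned: Option<String> =
        bucket_name_from_path("s3.amazonaws.com/photos").map(String::from);
    assert_eq!(owned.as_deref(), Some("photos"));
}
```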

I would like to have documented where the convention should be to place type conversions via to_str() and collect() etc. It would be nice to just be able to say that e.g. I should just convert &str strings to String and document which operations on a String are similar to common string operations in other languages (comparisons, splitting, joining, tokenizing, etc.), explain the slice types and how to operate with them (and why they exist) and just overall make it so that there is a clear path to doing common things the easy way.

@samdoshi

Would it be a good idea to discuss std::str::MaybeOwned here? When it's appropriate to use it and when it isn't.
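
For readers on current Rust: `std::str::MaybeOwned` is no longer in the standard library, and to my knowledge its role is now filled by `std::borrow::Cow<str>`. A minimal sketch of the "borrow when possible, own when necessary" idea follows. `underscore_spaces` is a hypothetical function written for this example.

```rust
use std::borrow::Cow;

/// Hypothetical example: replaces spaces with underscores, allocating a
/// new String only when the input actually contains a space.
fn underscore_spaces(input: &str) -> Cow<'_, str> {
    if input.contains(' ') {
        Cow::Owned(input.replace(' ', "_"))
    } else {
        Cow::Borrowed(input)
    }
}

fn main() {
    // Nothing to change: the input is handed back without copying.
    assert!(matches!(underscore_spaces("no_spaces"), Cow::Borrowed(_)));

    // Something to change: a fresh String is allocated.
    assert!(matches!(underscore_spaces("has spaces"), Cow::Owned(_)));
    assert_eq!(underscore_spaces("has spaces"), "has_spaces");
}
```

This tends to fit functions that usually return their input unchanged. When the result is always new, returning `String` directly is simpler.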

@steveklabnik
Member Author

@samdoshi it might. I know nothing about it.

@lee-b

lee-b commented Jul 30, 2014

How come the strings guide doesn't mention the char type (utf32) at all? ;)

@steveklabnik
Member Author

Strings are UTF8, not UTF32.

@lee-b

lee-b commented Jul 30, 2014

I know, but that makes it even more confusing and in need of explanation. Why is an str a u8 slice, rather than chars, and why IS there a char that's 32-bit, but not part of string, etc.? ;)

I get it, at a low level: char is a 32-bit value, capable of representing all (most?) unicode codepoints as a fixed-length binary value. But it's not clear why they called that char, why there's no "byte" type, why string is essentially a vector of bytes, but already converted from bytes to unicode (rather than using stronger typing, and calling it char8, for example). The low-level stuff is understandable, but the high-level design / reasoning, and how to use char along with all this... that stuff's not so clear.
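
A short sketch of how the pieces relate in current Rust may help here: `u8` plays the role of the byte type, `char` is a four-byte Unicode scalar value, and a `str` is a run of UTF-8 bytes that is guaranteed to be valid, which is why it is a distinct type from a plain byte slice. `byte_and_char_len` is a hypothetical helper for this example.

```rust
use std::mem::size_of;

/// Hypothetical helper: (length in bytes, length in chars).
fn byte_and_char_len(s: &str) -> (usize, usize) {
    (s.len(), s.chars().count())
}

fn main() {
    assert_eq!(size_of::<u8>(), 1); // the byte type
    assert_eq!(size_of::<char>(), 4); // one Unicode scalar value

    // "naïve" with a precomposed 'ï' (U+00EF), which takes two bytes in UTF-8.
    let s = "na\u{00EF}ve";
    assert_eq!(byte_and_char_len(s), (6, 5));
    assert_eq!('\u{00EF}'.len_utf8(), 2);

    // A str can always be viewed as bytes...
    let bytes: &[u8] = s.as_bytes();
    assert_eq!(bytes[2], 0xC3);

    // ...but bytes only become a str after a validity check.
    assert_eq!(std::str::from_utf8(bytes), Ok(s));
    assert!(std::str::from_utf8(&[0xC3]).is_err()); // truncated sequence
}
```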

@reem
Contributor

reem commented Jul 30, 2014

It might be a good idea to bring up Str, which makes writing APIs that are agnostic to the type of string they receive better.

@pcn
Contributor

pcn commented Aug 5, 2014

From the current state of the guide:

In general, you should prefer String when you need ownership, and &str when you just need to borrow a string.

Insight into examples of both would be helpful.

This means starting off with this:

fn foo(s: &str) {

and only moving to this:

fn foo(s: String) {

I'd like to know what you'd think about this language:

This means starting off with a string slice like this:

fn foo(s: &str) {

and only moving to this:

fn foo(s: String) {

If you have good reason such as <some examples in code somewhere and a description of why those examples are good uses of using String?>. 
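
One possible example for the placeholder above, sketched in current Rust: a function that only reads the text borrows it as `&str`, while a function that has to keep the text after the call returns takes a `String`. `User` and `greeting` are hypothetical names for this example and are not taken from the guide.

```rust
/// Hypothetical type that stores a name for later use.
struct User {
    name: String,
}

impl User {
    /// Takes ownership: the String is moved into the struct, no copy made.
    fn new(name: String) -> User {
        User { name }
    }
}

/// Only reads the name, so borrowing is enough.
fn greeting(name: &str) -> String {
    format!("Hello, {}!", name)
}

fn main() {
    let name = String::from("Ferris");
    let hello = greeting(&name); // borrowed; `name` is still usable
    let user = User::new(name); // moved; `name` is gone after this line

    assert_eq!(hello, "Hello, Ferris!");
    assert_eq!(user.name, "Ferris");
}
```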

Just below that it says:

Furthermore, you can pass either kind of string into foo by using .as_slice() on any String you need to pass in, so the &str version is more flexible.

That reads as a bit confusing to me. If I understand it, would this preserve the meaning and provide some more clarity?

Furthermore, the version of foo that accepts a &str argument can be seen as more flexible because it can be passed an &str or a String. How is that possible? A String has the .as_slice() method, which presents it to the function as a string slice, so you can invoke foo(some_String.as_slice()) if it accepts an &str.
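
To make the mechanism concrete, here is a sketch in current Rust, where as far as I know `as_slice()` on `String` no longer exists and the usual spellings are a plain `&` or `as_str()`. `word_count` is a hypothetical function standing in for `foo`.

```rust
/// Hypothetical stand-in for `foo`: only reads the text, so it takes &str.
fn word_count(s: &str) -> usize {
    s.split_whitespace().count()
}

fn main() {
    let literal: &str = "hello big world";
    let owned: String = String::from("hello big world");

    // A string slice is passed directly.
    assert_eq!(word_count(literal), 3);

    // A String is passed by borrowing it; &String coerces to &str.
    assert_eq!(word_count(&owned), 3);

    // The explicit forms do the same thing.
    assert_eq!(word_count(owned.as_str()), 3);
    assert_eq!(word_count(&owned[..]), 3);

    // `owned` was never moved, so it is still available here.
    assert_eq!(owned.len(), 15);
}
```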

@steveklabnik
Member Author

Yes, that means the same thing. I feel they're about equally clear, but if you feel that it's more...

@pcn
Contributor

pcn commented Aug 5, 2014

Maybe there's a better phrasing? I feel like from the perspective of the un-initiated reader, the extra information helps by describing the mechanism and the context.

@steveklabnik
Member Author

Adding a section on c_str and FFI would be good as well.

http://doc.rust-lang.org/std/c_str/index.html
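
The `std::c_str` module linked above dates from before Rust 1.0. In current Rust the C string types are `CString` and `CStr` in `std::ffi`. A minimal sketch of a round trip through the C representation follows. `through_c_string` is a hypothetical helper, and no real C function is called.

```rust
use std::ffi::{CStr, CString};
use std::os::raw::c_char;

/// Hypothetical helper: converts `s` to a NUL-terminated C string, reads
/// it back through a raw pointer, and returns the result as a String.
/// Returns None if `s` contains an interior NUL byte.
fn through_c_string(s: &str) -> Option<String> {
    // Owned, NUL-terminated copy; fails on interior NUL bytes.
    let owned = CString::new(s).ok()?;

    // This pointer is what would be handed to a C function.
    let ptr: *const c_char = owned.as_ptr();

    // SAFETY: `ptr` comes from `owned`, which is alive for the rest of
    // this function and is NUL-terminated.
    let borrowed: &CStr = unsafe { CStr::from_ptr(ptr) };

    // C strings are not guaranteed to be UTF-8, so this step can fail.
    borrowed.to_str().ok().map(String::from)
}

fn main() {
    assert_eq!(through_c_string("hello"), Some(String::from("hello")));
    assert_eq!(through_c_string("he\0llo"), None);
}
```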

@l0kod l0kod mentioned this issue Aug 18, 2014
@l0kod
Contributor

l0kod commented Aug 18, 2014

The guide should maybe add a note to highlight the Str trait, which can be used as a generic parameter if the function doesn't care about owning (or not) the string. This way, it's possible to use &str or String, which might be convenient:

fn foo<T: Str>(msg: T) {
    std::io::stdio::print(msg.as_slice());
}
foo("hello");
foo(" world".to_string());
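
The `Str` trait and `std::io::stdio::print` used above are pre-1.0 APIs. In current Rust the closest analogue I know of is a bound on `AsRef<str>`. The sketch below assumes that substitution, and `byte_len` is a hypothetical function for this example.

```rust
/// Hypothetical function: accepts anything that can be viewed as a &str.
fn byte_len<T: AsRef<str>>(msg: T) -> usize {
    let text: &str = msg.as_ref();
    text.len()
}

fn main() {
    let owned = String::from(" world");

    assert_eq!(byte_len("hello"), 5); // &str
    assert_eq!(byte_len(&owned), 6); // &String, `owned` stays usable
    assert_eq!(byte_len(owned), 6); // String, moved into the call
}
```

A plain `&str` parameter covers most of the same cases with less generic code, so the bound is mainly useful when callers often hold an owned `String` and the call site should stay short.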

@l0kod
Contributor

l0kod commented Aug 18, 2014

In general, the guide should encourage traits as function parameters instead of concrete types.

bors added a commit that referenced this issue Sep 10, 2014
@steveklabnik
Member Author

I think that most of this has been tackled. If there are specific improvements, please open new issues, one per improvement.

bors added a commit to rust-lang-ci/rust that referenced this issue Dec 4, 2023

8 participants