Guide: strings #15593

Closed
wants to merge 1 commit into from
129 changes: 129 additions & 0 deletions src/doc/guide-strings.md
@@ -0,0 +1,129 @@
% The Strings Guide

# Strings

Strings are an important concept to master in any programming language. If you
come from a managed language background, you may be surprised at the complexity
of string handling in a systems programming language. Efficiently accessing and
allocating memory for a dynamically sized structure involves many details.
Luckily, Rust has lots of tools to help us here.

Review comment: I'd leave the part "Strings are an important concept to master in any programming language." out, it's not adding much information.

A **string** is a sequence of Unicode scalar values encoded as a stream of
UTF-8 bytes. All strings are guaranteed to be validly encoded UTF-8 sequences.
Additionally, strings are not null-terminated and can contain null bytes.
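
A minimal sketch of what this means in practice, written for current Rust. The sample string and the helper `byte_and_char_counts` are illustrative only, not part of any API. It shows that `len()` counts UTF-8 bytes rather than characters, and that an interior null byte is ordinary string data:

```rust
/// Hypothetical helper: returns (byte length, number of Unicode scalar values).
fn byte_and_char_counts(s: &str) -> (usize, usize) {
    (s.len(), s.chars().count())
}

fn main() {
    // "\u{e9}" (é) is one scalar value, but UTF-8 encodes it as two bytes.
    let s = "caf\u{e9}\0!";
    let (bytes, chars) = byte_and_char_counts(s);
    println!("{} bytes, {} chars", bytes, chars);

    // The raw UTF-8 encoding of é is 0xC3 0xA9.
    assert_eq!(&s.as_bytes()[3..5], b"\xC3\xA9");

    // Strings are not null-terminated, so an interior NUL is just data.
    assert_eq!(s.as_bytes()[5], 0);
    assert!(s.ends_with('!'));
}
```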

Rust has two main types of strings: `&str` and `String`.

Review comment (Contributor): I'd rather write "two string-related types" instead of "two main types of strings" because the way it is now suggests to me that one can use &str like any other "value type" (me with my C++ hat on). But maybe that's just me.

## &str

The first kind is a `&str`. This is pronounced a 'string slice.' String literals
are of the type `&str`:

```{rust}
let string = "Hello there.";
```

Like any reference, a string slice carries a lifetime: the span during which the
string data it points to is guaranteed to be valid. A string literal
is a `&'static str`. A string slice can be written without an explicit
lifetime in many cases, such as in function arguments. In these cases the
compiler fills in the lifetime for you:

```{rust}
fn takes_slice(slice: &str) {
    println!("Got: {}", slice);
}
```

Review comment (Contributor): I think you're muddying two kinds of lifetimes together here. Surely, there are two lifetimes involved with a string slice (or any other type with ampersand at the front). The lifetime of the variable holding the pointer and length, and the lifetime of the memory pointed at by the slice. With "Like any Rust type" you seem to refer to the former lifetime while the lifetime you actually want to talk about is the "2nd kind" that only applies to references and alike. I'm not sure how much value there is in special casing &str as a funny string type in the introduction when it really is a reference to a slice of a string like the ampersand suggests. There is some consistency here. str is almost the same as [u8] except for the fact that it guarantees a valid UTF-8 encoding.

Like vector slices, string slices are simply a pointer plus a length. This
means that they're a 'view' into an already-allocated string, such as a
`&'static str` or a `String`.
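
The sketch below, written against current Rust, shows a slice acting as a view into an existing string. `first_word` is a hypothetical helper; its signature relies on lifetime elision, so the returned slice borrows from the argument. It passes `&owned` where this guide would write `owned.as_slice()`, because current Rust converts a `&String` to a `&str` automatically:

```rust
/// Hypothetical helper: returns the first space-separated word of `s`.
///
/// Lifetime elision makes this equivalent to
/// `fn first_word<'a>(s: &'a str) -> &'a str`.
fn first_word(s: &str) -> &str {
    match s.find(' ') {
        Some(i) => &s[..i], // a view into `s`, nothing is copied
        None => s,
    }
}

fn main() {
    // A string literal lives for the whole program.
    let literal: &'static str = "Hello there.";

    let owned = String::from("Hello world");
    let word = first_word(&owned);

    // Same starting address as `owned`: the slice is pointer + length.
    assert_eq!(word, "Hello");
    assert_eq!(word.as_ptr(), owned.as_ptr());
    assert_eq!(word.len(), 5);

    println!("{} / {}", literal, word);
}
```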

## String

A `String` is an owned, heap-allocated string. It is growable and, like `&str`,
guaranteed to be UTF-8.

Review comment (Contributor): &str could just as well refer to a heap allocated string. The difference is in the ownership.

Review comment (Member Author): Saying String is heap allocated does not imply that &str is not.

```{rust}
let mut s = "Hello".to_string();
println!("{}", s);

s.push_str(", world.");
println!("{}", s);
```
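
A `String` can also be grown one piece at a time. This is a small sketch in current Rust; `greet` is a hypothetical helper, and `with_capacity` only pre-reserves space, so the string still reallocates if it outgrows that reservation:

```rust
/// Hypothetical helper: builds a greeting by growing a `String` in place.
fn greet(name: &str) -> String {
    // Reserve room for "Hello, " plus the name; the final push may still
    // trigger a reallocation, which String handles transparently.
    let mut s = String::with_capacity(7 + name.len());
    s.push_str("Hello, ");
    s.push_str(name);
    s.push('!');
    s
}

fn main() {
    let g = greet("world");
    // len() is the number of bytes in use; capacity() is the allocated size.
    assert_eq!(g, "Hello, world!");
    assert!(g.capacity() >= g.len());
    println!("{} ({} bytes)", g, g.len());
}
```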

You can borrow a `String` as a `&str` with the `as_slice()` method:

```{rust}
fn takes_slice(slice: &str) {
    println!("Got: {}", slice);
}

fn main() {
    let s = "Hello".to_string();
    takes_slice(s.as_slice());
}
```

Review comment (Contributor): I'm not sure but I think you just misused the word "coerce". I thought that it refers to a form of implicit conversion in the context of types and programming languages. But correct me if I'm wrong.

Review comment (Member): Note this was (somewhat) discussed previously: #15593 (comment)

You can also get a `&str` from a stack-allocated array of bytes, as long as the
bytes are valid UTF-8:

```{rust}
use std::str;

let x: &[u8] = &[b'a', b'b'];
let stack_str: &str = str::from_utf8(x).unwrap();
```
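
This guide targets `as_slice()`; current Rust spells the same borrow as `as_str()`, `&s[..]`, or simply `&s`. The sketch below assumes current Rust and uses two hypothetical helpers, `byte_len` and `bytes_to_str`, to show those conversions and what `str::from_utf8` does with bytes that are not valid UTF-8 (it returns an error instead of a `&str`):

```rust
use std::str;

/// Hypothetical helper: anything that accepts a &str.
fn byte_len(slice: &str) -> usize {
    slice.len()
}

/// Hypothetical helper: views bytes as text, or None if they are not UTF-8.
fn bytes_to_str(bytes: &[u8]) -> Option<&str> {
    str::from_utf8(bytes).ok()
}

fn main() {
    let s = String::from("Hello");

    // Three equivalent ways to borrow a String as a &str.
    assert_eq!(byte_len(s.as_str()), 5);
    assert_eq!(byte_len(&s[..]), 5);
    assert_eq!(byte_len(&s), 5); // deref coercion: &String -> &str

    // from_utf8 validates the bytes rather than trusting them.
    assert_eq!(bytes_to_str(&[b'a', b'b']), Some("ab"));
    assert_eq!(bytes_to_str(&[0xFF]), None); // 0xFF never occurs in UTF-8
}
```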

## Best Practices

### `String` vs. `&str`

In general, you should prefer `String` when you need ownership, and `&str` when
you just need to borrow a string. This is very similar to using `Vec<T>` vs. `&[T]`,
and `T` vs. `&T` in general.

Review comment (Contributor): This information is a little late for my taste.

Review comment (Member Author): Well if you're looking for best practices, I expect you to click on the best practices header. And it would be weird to put this before the explanation of what they are.

Review comment (Contributor): Why do you think String is for ownership and &str is not? It's because these types are defined like that. But you did not tell the reader about it so far. The first time "ownership" comes up is at this line. And that's what I was referring to. It's too late. Ownership is part of what defines the distinction between String and &str. The best practice pretty much falls out of that given that cloning Strings is much more expensive than lending them via slices (if cloning them would be as cheap as copying an int we would probably not bother with &str in lots of cases).

Perhaps a good way of introducing the topic is to say something along the lines of "If you need a variable type for a variable to hold a string value just like an int variable holds an integer value, you probably want the String type. One big reason though to use the &str type in addition is that copying string values is expensive. In many cases one does not need to copy a string value. In those cases it suffices to just refer to string values, to borrow them. And that's what &str is for" or something like that.

This means starting off with this:

```{rust,ignore}
fn foo(s: &str) {
    // ...
}
```

and only moving to this:

```{rust,ignore}
fn foo(s: String) {
    // ...
}
```

if you have a good reason. It's not polite to take ownership you don't
need: a caller that wants to keep using its `String` has to clone it first,
and a caller that only has a `&str` has to allocate a new `String`. The `&str`
version is also more flexible, since you can pass either kind of string into
`foo` by calling `.as_slice()` on any `String` you need to pass in.
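
To make the trade-off concrete, here is a sketch in current Rust with two hypothetical functions standing in for the two `foo` signatures above. The borrowing version accepts literals and `String`s alike; the owning version forces callers to allocate or clone:

```rust
/// Hypothetical borrowing version: the caller keeps ownership.
fn shout(s: &str) -> String {
    s.to_uppercase()
}

/// Hypothetical owning version: the caller must hand over a String.
fn shout_owned(s: String) -> String {
    s.to_uppercase()
}

fn main() {
    let name = String::from("ferris");

    // The &str version takes both literals and (borrowed) Strings.
    assert_eq!(shout("hi"), "HI");
    assert_eq!(shout(&name), "FERRIS");

    // The String version needs an allocation for a literal, and a clone
    // if the caller wants to keep using `name` afterwards.
    assert_eq!(shout_owned("hi".to_string()), "HI");
    assert_eq!(shout_owned(name.clone()), "FERRIS");

    println!("still usable: {}", name);
}
```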

### Comparisons

To compare a `String` with a string literal, prefer `as_slice()`...

```{rust}
fn compare(string: String) {
    if string.as_slice() == "Hello" {
        println!("yes");
    }
}
```

... over `to_string()`:

```{rust}
fn compare(string: String) {
    if string == "Hello".to_string() {
        println!("yes");
    }
}
```

Converting a `String` to a `&str` is cheap, but converting the `&str` to a
`String` involves an allocation.
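
In current Rust the allocation can be avoided even without an explicit borrow, because `String` implements `PartialEq<&str>` and can be compared with a literal directly. A short sketch, assuming current Rust (`is_hello` is a hypothetical helper):

```rust
/// Hypothetical helper: true if `s` is exactly "Hello" (case-sensitive).
fn is_hello(s: &str) -> bool {
    s == "Hello"
}

fn main() {
    let owned = String::from("Hello");

    // String == &str compiles directly; no temporary String is built.
    assert!(owned == "Hello");

    // Borrowing first, as the as_slice() advice suggests, also works.
    assert!(is_hello(&owned));
    assert!(!is_hello("hello"));
}
```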

## Other Documentation

* [the `&str` API documentation](std/str/index.html)
* [the `String` API documentation](std/string/index.html)
8 changes: 4 additions & 4 deletions src/libcollections/str.rs
@@ -55,10 +55,10 @@ other languages.

# Representation

-Rust's string type, `str`, is a sequence of unicode codepoints encoded as a
-stream of UTF-8 bytes. All safely-created strings are guaranteed to be validly
-encoded UTF-8 sequences. Additionally, strings are not null-terminated
-and can contain null codepoints.
+Rust's string type, `str`, is a sequence of unicode scalar values encoded as a
+stream of UTF-8 bytes. All strings are guaranteed to be validly encoded UTF-8
+sequences. Additionally, strings are not null-terminated and can contain null
+bytes.

The actual representation of strings have direct mappings to vectors: `&str`
is the same as `&[u8]`.
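
The representation note in the `str.rs` documentation above (`&str` has the same shape as `&[u8]`) can be checked with a short sketch in current Rust. `as_byte_view` is a hypothetical helper, and the last assertion assumes a target where a slice reference is a data pointer plus a `usize` length:

```rust
use std::mem::size_of;

/// Hypothetical helper: views a string's UTF-8 bytes without copying.
fn as_byte_view(s: &str) -> &[u8] {
    s.as_bytes()
}

fn main() {
    let s: &str = "hi";
    let b = as_byte_view(s);

    // Same bytes at the same address: as_bytes() reinterprets, not copies.
    assert_eq!(b, b"hi");
    assert_eq!(b.as_ptr(), s.as_ptr());

    // Both are "fat" references: a data pointer plus a length.
    assert_eq!(size_of::<&str>(), size_of::<&[u8]>());
    assert_eq!(size_of::<&str>(), 2 * size_of::<usize>());
}
```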