Fix messaging around what is and isn't a "slice" AKA. stop using "slice" to mean &[T] #101353

Open
Ben-Lichtman opened this issue Sep 2, 2022 · 6 comments
Labels
A-docs Area: documentation for any part of the project, including the compiler, standard library, and tools

Comments

@Ben-Lichtman
Contributor

Ben-Lichtman commented Sep 2, 2022

Slices - Why the messaging is wrong / confusing and needs to be clarified

In my experience teaching newbies about Rust (mostly on the community Discord and in person) there has always been a speed bump in the learning process, and that's slices.

The way we talk about slices is confusing and ambiguous: we first (IMO incorrectly) teach that &[T] is called a slice, then later, when we introduce unsized types, reveal that [T] is actually a slice, yet we continue to call &[T] a slice throughout extensive documentation.

Things become even more confusing when some documentation also uses the terms "shared slice" and "mutable slice", or defines a slice as a "view into a contiguous sequence" (which IMO conveys a sense of indirection?). Things that add to the confusion:

  • Owned slices such as Box<[T]> and even the rare Box<str>
  • [T] / str as a generic parameter (why is or isn't it &[T] / &str?).
  • How does [T] relate to [T; N]? (i.e. if &[T] is a slice, then is &[T; N] an array?)

Due to the language we use to teach, Rust beginners must learn and then unlearn the terminology several times before they arrive at the correct nuances of how slices work and which terms we use in which contexts. I believe this should change.

While I believe using "slice" to mean &[T] informally is acceptable, I think having this language present in documentation does the community a disservice and makes everything more difficult than it needs to be.

How we could fix it

We need to focus on accurately teaching the language, and using the terminology consistently:

  • [T] is a slice - an unsized type that needs to be behind some form of pointer / reference / indirection
  • &[T] is a slice reference / slice ref - a fat pointer which includes the length of the slice.
  • &mut [T] should be called a mutable slice ref
  • &str - what should we call this? a "string ref" / "str ref" maybe? currently it is often called a "string slice" but that is inaccurate...
    These basics should not be conflated - especially in documentation. Make it clear which ones are slices and which ones are references. (yes, I know that it's more words to type out)
  • Introducing the terms "shared" and "unique", while accurate, is confusing since they are not used anywhere else
  • Calling &[T] a "shared slice" is confusing since it implies that the type is a kind of slice, when in fact it is a reference to a slice.
  • The documentation needs to be overhauled in several places to make this messaging accurate and consistent.
  • I think methods such as .as_slice() and .as_mut_slice(), while technically incorrect (they return slice references) should be fine to stay since it's still fairly clear what the user is getting back.

If we really want to continue calling &[T] a "slice" and &mut [T] a "mutable slice", then we must come up with a new name for a [T] to avoid ambiguous language.
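To make the proposed terminology concrete, here is a minimal sketch in which each kind of pointer to a slice appears under its proposed name. The helper names `sum` and `double_all` are hypothetical and exist only for illustration; the standard library calls are real.

```rust
use std::mem::size_of;
use std::rc::Rc;

/// Takes a slice reference (`&[i32]`) and only reads through it.
fn sum(values: &[i32]) -> i32 {
    values.iter().sum()
}

/// Takes a mutable slice reference (`&mut [i32]`).
/// The elements can be changed, but the length cannot.
fn double_all(values: &mut [i32]) {
    for v in values.iter_mut() {
        *v *= 2;
    }
}

fn main() {
    // An array: a sized type, `[i32; 3]`.
    let mut array: [i32; 3] = [1, 2, 3];

    // A slice reference borrowing the array's elements as a `[i32]`.
    let shared: &[i32] = &array;
    assert_eq!(sum(shared), 6);

    // A mutable slice reference to the same elements.
    let unique: &mut [i32] = &mut array;
    double_all(unique);
    assert_eq!(array, [2, 4, 6]);

    // Owned slices: the slice `[i32]` sits behind an owning pointer.
    let boxed: Box<[i32]> = vec![1, 2, 3].into_boxed_slice();
    let counted: Rc<[i32]> = Rc::from(vec![4, 5, 6]);
    assert_eq!(sum(&boxed), 6);
    assert_eq!(sum(&counted), 15);

    // Pointers to the slice type `[i32]` are fat (address + length),
    // while a reference to an array is a thin pointer.
    assert_eq!(size_of::<&[i32]>(), 2 * size_of::<usize>());
    assert_eq!(size_of::<Box<[i32]>>(), 2 * size_of::<usize>());
    assert_eq!(size_of::<&[i32; 3]>(), size_of::<usize>());
}
```

In every case the slice itself is the `[i32]`; what differs is the pointer in front of it, which is the distinction the list above asks the documentation to keep visible.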

Locations (not comprehensive)

  • Confusing language exists throughout the documentation
  • stdlib:
    • primitive slice
      • A slice is a "dynamically-sized view into a contiguous sequence"
      • They are "either mutable or shared" - neglecting the existence of owned slices
      • The type is for slice primitives, but the documentation generally describes slice references
    • std::slice
      • same documentation as for the slice primitive, but then additionally describes traits and methods which apply to the primitive type
    • std::slice::from_raw_parts returns a "slice"; std::slice::from_raw_parts_mut returns a "mutable slice"
    • Nowhere in the documentation is an actual slice described, and the difference between slices and references to them is never made clear
  • the rust book - slice chapter
    • "Slices let you reference a contiguous sequence of elements in a collection rather than the whole collection"
    • "A slice is a kind of reference, so it does not have ownership" - incorrect
    • "array" is used interchangeably with "slice" without definitions being clarified
    • generally uses "slice" to mean &[T]
  • the rust reference
    • Glossary
      • "A slice is dynamically-sized view into a contiguous sequence, written as [T].

        It is often seen in its borrowed forms, either mutable or shared. The shared slice type is &[T], while the mutable slice type is &mut [T], where T represents the element type." - This is the best description so far, but again, the reference type names are used interchangeably with an actual slice type.

    • Slice types
      • "A slice is a dynamically sized type representing a 'view' into a sequence of elements of type T. The slice type is written as [T]"
      • "&[T]: a 'shared slice', often just called a 'slice'. It doesn't own the data it points to; it borrows it.
        &mut [T]: a 'mutable slice'. It mutably borrows the data it points to.
        Box<[T]>: a 'boxed slice'"

TL;DR: Using "slice" to mean &[T] is bad, confusing, and hard to learn.

Does anyone else think this is a problem? Please bikeshed your &[T] and &mut [T] names and other thoughts below :)

@kpreid
Contributor

kpreid commented Sep 3, 2022

In my opinion,

  • The documentation in primitive.slice.html is the most important thing to clarify, because it's the most common place people look for what should be a precise definition when they want clarity.

  • It isn't very harmful to say “slice” casually when talking about inputs and outputs of functions; it's routine to talk about operating on T when the operation takes a &T or &mut T, so this should be fine when T = [U] too. Just as long as it's not making any assertion that &[U] is the slice type.

  • "string slice" for str is also mostly harmless; it isn't ambiguous with something else that exists (unless you count [String]). The rate of confused beginners here is much lower than with [T] (not counting those who need borrowed/owned sized/unsized explained to them in the first place).

@the8472
Member

the8472 commented Sep 3, 2022

I don't think it's wrong. The English language has this common pattern where a base word forms a large category covering many related things. That category may have a central example but also contain some more tangential things.
If you want to name something more specific you need to glue on some extra qualifiers. Some people use dashes to make it clear that something is a phrase-word like mutable-slice-ref-to-bytes, other languages use compound nouns.

It's like "tea" can mean a beverage made from camellia sinensis, the plant itself but also many kinds of herbal teas.

@QuineDot

QuineDot commented Sep 3, 2022

Historical note: Before DSTs, &[T] were called slices and there was some transition period before [T] were called slices ("dynamically sized arrays", "unsized slices", et cetera). 1 2 3 4 and some prior work.

as_slice methods have existed since at least 2012. I don't really have a problem with the naming now or then; you can read it as something getting converted to a &[T] or you can read it as something getting converted to a [T]. The conversion just happens to take place behind a & in the latter interpretation.

That said, I agree that consistency and formality in the official documentation is better, and also call [T] the slice when being formal. The immediate jump to "pointer and length" on the primitive type page in particular has bugged me for some time. Seems those are pretty much the original docs.

@Ben-Lichtman
Contributor Author

Aside, but your link "4" mentions calling &[T] a "borrowed slice", which I think is quite a good name and one that I didn't mention in the OP

@frank-king
Contributor

I do think this is confusing, especially for some beginners. Here are some discussions about the slice type with my friends:

  • A: I think the length of &mut [i32] can be extended, but [T; N] has a fixed length that cannot be extended, am I right?
  • Me: Nope. You can mutate the elements in &mut [T], but its length cannot be changed. You can "change" the length by creating a new slice, but it's not the same slice as the original one.
  • A: Oh, I may have impractical expectations about [i32]. I know arrays, allocated on the stack, are only allowed to mutate the elements, but their lengths are fixed.
  • B: According to the Rust terms, [i32] is called a slice, and [i32; N] is an array.
  • Me: Right.
  • A: let arr = [1, 2, 3]; Does arr have the type [i32]?
  • Me: No, its type is [i32; 3].
  • A: Ok. Then, what is [i32]?
  • Me: A slice, whose size is unknown at compile time. Since Rust doesn't support unsized locals yet, you only ever see slices behind reference or pointer types, such as &[i32], &mut [i32], Box<[i32]>, etc.
  • A: Ok, I see. [T] is an unsized slice, we should use &[T], &mut [T], or Box<[T]> instead; and [T; N] is a sized array.
  • Me: Yes, and it is not necessary to say an "unsized" slice. Slices are always unsized.
  • (After a while)
  • A: But is slice a fat pointer? With an address and a length?
  • Me: No. What you mentioned is a slice reference, which is different from "slice".
  • A: Oh, I'm confused, but the Rust book says "slice is a kind of reference"

The Slice Type

Slices let you reference a contiguous sequence of elements in a collection rather than the whole collection.
A slice is a kind of reference, so it does not have ownership.

A dynamically-sized view into a contiguous sequence, [T]. Contiguous here means that elements are laid out so that every element is the same distance from its neighbors.
Slices are a view into a block of memory represented as a pointer and a length.
Slices are either mutable or shared. The shared slice type is &[T], while the mutable slice type is &mut [T], where T represents the element type.

  • Me: Yes, It calls &[T] the shared slice, and &mut [T] the mutable slice.
  • Me: I think "slice" used to be the name of the &[T] type. This may not be a big problem until unsized locals are introduced.
  • A: Yes, that is what I think of slices. I think a slice is a fat pointer with an address and a length.
  • Me: But I think the definitions on The Rust Reference are more precise:

A slice is a dynamically sized type representing a 'view' into a sequence of elements of type T. The slice type is written as [T].
Slice types are generally used through pointer types. For example:

  • &[T]: a 'shared slice', often just called a 'slice'. It doesn't own the data it points to; it borrows it.
  • &mut [T]: a 'mutable slice'. It mutably borrows the data it points to.
  • Box<[T]>: a 'boxed slice'
  • Me: I don't think slice is a fat pointer. In my opinion, a slice is a list of elements that is contiguous in memory.
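The points from this conversation can be checked with a small sketch. The helper `first_two` is a hypothetical name used only here, and it assumes the input has at least two elements (it panics otherwise).

```rust
/// Returns a *new*, shorter mutable slice reference into the same elements.
/// The original slice is not resized; this is just a narrower view of it.
fn first_two(values: &mut [i32]) -> &mut [i32] {
    &mut values[..2]
}

fn main() {
    // `arr` has the array type `[i32; 3]`, not the slice type `[i32]`.
    let mut arr = [1, 2, 3];

    // Borrowing the array as a slice gives a mutable slice reference.
    let whole: &mut [i32] = &mut arr;

    // Elements can be mutated through it...
    whole[0] = 10;
    // ...but `[i32]` has no `push`; its length stays what it was.
    assert_eq!(whole.len(), 3);

    // "Changing the length" means creating another slice reference.
    let shorter = first_two(whole);
    shorter[1] = 20;
    assert_eq!(shorter.len(), 2);

    // Both references pointed into the same array all along.
    assert_eq!(arr, [10, 20, 3]);
}
```

Here the list of elements contiguous in memory is the slice, and `whole` and `shorter` are two different references to (parts of) it.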

After this discussion, I think the docs on slices are rather confusing, and I wanted to open an issue about them. Then I found this related issue.

I tried my best to explain the difference between [T] and &[T] or &mut [T] to my friend, but since the same term "slice" is used for all of them, I found it really hard to help him distinguish these types.

Mixing up [T] and &[T] may not be too bad, because [T] is rarely exposed to the user directly. But custom DSTs are supported, such as this type:

struct Foo {
    sized: u32,
    dst: [u8],
}

It will be extremely confusing if we mix the [u8] type with &[u8].
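As a rough illustration of why the distinction matters here, the sketch below uses a generic variant of `Foo` (an assumption made for this example, since a value of the non-generic struct cannot be built directly in safe code) and relies on unsized coercion to obtain a reference to the dynamically sized form. The function `describe` is likewise hypothetical.

```rust
use std::mem::size_of;

/// Generic variant of `Foo`, introduced for illustration only.
/// With `T = [u8]`, the last field is a slice and `Foo<[u8]>` is unsized.
struct Foo<T: ?Sized> {
    sized: u32,
    dst: T,
}

/// Reads both fields through a reference to the unsized struct.
fn describe(foo: &Foo<[u8]>) -> (u32, usize) {
    (foo.sized, foo.dst.len())
}

fn main() {
    // A sized value whose last field is an array...
    let concrete: Foo<[u8; 3]> = Foo { sized: 7, dst: [1, 2, 3] };

    // ...coerces behind a reference to the unsized `Foo<[u8]>`.
    let unsized_ref: &Foo<[u8]> = &concrete;
    assert_eq!(describe(unsized_ref), (7, 3));

    // The field `dst` is a slice (`[u8]`) stored inline, not a `&[u8]`.
    // The length lives in the fat reference to the whole struct.
    assert_eq!(size_of::<&Foo<[u8]>>(), 2 * size_of::<usize>());
    assert_eq!(size_of::<&Foo<[u8; 3]>>(), size_of::<usize>());
}
```

If `dst` were a `&[u8]`, the struct would hold a pointer and a length and would itself be sized, which is a very different layout from the one above.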

@frank-king
Contributor

I think the most confusing point may be this: [T], as an unsized type, has a size that is unknown at compile time; however, &[T] and &mut [T], as fat pointers, have a fixed length. The unknown size vs. the fixed length looks contradictory, I think.
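One way to read this point, sketched below: the size of the slice reference is known at compile time, while the size of the slice it points to is only available at run time through the length stored in that reference. The function `byte_size` is a hypothetical helper for this example.

```rust
use std::mem::{size_of, size_of_val};

/// Size in bytes of the slice behind the reference, computed at run time
/// from the length carried by the fat pointer.
fn byte_size(values: &[u32]) -> usize {
    size_of_val(values)
}

fn main() {
    // `size_of::<[u32]>()` does not compile: `[u32]` has no static size.
    // The reference type, however, always has the same size.
    assert_eq!(size_of::<&[u32]>(), 2 * size_of::<usize>());

    let short = [1u32, 2];
    let long = [1u32, 2, 3, 4, 5];

    // Same reference type, different slice sizes.
    assert_eq!(byte_size(&short), 8);
    assert_eq!(byte_size(&long), 20);
}
```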
