some work on collections #137

Merged
merged 34 commits into from
Sep 27, 2016

Conversation

steveklabnik
Member

I started fleshing out some stuff about data structures. I feel pretty good about this, with some caveats discussed below.

Also, I would really like to hear what @carols10cents, @aturon, and @jonathandturner think about showing the internal representation of Vec like this; we haven't seen raw pointers, but I really like the idea of giving people an understanding at this level. I'm not sure if this pacing is correct; like, maybe the details of how insert() works should go at the end, rather than in the middle?

There are two ways this PR needs more work, beyond all that:

  1. I feel like I should put more examples in Vec, but wanted to give you an idea of the style I'm shooting for here before worrying too much about exactly what should be included. Thoughts welcome! For one, I was thinking about showing off enums in vectors as a way of storing multiple types of elements...
  2. I fleshed out very very roughly what I'd want to cover for the other chapters as well. It's basically "show off CRUD", though I want to consider just reducing "delete" to a reminder sentence pointing back to the vector version, rather than re-explaining Drop each time. Plus, each one has its own little tweaks: hashmap has entry, string will have the utf-8 stuff.
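For readers following the thread, here is a minimal sketch of what "show off CRUD" could look like for a vector. The function name `vec_crud` is hypothetical and is not taken from the chapter draft; it only uses standard `Vec` methods.

```rust
/// Walks through create, read, update, and delete on a `Vec<i32>`.
/// Returns the value read in the "read" step and the final vector.
fn vec_crud() -> (Option<i32>, Vec<i32>) {
    // Create: start empty and push three values.
    let mut v = Vec::new();
    v.push(5);
    v.push(6);
    v.push(7);

    // Read: `get` returns an Option instead of panicking when out of bounds.
    let third = v.get(2).copied();

    // Update: indexing on a mutable vector allows assignment.
    v[0] = 50;

    // Delete: `remove` takes the element out and shifts the rest left.
    v.remove(1);

    (third, v)
}
```

Calling `vec_crud()` yields the third element as read before the update, plus the vector after the update and removal.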

@carols10cents
Member

So yeah, as someone who wants to know more about how to use things like Vectors in my code, but not really care so much about how they work under the hood, this isn't my favorite approach ever. BUT! Knowing what I know now, since I've acquired some rust experience, I think by adding a little bit of cheese to the broccoli you could get past-me to eat my vegetables and I would be better off for it.

I knew about the double-capacity reallocation behavior of Vec in the abstract but didn't think about it too much until these two experiences:

  • I was trying to optimize the performance of some code I was working on, I wasn't using Vec::with_capacity, and I saw a good chunk of time in my profiling results was being spent in Vec allocation. Switching to use Vec::with_capacity helped a WHOLE lot. I like that this chapter explains why that is, and as someone using Vec, this is super relevant, but the part about with_capacity is kind of a little aside at the end. If I made a function that solves a realish problem where you know the capacity of the Vec you'll need but not the values when you start (to show why you couldn't use vec!) and a flame graph of its performance using Vec::new() and Vec::with_capacity(), do you think that would be a good addition?
  • When I was working on rewriting zopfli in Rust, I found that the C version literally contains a define to do what Vec does. Now, I was already pretty horrified by C even though I don't know it intimately, but I was even more horrified during this eye-opening discovery/realization that every C library has to define and maintain this behavior themselves. I was very thankful that Rust not only provides the implementation of Vec reallocation for me, but that the borrow checker provides the safety of using Vec because of the way it's implemented. I love the demonstration of this in the "Reading elements of vectors" section-- could we add to the intro some allusion that this payoff of getting safe, performant, storage with a convenient API (maybe contrast with C?) will become clear to us by the end of the section? I think knowing that's coming would keep me going.
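A rough sketch of the `Vec::new()` versus `Vec::with_capacity()` difference described in the first bullet. It counts capacity changes as a stand-in for reallocations rather than measuring time, so it is not the flame graph comparison proposed above. The helper names are hypothetical, and the exact growth strategy of `Vec` is an implementation detail that the standard library does not promise.

```rust
/// Pushes `n` values onto `v` and counts how often its capacity changed,
/// which is a proxy for how many times the buffer was reallocated.
fn count_capacity_changes(mut v: Vec<usize>, n: usize) -> usize {
    let mut changes = 0;
    let mut last_capacity = v.capacity();
    for i in 0..n {
        v.push(i);
        if v.capacity() != last_capacity {
            changes += 1;
            last_capacity = v.capacity();
        }
    }
    changes
}

/// Starts from an empty vector, so it has to grow as values arrive.
fn growing(n: usize) -> usize {
    count_capacity_changes(Vec::new(), n)
}

/// Reserves room for all `n` values up front, so no growth is needed.
fn preallocated(n: usize) -> usize {
    count_capacity_changes(Vec::with_capacity(n), n)
}
```

With the capacity known in advance, `preallocated(n)` reports no capacity changes, while `growing(n)` reports at least one for any non-zero `n`.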

And one small thought:

  • Should we call this chapter "basic collections" or just "collections"? I know you mentioned kind of wanting to stay away from basic/advanced dichotomy... if you want to convey that this isn't going to cover all the collections available in std, maybe instead of "basic" say "foundational"? "common"? "staple"? "essential"? "fundamental"?

@alilleybrinker

@carols10cents, perhaps "common collections," differentiating on how often collections are used in real-world programs.

@steveklabnik
Member Author

> If I made a function... do you think that would be a good addition?

Yeah, I think this might be real great!

> could we add to the intro some allusion that this payoff of getting safe, performant, storage with a convenient API (maybe contrast with C?) will become clear to us by the end of the section?

👍

And yeah, "common" or "staple" both sound good to me.

@carols10cents
Member

Yay!

Almost forgot that I made some tiny edits while reading, I just pushed them to this branch for you.

I'm going to add an issue to remind me to make that function and flame graphs soon-- I'll let you take care of the intro, unless you'd like me to give it a try!

```
v.push(8);
```

Since these numbers are `i32`s, Rust can infer the type of the vector, so we

Are they always i32 for simple numbers?

Member Author

an integer which doesn't have any bounds to infer the type by is resolved as i32, yes.
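A small sketch of that fallback rule, using element size as an observable stand-in for the inferred type. The function names are hypothetical. The `i32` default applies only when nothing else constrains the integer literal.

```rust
use std::mem::size_of_val;

/// Nothing constrains the literals, so the elements fall back to `i32`.
fn default_element_size() -> usize {
    let mut v = Vec::new();
    v.push(5);
    v.push(6);
    v.push(7);
    v.push(8);
    size_of_val(&v[0]) // size of one element, in bytes
}

/// An explicit annotation overrides the fallback.
fn annotated_element_size() -> usize {
    let mut v: Vec<u8> = Vec::new();
    v.push(5);
    size_of_val(&v[0])
}

/// A later use can also pin the element type: pushing an `i64`
/// makes the earlier literal an `i64` as well.
fn later_use_element_size() -> usize {
    let mut v = Vec::new();
    v.push(5);
    let big: i64 = 8;
    v.push(big);
    size_of_val(&v[0])
}
```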

@sophiajt

👍 to @carols10cents comments.

Re: Carol's knee-jerk reaction about "show me how to use Vec" vs. "tell me how it works": I think we should spend a lot more time on how to use it, and then move the "let's open the hood and eat our broccoli" to later in the book (or possibly to a blog post or other venue).

There may be ways to show some of that data, maybe not quite all of the steps you show. When I was looking through the chapter, I was thinking that I'd much rather see pretty pictures. Maybe that's a good cheese for the broccoli where you can get some of the vitamins in.

@carols10cents
Member

On the call, @aturon asked @steveklabnik about a case in which knowing the internals of Vec helped him to understand using a Vec, and he gave a great example of holding a reference to the 2nd element of the vec and not being able to add a new element to the end at the same time, since the memory might be reallocated-- just writing this down for you because i think it's awesome!
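The example from the call can be sketched roughly as follows. The rejected version is left in comments because it does not compile; the function name `copy_then_push` is hypothetical.

```rust
// Rejected by the borrow checker: `second` borrows from `v`, and `push`
// may move the elements to a new allocation while that borrow is alive.
//
// fn does_not_compile() {
//     let mut v = vec![1, 2, 3];
//     let second = &v[1];      // immutable borrow of `v` starts here
//     v.push(4);               // mutable borrow of `v` is not allowed yet
//     println!("{}", second);  // the immutable borrow is still in use
// }

/// Accepted: copy the value out first, so no reference into the
/// vector's buffer is held across the `push`.
fn copy_then_push() -> (i32, usize) {
    let mut v = vec![1, 2, 3];
    let second = v[1]; // `i32` is `Copy`, so this is a plain value
    v.push(4);
    (second, v.len())
}
```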

@carols10cents carols10cents removed their assignment Jul 27, 2016
@steveklabnik steveklabnik force-pushed the data-structures branch 2 times, most recently from f9aa969 to 3d90c48 Compare August 26, 2016 19:49
@steveklabnik
Member Author

Okay, @carols10cents , I think a draft is done. The hashmap bit is.... uninspired? We should try to brainstorm on how to jazz it up a bit. But the basics are all in place, I think.

@steveklabnik
Member Author

I was talking with @ashleygwilliams about what stuff to add to the hashmap section:

  • iterating over keys/values
  • foreshadowing the hash trait

@ashleygwilliams
Member

also: where are the values? are they copied, referenced? how does equality work?
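A rough sketch touching on the questions above: what happens to keys and values on insert, and how iteration over keys and values looks. The function names and the team/score data are made up for illustration and are not from the chapter draft.

```rust
use std::collections::HashMap;

/// Owned keys are moved into the map; `i32` values are `Copy`,
/// so they are copied in.
fn build_scores() -> HashMap<String, i32> {
    let team = String::from("Blue");
    let mut scores = HashMap::new();
    scores.insert(team, 10);
    // println!("{}", team); // would not compile: `team` was moved into the map
    scores.insert(String::from("Yellow"), 50);
    scores
}

/// Iterating over a borrowed map yields references to keys and values.
fn total(scores: &HashMap<String, i32>) -> i32 {
    let mut sum = 0;
    for (_name, score) in scores {
        sum += *score;
    }
    sum
}

/// Iteration order is unspecified, so sort the keys for a stable result.
fn sorted_keys(scores: &HashMap<String, i32>) -> Vec<&str> {
    let mut keys: Vec<&str> = scores.keys().map(|k| k.as_str()).collect();
    keys.sort();
    keys
}
```

Lookups rely on the key type's `Hash` and `Eq` implementations, which is why `scores.get("Blue")` finds the entry inserted with an owned `String`.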

```
['न', 'म', 'स', '्', 'त', 'े', ' ']
```

There are seven of them, and the last one isn't even visible! Finally, if you
Member

@steveklabnik I'm not really sure what you're trying to illustrate with the space on the end... the way you talk about it here ("the last [char] isn't even visible!", "and there's still that empty character on the end") makes it sound like something weirder than a regular ol' space? I think it's distracting from the main grapheme cluster/character point, mind if I take it out?
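To make the count concrete, here is a sketch of how the `char` values of that word can be inspected, with and without the trailing space. The helper name `scalar_values` is hypothetical. Grapheme clusters are not covered here because the standard library does not provide them.

```rust
/// Collects the Unicode scalar values (`char`s) of a string slice.
fn scalar_values(s: &str) -> Vec<char> {
    s.chars().collect()
}

/// Number of bytes in the UTF-8 encoding, which is what `len` reports.
fn byte_len(s: &str) -> usize {
    s.len()
}
```

Without the trailing space, `scalar_values("नमस्ते")` has six entries, two of which are combining marks that never stand alone, and `byte_len("नमस्ते")` is 18 because each of these `char`s takes three bytes in UTF-8. Appending a space gives the seven `char`s listed in the excerpt.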


However.

Sometimes, indexing the bytes of a string is useful. So while you can't use `[]`
Member

This explanation is... unsatisfying. Like... I understand the pieces of this decision, namely:

  • People naïvely try to get chars by indexing a string and then make their programs not do what they expected in the face of multibyte characters, and Rust wants to try to prevent that
  • string slices are a starting byte and a number of bytes indexed into a String

but like, together, with the explanation "this thing is bad but this other thing that amounts to a trivial workaround is useful so we disallowed the bad thing but allowed the useful trivial workaround" is not very... satisfying? convincing that the language has a cohesive design?

There might not be a satisfying explanation here, and Rust just might be inconsistent here and everyone has come to terms with it. In that case, I feel like we should say "Yes, Rust is being inconsistent because the problems that occur with indexing directly into an array were deemed to outweigh the benefits of allowing it, but that did not hold for slicing because... X".

I am going to go looking for discussions along these lines, I bet there's a real humdinger of a bikeshed around here somewhere that I don't remember seeing.
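For reference, a sketch of the behaviors being contrasted: indexing a string with a single integer is rejected at compile time, slicing with a byte range is allowed but panics off a `char` boundary, and `str::get` is the non-panicking variant. The function names are hypothetical.

```rust
// Indexing with a single integer does not compile:
//
//     let s = "नमस्ते";
//     let b = s[0];

/// Slices the first `n` bytes. Panics at runtime if `n` does not fall
/// on a `char` boundary (for example `n == 1` for "नमस्ते").
fn slice_prefix(s: &str, n: usize) -> &str {
    &s[..n]
}

/// Same slice, but returns `None` instead of panicking.
fn checked_prefix(s: &str, n: usize) -> Option<&str> {
    s.get(..n)
}

/// Byte access stays available through an explicit byte view.
fn first_byte(s: &str) -> Option<u8> {
    s.bytes().next()
}
```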

@carols10cents
Member

Ok. I think I'm done with this chapter. Anyone have time to review in the next few days?

- [Vectors]()
- [Strings]()
- [`HashMap<K, V>`]()
- [Essential Collections](ch08-00-essential-collections.md)
Member Author

ugh i know we already talked about this, but I would really prefer something other than "essential." How about "fundamental"?

Member

YOU DID THIS 57fd1a5

(but yeah fundamental is fine)

- [Essential Collections](ch08-00-essential-collections.md)
- [Vectors](ch08-01-vectors.md)
- [Strings](ch08-02-strings.md)
- [HashMaps](ch08-03-hashmaps.md)
Member Author

Two words, probably.

Member

hnnnggg like everywhere then

@@ -0,0 +1,18 @@
# Essential Collections
Member Author

we'll need to change this here too


Before we can dig into those aspects, we need to talk about what exactly we
even mean by the word 'string'. Rust actually only has one string type in the
core language itself: `&str`. We talked about these string slices in Chapter 4:
Member Author

worth *ing string slices?

```
// ... etc
```

There are crates available to get grapheme clusters from `String`s.
Member Author

Maybe add 'on crates.io'?

@@ -0,0 +1,251 @@
## HashMaps

The last of our essential collections is the *HashMap*. A `HashMap<K, V>`
Member Author

darn essences

### Updating a HashMap

#### Overwriting a Value

Member Author

two headings with no words inside?

Member

oops. i can put some words in there


#### Overwriting a Value

If we insert a key and a value, then insert that key with a different value, the value associated with that key will be replaced. Even though this code calls `insert` twice, the HashMap will only contain one key/value pair, since we're inserting with the key `1` both times:
Member Author

word wrap this line :)
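The overwrite behavior in the excerpt above can be sketched like this. The values `"hello"` and `"goodbye"` and the function names are placeholders, not the chapter's own listing. The second function shows `entry`, which only inserts when the key is absent.

```rust
use std::collections::HashMap;

/// Inserts twice with the same key. The second `insert` replaces the
/// value and hands back the one it displaced.
fn overwrite_demo() -> (Option<&'static str>, HashMap<i32, &'static str>) {
    let mut map = HashMap::new();
    map.insert(1, "hello");
    let previous = map.insert(1, "goodbye");
    (previous, map)
}

/// `entry(..).or_insert(..)` leaves an existing value untouched.
fn insert_if_absent() -> HashMap<i32, &'static str> {
    let mut map = HashMap::new();
    map.insert(1, "hello");
    map.entry(1).or_insert("goodbye"); // key 1 exists: no change
    map.entry(2).or_insert("world"); // key 2 is new: inserted
    map
}
```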

@steveklabnik
Member Author

This chapter is so much better now 😄 ❤️

Okay, so one last thing that we didn't include that might be cool: should we show off how to use enums to get multiple things into a collection?

@carols10cents
Member

> Okay, so one last thing that we didn't include that might be cool: should we show off how to use enums to get multiple things into a collection?

Oooooh good call. I'm into it.

@steveklabnik
Member Author

i'm not totally sure. I guess it's like "vector" vs Vector, "hash map" vs
HashMap. idk

On Mon, Sep 26, 2016 at 5:53 PM, Carol (Nichols || Goulding) <
notifications@github.com> wrote:

@carols10cents commented on this pull request.

In src/SUMMARY.md #137:

@@ -31,10 +31,10 @@
- Controlling visibility with pub
- Importing names with use

-- Basic Collections

hnnnggg like everywhere then



@carols10cents
Member

No more essences. And "hash map" looked really weird at first but I think I'm getting used to it now.

Back over to you! 🏓

the vector would hold a type that would cause errors with the operations we
performed on the vector. Using an enum plus a `match` where we access elements
in a vector like this means that Rust will ensure at compile time that we
always handle every possible case.
Member Author

last thing, and then let's :shipit:

We should mention "if we always know what the set of types are, an enum works. if we don't, a trait object works, and we'll learn about those later" (but in better words)
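A sketch of the two options under discussion, assuming a made-up `Cell` enum and `Label` trait that are not the chapter's own example. With the enum, `match` must cover every variant. With the trait object, new types can be added later without touching the collection's element type.

```rust
/// Closed set of types: every possibility is known at compile time.
enum Cell {
    Int(i32),
    Float(f64),
    Text(String),
}

/// `match` is checked for exhaustiveness, so a new variant forces an update here.
fn describe(cell: &Cell) -> String {
    match cell {
        Cell::Int(i) => format!("int {}", i),
        Cell::Float(x) => format!("float {}", x),
        Cell::Text(s) => format!("text {}", s),
    }
}

fn sample_row() -> Vec<Cell> {
    vec![
        Cell::Int(3),
        Cell::Text(String::from("blue")),
        Cell::Float(10.12),
    ]
}

fn describe_all(row: &[Cell]) -> Vec<String> {
    row.iter().map(describe).collect()
}

/// Open set of types: anything implementing the trait can be stored.
trait Label {
    fn label(&self) -> String;
}

impl Label for i32 {
    fn label(&self) -> String {
        format!("int {}", self)
    }
}

impl Label for String {
    fn label(&self) -> String {
        format!("text {}", self)
    }
}

fn open_ended() -> Vec<Box<dyn Label>> {
    let mut items: Vec<Box<dyn Label>> = Vec::new();
    items.push(Box::new(3_i32));
    items.push(Box::new(String::from("blue")));
    items
}
```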

@carols10cents
Member

check me!!

@steveklabnik steveklabnik merged commit 4a2bc80 into master Sep 27, 2016
@steveklabnik steveklabnik deleted the data-structures branch September 27, 2016 15:33
@steveklabnik
Member Author

🎊

@carols10cents
Member

yayayayayayyyyy!!!!

Using an enum for storing different types in a vector does imply that we need
to know the set of types we'll want to store at compile time. If that's not the
case, instead of an enum, we can use a trait object. We'll learn about those in
Chapter XX.
Contributor

is this meant to be Chapter 25.2?

Member

We've been rearranging the later chapters a lot, so we're not sure which chapter it's going to be yet.

same way as `println!`, but instead of printing the output to the screen, it
returns a `String` with the contents. This version is much easier to read than
all of the `+`s.

Contributor

should this mention how format! doesn't take ownership (unlike the prior examples)

Member

Maybe! Can you elaborate on why mentioning that would be helpful to you at this point? Which prior examples are you referring to?

Contributor

in the first example of concatenation there was an aside about String + &str taking ownership of the first part (which might be an extra point against doing the multiple + given potential precedence oddities) but there's nothing about how format! takes & (I went and looked it up in the docs/code).

basically it's because ownership is brought up during the previous example but not during this it just felt a little strange.

Member

Hm, I'll think on this. I think I was assuming that since we said that format! is like println! here, and the reader has already had some experience with println!, that it should be clear. But apparently it's not! I'm also going to create a new bug report since this PR has been merged already, so this doesn't get lost. Thank you!! ❤️
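The ownership difference raised in this thread can be sketched as follows. The `tic`/`tac`/`toe` strings and the function name are illustrative assumptions: `format!` only borrows its arguments, while `+` consumes the left-hand `String`.

```rust
/// Builds the same string twice and returns both results.
fn concat_both_ways() -> (String, String) {
    let s1 = String::from("tic");
    let s2 = String::from("tac");
    let s3 = String::from("toe");

    // `format!` borrows s1, s2, and s3, so all three remain usable.
    let formatted = format!("{}-{}-{}", s1, s2, s3);

    // `+` takes ownership of s1; s2 and s3 are only borrowed.
    let added = s1 + "-" + &s2 + "-" + &s3;
    // println!("{}", s1); // would not compile: `s1` was moved by `+`

    (formatted, added)
}
```

The `format!` call has to come first here, because after the `+` expression `s1` can no longer be used.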

AbrarNitk pushed a commit to FifthTry/rust-book that referenced this pull request Jun 2, 2021

6 participants