Ch8 edits after technical review #450

carols10cents · 2017-02-18T04:44:32Z

Pretty small edits here.

@eddyb had some comments that I don't agree with, and I want to see what @steveklabnik thinks and discuss them in here, so I've pulled them out of the docx and linked to their context:

Regarding the term hash map:

“map” or “associative map” would be more appropriate, HashMap and BTreeMap are two compromises based on what the user might need, neither is more fundamental (unlike, say, Vec<(K, V)>, which is simple but also very slow).

I think "hash map" is the term that will be most familiar to the most people from most languages, and it's also literally the name of the type. I think wanting to associate keys with values and be able to look them up by key is more fundamental than wanting a map that also does those things plus is sorted.

In the same spot, regarding our choice of vector, string, and hash map:

I’d like to mention maps and sets at the same time, but sets are harder because mathematical sets are usually unordered but that’s not true of BTreeSet, so it’d have to be phrased around the optimized operation (e.g. “a set allows us to quickly check if it contains a certain element”)

I think adding set to this chapter is too much, and I reach for these three WAY more often than I reach for a set.

Regarding why slicing into a string's bytes by index is allowed even though accessing by index is not:

Indexing the bytes is O(1) so I’m not sure what this is trying to say.

The text I put here no longer makes sense to me either, which means I'm still confused. I think I was trying to contrast slicing to accessing an index: slicing doesn't have the expectation of performance that index access does, so even if it is fast, we aren't constrained by the expectation of performance like index access is, which is why it was taken away? I am still unsatisfied by this...

Regarding crates providing grapheme cluster functionality:

Could we link to them, or at least provide a name? IIRC some of the maintainers are Rust libs team members, so they’re kind of “official”?

We decided we weren't going to do that unless they're under rust-lang or rust-lang-nursery and scheduled to have a 1.0 version by the time the book's printed.

Regarding the example of collecting into a hash map:

Do note that this will collect references. into_iter would be better, if not for the fact that the shortest form is then asymmetrical (teams.into_iter().zip(initial_scores).collect())

Into_iter is too much complexity/distraction to introduce here, more than the type ending up being slightly different would.

The rest of your comments were excellent and I agree wholeheartedly, @eddyb! ❤️

eddyb · 2017-02-18T09:03:19Z

To be fair, I have no experience with either Perl or Ruby and I've heard they both use "hash" to refer to unordered associative maps implemented using hashing.
And Java uses HashMap, just like us, but plenty of other languages (PHP, JS, Python and C++, to name a few) don't even have "hash" in the name of their equivalent.
As for which is more fundamental: neither take your own key types unless you derive the right traits (Hash vs Ord), and neither of those implies the other, they're orthogonal.

To reiterate: I am fine with only talking about HashMap as an example of a "map", since it's more widely used AFAIK, but "hash map" almost makes it sound like all maps are more specialized "hash maps".
And the comment about sets wasn't meant for a full explanation in the chapter, just a mention at the start.
Frankly, I wouldn't mind keeping the current list if the reader was told they should maybe go to https://doc.rust-lang.org/stable/std/collections/ to see the full list & tradeoff guidelines we have there.

eddyb · 2017-02-18T09:05:34Z

we aren't constrained by the expectation of performance like index access is, which is why it was taken away?

From my POV this is wrong. Also, indexing gave a single byte back IIRC (because indexing gives back a reference that the compiler places an implicit deref on, to give you an lvalue, so char couldn't work).

eddyb · 2017-02-18T09:08:56Z

Into_iter is too much complexity/distraction to introduce here, more than the type ending up being slightly different would.

Sadly, I agree. This is where zip(teams, initial_scores).collect() would work wonderfully 😞.

carols10cents · 2017-02-18T15:38:59Z

From my POV this is wrong. Also, indexing gave a single byte back IIRC (because indexing gives back a reference that the compiler places an implicit deref on, to give you an lvalue, so char couldn't work).

So do you have a better way to explain why you can slice bytes of a str with [0..9] w/e, but you're not allowed to index bytes of a str with [5]?

eddyb · 2017-02-18T15:43:29Z

@carols10cents We check that slices fall on complete UTF-8 sequences (which is O(1) because of how UTF-8 was designed), i.e. what you get is indeed a valid substring (as far as codepoints are concerned), but byte indexing can you give a part of an UTF-8 sequence, which is much less useful in the general case.

I might be able to figure out where the actual decision was taken and the original arguments, if you want.
EDIT: Apparently it was rust-lang/rust#15085 but that doesn't link back to the actual discussion.

carols10cents · 2017-02-18T15:48:05Z

To reiterate: I am fine with only talking about HashMap as an example of a "map", since it's more widely used AFAIK, but "hash map" almost makes it sound like all maps are more specialized "hash maps".

Ok, I'm fine with adding "which is an implementation of the more general data structure called a map" or something, would that resolve this comment?

And the comment about sets wasn't meant for a full explanation in the chapter, just a mention at the start.
Frankly, I wouldn't mind keeping the current list if the reader was told they should maybe go to https://doc.rust-lang.org/stable/std/collections/ to see the full list & tradeoff guidelines we have there.

We do say:

Each kind of collection has different capabilities and costs, and choosing an appropriate one for the situation you’re in is a skill you’ll develop over time. In this chapter, we’ll go over three collections which are used very often in Rust programs: [...]
There are more specialized variants of each of these data structures for particular situations, but these are the most fundamental and common.

so what if I change this to say "Each kind of collection has different capabilities and costs, and choosing an appropriate one for the situation you’re in is a skill you’ll develop over time. In this chapter, we’ll go over three collections which are used very often in Rust programs: [...]
To learn about the other kinds of collections provided by the standard library, see the documentation at https://doc.rust-lang.org/stable/std/collections/. "

I think this will also make the discussion of "fundamental" moot?

eddyb · 2017-02-18T15:49:31Z

@carols10cents Both of those are great minimal changes, thank you!

carols10cents · 2017-02-18T15:51:03Z

@carols10cents We check that slices fall on complete UTF-8 sequences (which is O(1) because of how UTF-8 was designed), i.e. what you get is indeed a valid substring (as far as codepoints are concerned), but byte indexing can you give a part of an UTF-8 sequence, which is much less useful in the general case.

I might be able to figure out where the actual decision was taken and the original arguments, if you want.

Slicing panics if you try and slice somewhere that isn't a character boundary. So why not allow indexing but panic if it's not a single byte character?

I'd love to see the original discussion, I've tried to find it before and couldn't :-/

eddyb · 2017-02-18T15:56:59Z

Oh, original issue for removing str indexing is rust-lang/rust#12710, and there are meeting minutes.

carols10cents · 2017-02-18T16:17:17Z

burntsushi says we should say:

it's not clear what the return type of string indexing should be, and it is often a bad idea to index into a string, so we dissuade you from doing it by asking you to do more work if you really need it

mbrubeck · 2017-02-18T17:56:13Z

Slicing panics if you try and slice somewhere that isn't a character boundary. So why not allow indexing but panic if it's not a single byte character?

Even for indexes pointing to single-byte characters, the Output type would need to be u8 rather than char because the String can't return a reference to a 32-bit char that doesn't exist. So it’s literally impossible, given the current Index traits, to implement "character" indexing even for a subset of strings, only byte indexing (which is redundant with string.as_bytes()) or substring indexing (which is better expressed using ranges).

it's not clear what the return type of string indexing should be, and it is often a bad idea to index into a string, so we dissuade you from doing it by asking you to do more work if you really need it

I like this but would change “do more work” to “be more specific.”

carols10cents · 2017-02-18T18:37:24Z

Manishearth says:

byte indexing is not what folks would expect
codepoint indexing is O(n), which is a hidden cost
It is good to discourage treating strings as a list of codepoints anyway.
Slicing is useful because slicing can sprout off of iteration
most string manips need iteration. If you're iterating you don't need indexing. You do need slicing, to say something like "I have read till this point and wish to stop" or "I wish to discard the part of the string I have read"

&Manishearth> carols10cents: the expected workflow is "iterate via char_indices, then slice using indices you have noted down"
carols10cents> Manishearth: so why not "get valid char_indices, then index for individual chars"
carols10cents> like nothing enforces that workflow
&Manishearth> carols10cents: because with char_indices you already have the individual chars?
carols10cents> so why enforce indexes
rkruppe> carols10cents: because you generally want substrings, not chars
rkruppe> btw it's not just char_indices, you can get valid indices for slicing from basically every text manipulation API, from .find() to regexes

&Manishearth> carols10cents: "It's rare to directly index/slice a string without iterating through it first. If iterating through it, you don't need to index; but you still need to form slices out of known char indices."

nox> carols10cents: Make the string a bathtub full of water, explain that you can't get a single drop from it, only a section of the whole volume.
1:30 PM Second sentence: Drop pun unintended.

carols10cents · 2017-02-18T18:38:59Z

nox> carols10cents: Please forget about the bathtub,
nox> you can only take a slice of a baguette.
nox> Focus on that.
nox> 🥖

steveklabnik · 2017-02-19T01:43:54Z

So, I agree with @carols10cents on basically every point in the original post; I also agree with @eddyb in cases, but made the choices I did for the reasons that @carols10cents said.

There's also a lot of other discussion here too, just wanted to finally actually mention that.

steveklabnik

looks great, one extremely minor nit

steveklabnik · 2017-02-19T01:44:33Z

nostarch/chapter08.md

@@ -142,12 +142,12 @@ element past the end of the vector to be a fatal error that should crash the
 program.

 When the `get` method is passed an index that is outside the array, it will
-return `None` without `panic!`ing. You would use this if accessing an element
-beyond the range of the vector will happen occasionally under normal
+return `None` without causing a `panic!` call. You would use this if accessing


I would remove "call" here, it feels weird to me, idk

yeah i'm just trying to get rid of the word that's half code, half not code. what if i just change it to, all not code, "panicking"?

carols10cents · 2017-02-19T18:30:15Z

@steveklabnik i made a few more changes to actually resolve the discussions, want to take a look at the last commit or do you still approve?

matthewjasper · 2017-02-19T19:07:16Z

nostarch/chapter08.md

-with a range to create a string slice containing particular bytes:
+Because it's not clear what the return type of string indexing should be, and
+it is often a bad idea to index into a string, Rust dissuades you from doing so
+by asking you to be more specific if you really need it. The way you can me


steveklabnik · 2017-02-20T00:03:33Z

nostarch/chapter08.md

@@ -1,7 +1,7 @@

 [TOC]

-# Fundamental Collections
+# Common Collections


carols10cents added 10 commits February 17, 2017 15:07

Chapter 8 as received from nostarch after tech review

f91b0ce

Addressing some of eddy's comments, merging in upstream bug fixes

3a2cc4c

Merge upstream changes into nostarch md

5a38a2f

Fancy quotes, removing comments, and line wrap

052a3c8

Reword so code formatting isn't in the same word as not-code

f55bac1

Fix/add references to other chapters

e6b9736

Fix inconsistent capitalization/markup

1642138

Be consistent about unicode scalar values over characters

46f784c

Mention there are hasher-providing crates available

7293684

Propagate edits to src md

0421fbb

carols10cents requested a review from steveklabnik February 18, 2017 04:44

Update dictionary

ce95aa4

carols10cents added 2 commits February 18, 2017 11:23

whoops you saw nothing

c0502b9

you saw nothing part 2

894ade2

steveklabnik approved these changes Feb 19, 2017

View reviewed changes

Resolving remaining comments after discussion

0d229cc

matthewjasper reviewed Feb 19, 2017

View reviewed changes

me -> be

4dc0001

steveklabnik approved these changes Feb 20, 2017

View reviewed changes

nostarch/chapter08.md

@@ -1,7 +1,7 @@

[TOC]

# Fundamental Collections

# Common Collections

Copy link

Member

steveklabnik Feb 20, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

carols10cents merged commit dba8499 into master Feb 20, 2017

carols10cents deleted the ch8-tr branch February 20, 2017 03:12

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Ch8 edits after technical review #450

Ch8 edits after technical review #450

carols10cents commented Feb 18, 2017

eddyb commented Feb 18, 2017

eddyb commented Feb 18, 2017

eddyb commented Feb 18, 2017

carols10cents commented Feb 18, 2017

eddyb commented Feb 18, 2017 •

edited

Loading

carols10cents commented Feb 18, 2017

eddyb commented Feb 18, 2017

carols10cents commented Feb 18, 2017

eddyb commented Feb 18, 2017

carols10cents commented Feb 18, 2017

mbrubeck commented Feb 18, 2017

carols10cents commented Feb 18, 2017

carols10cents commented Feb 18, 2017

steveklabnik commented Feb 19, 2017

steveklabnik left a comment

steveklabnik Feb 19, 2017

carols10cents Feb 19, 2017

carols10cents commented Feb 19, 2017

matthewjasper Feb 19, 2017

steveklabnik Feb 20, 2017

Ch8 edits after technical review #450

Ch8 edits after technical review #450

Conversation

carols10cents commented Feb 18, 2017

eddyb commented Feb 18, 2017

eddyb commented Feb 18, 2017

eddyb commented Feb 18, 2017

carols10cents commented Feb 18, 2017

eddyb commented Feb 18, 2017 • edited Loading

carols10cents commented Feb 18, 2017

eddyb commented Feb 18, 2017

carols10cents commented Feb 18, 2017

eddyb commented Feb 18, 2017

carols10cents commented Feb 18, 2017

mbrubeck commented Feb 18, 2017

carols10cents commented Feb 18, 2017

carols10cents commented Feb 18, 2017

steveklabnik commented Feb 19, 2017

steveklabnik left a comment

Choose a reason for hiding this comment

steveklabnik Feb 19, 2017

Choose a reason for hiding this comment

carols10cents Feb 19, 2017

Choose a reason for hiding this comment

carols10cents commented Feb 19, 2017

matthewjasper Feb 19, 2017

Choose a reason for hiding this comment

steveklabnik Feb 20, 2017

Choose a reason for hiding this comment

eddyb commented Feb 18, 2017 •

edited

Loading