Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ch8 edits after technical review #450

Merged
merged 15 commits into from
Feb 20, 2017
Merged

Ch8 edits after technical review #450

merged 15 commits into from
Feb 20, 2017

Conversation

carols10cents
Copy link
Member

Pretty small edits here.

@eddyb had some comments that I don't agree with, and I want to see what @steveklabnik thinks and discuss them in here, so I've pulled them out of the docx and linked to their context:

“map” or “associative map” would be more appropriate, HashMap and BTreeMap are two compromises based on what the user might need, neither is more fundamental (unlike, say, Vec<(K, V)>, which is simple but also very slow).

I think "hash map" is the term that will be most familiar to the most people from most languages, and it's also literally the name of the type. I think wanting to associate keys with values and be able to look them up by key is more fundamental than wanting a map that also does those things plus is sorted.

  • In the same spot, regarding our choice of vector, string, and hash map:

I’d like to mention maps and sets at the same time, but sets are harder because mathematical sets are usually unordered but that’s not true of BTreeSet, so it’d have to be phrased around the optimized operation (e.g. “a set allows us to quickly check if it contains a certain element”)

I think adding set to this chapter is too much, and I reach for these three WAY more often than I reach for a set.

Indexing the bytes is O(1) so I’m not sure what this is trying to say.

The text I put here no longer makes sense to me either, which means I'm still confused. I think I was trying to contrast slicing to accessing an index: slicing doesn't have the expectation of performance that index access does, so even if it is fast, we aren't constrained by the expectation of performance like index access is, which is why it was taken away? I am still unsatisfied by this...

Could we link to them, or at least provide a name? IIRC some of the maintainers are Rust libs team members, so they’re kind of “official”?

We decided we weren't going to do that unless they're under rust-lang or rust-lang-nursery and scheduled to have a 1.0 version by the time the book's printed.

Do note that this will collect references. into_iter would be better, if not for the fact that the shortest form is then asymmetrical (teams.into_iter().zip(initial_scores).collect())

Into_iter is too much complexity/distraction to introduce here, more than the type ending up being slightly different would.

The rest of your comments were excellent and I agree wholeheartedly, @eddyb! ❤️

@eddyb
Copy link
Member

eddyb commented Feb 18, 2017

To be fair, I have no experience with either Perl or Ruby and I've heard they both use "hash" to refer to unordered associative maps implemented using hashing.
And Java uses HashMap, just like us, but plenty of other languages (PHP, JS, Python and C++, to name a few) don't even have "hash" in the name of their equivalent.
As for which is more fundamental: neither take your own key types unless you derive the right traits (Hash vs Ord), and neither of those implies the other, they're orthogonal.

To reiterate: I am fine with only talking about HashMap as an example of a "map", since it's more widely used AFAIK, but "hash map" almost makes it sound like all maps are more specialized "hash maps".
And the comment about sets wasn't meant for a full explanation in the chapter, just a mention at the start.
Frankly, I wouldn't mind keeping the current list if the reader was told they should maybe go to https://doc.rust-lang.org/stable/std/collections/ to see the full list & tradeoff guidelines we have there.

@eddyb
Copy link
Member

eddyb commented Feb 18, 2017

we aren't constrained by the expectation of performance like index access is, which is why it was taken away?

From my POV this is wrong. Also, indexing gave a single byte back IIRC (because indexing gives back a reference that the compiler places an implicit deref on, to give you an lvalue, so char couldn't work).

@eddyb
Copy link
Member

eddyb commented Feb 18, 2017

Into_iter is too much complexity/distraction to introduce here, more than the type ending up being slightly different would.

Sadly, I agree. This is where zip(teams, initial_scores).collect() would work wonderfully 😞.

@carols10cents
Copy link
Member Author

From my POV this is wrong. Also, indexing gave a single byte back IIRC (because indexing gives back a reference that the compiler places an implicit deref on, to give you an lvalue, so char couldn't work).

So do you have a better way to explain why you can slice bytes of a str with [0..9] w/e, but you're not allowed to index bytes of a str with [5]?

@eddyb
Copy link
Member

eddyb commented Feb 18, 2017

@carols10cents We check that slices fall on complete UTF-8 sequences (which is O(1) because of how UTF-8 was designed), i.e. what you get is indeed a valid substring (as far as codepoints are concerned), but byte indexing can you give a part of an UTF-8 sequence, which is much less useful in the general case.

I might be able to figure out where the actual decision was taken and the original arguments, if you want.
EDIT: Apparently it was rust-lang/rust#15085 but that doesn't link back to the actual discussion.

@carols10cents
Copy link
Member Author

To reiterate: I am fine with only talking about HashMap as an example of a "map", since it's more widely used AFAIK, but "hash map" almost makes it sound like all maps are more specialized "hash maps".

Ok, I'm fine with adding "which is an implementation of the more general data structure called a map" or something, would that resolve this comment?

And the comment about sets wasn't meant for a full explanation in the chapter, just a mention at the start.
Frankly, I wouldn't mind keeping the current list if the reader was told they should maybe go to https://doc.rust-lang.org/stable/std/collections/ to see the full list & tradeoff guidelines we have there.

We do say:

Each kind of collection has different capabilities and costs, and choosing an appropriate one for the situation you’re in is a skill you’ll develop over time. In this chapter, we’ll go over three collections which are used very often in Rust programs: [...]
There are more specialized variants of each of these data structures for particular situations, but these are the most fundamental and common.

so what if I change this to say "Each kind of collection has different capabilities and costs, and choosing an appropriate one for the situation you’re in is a skill you’ll develop over time. In this chapter, we’ll go over three collections which are used very often in Rust programs: [...]
To learn about the other kinds of collections provided by the standard library, see the documentation at https://doc.rust-lang.org/stable/std/collections/. "

I think this will also make the discussion of "fundamental" moot?

@eddyb
Copy link
Member

eddyb commented Feb 18, 2017

@carols10cents Both of those are great minimal changes, thank you!

@carols10cents
Copy link
Member Author

@carols10cents We check that slices fall on complete UTF-8 sequences (which is O(1) because of how UTF-8 was designed), i.e. what you get is indeed a valid substring (as far as codepoints are concerned), but byte indexing can you give a part of an UTF-8 sequence, which is much less useful in the general case.

I might be able to figure out where the actual decision was taken and the original arguments, if you want.

Slicing panics if you try and slice somewhere that isn't a character boundary. So why not allow indexing but panic if it's not a single byte character?

I'd love to see the original discussion, I've tried to find it before and couldn't :-/

@eddyb
Copy link
Member

eddyb commented Feb 18, 2017

Oh, original issue for removing str indexing is rust-lang/rust#12710, and there are meeting minutes.

@carols10cents
Copy link
Member Author

burntsushi says we should say:

it's not clear what the return type of string indexing should be, and it is often a bad idea to index into a string, so we dissuade you from doing it by asking you to do more work if you really need it

@mbrubeck
Copy link
Contributor

Slicing panics if you try and slice somewhere that isn't a character boundary. So why not allow indexing but panic if it's not a single byte character?

Even for indexes pointing to single-byte characters, the Output type would need to be u8 rather than char because the String can't return a reference to a 32-bit char that doesn't exist. So it’s literally impossible, given the current Index traits, to implement "character" indexing even for a subset of strings, only byte indexing (which is redundant with string.as_bytes()) or substring indexing (which is better expressed using ranges).

it's not clear what the return type of string indexing should be, and it is often a bad idea to index into a string, so we dissuade you from doing it by asking you to do more work if you really need it

I like this but would change “do more work” to “be more specific.”

@carols10cents
Copy link
Member Author

Manishearth says:

  • byte indexing is not what folks would expect
  • codepoint indexing is O(n), which is a hidden cost
  • It is good to discourage treating strings as a list of codepoints anyway.
  • Slicing is useful because slicing can sprout off of iteration
  • most string manips need iteration. If you're iterating you don't need indexing. You do need slicing, to say something like "I have read till this point and wish to stop" or "I wish to discard the part of the string I have read"

&Manishearth> carols10cents: the expected workflow is "iterate via char_indices, then slice using indices you have noted down"
carols10cents> Manishearth: so why not "get valid char_indices, then index for individual chars"
carols10cents> like nothing enforces that workflow
&Manishearth> carols10cents: because with char_indices you already have the individual chars?
carols10cents> so why enforce indexes
rkruppe> carols10cents: because you generally want substrings, not chars
rkruppe> btw it's not just char_indices, you can get valid indices for slicing from basically every text manipulation API, from .find() to regexes

&Manishearth> carols10cents: "It's rare to directly index/slice a string without iterating through it first. If iterating through it, you don't need to index; but you still need to form slices out of known char indices."

nox> carols10cents: Make the string a bathtub full of water, explain that you can't get a single drop from it, only a section of the whole volume.
1:30 PM Second sentence: Drop pun unintended.

@carols10cents
Copy link
Member Author

nox> carols10cents: Please forget about the bathtub,
nox> you can only take a slice of a baguette.
nox> Focus on that.
nox> 🥖

@steveklabnik
Copy link
Member

So, I agree with @carols10cents on basically every point in the original post; I also agree with @eddyb in cases, but made the choices I did for the reasons that @carols10cents said.

There's also a lot of other discussion here too, just wanted to finally actually mention that.

Copy link
Member

@steveklabnik steveklabnik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks great, one extremely minor nit

@@ -142,12 +142,12 @@ element past the end of the vector to be a fatal error that should crash the
program.

When the `get` method is passed an index that is outside the array, it will
return `None` without `panic!`ing. You would use this if accessing an element
beyond the range of the vector will happen occasionally under normal
return `None` without causing a `panic!` call. You would use this if accessing
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would remove "call" here, it feels weird to me, idk

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah i'm just trying to get rid of the word that's half code, half not code. what if i just change it to, all not code, "panicking"?

@carols10cents
Copy link
Member Author

@steveklabnik i made a few more changes to actually resolve the discussions, want to take a look at the last commit or do you still approve?

with a range to create a string slice containing particular bytes:
Because it's not clear what the return type of string indexing should be, and
it is often a bad idea to index into a string, Rust dissuades you from doing so
by asking you to be more specific if you really need it. The way you can me
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/me/be/

@@ -1,7 +1,7 @@

[TOC]

# Fundamental Collections
# Common Collections
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@carols10cents carols10cents merged commit dba8499 into master Feb 20, 2017
@carols10cents carols10cents deleted the ch8-tr branch February 20, 2017 03:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants