-
Notifications
You must be signed in to change notification settings - Fork 5
Review notebook 3 "Subsetting data" #5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: review
Are you sure you want to change the base?
Conversation
Check out this pull request on You'll be able to see Jupyter notebook diff and discuss changes. Powered by ReviewNB. |
@@ -0,0 +1,1358 @@ | |||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The "assign new value to selection" is not done? (which might be good, there is already a lot here)
Reply via ReviewNB
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll add a short example to the tutorial to cover this as well.
@@ -0,0 +1,1358 @@ | |||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, I added this for the first example .
@@ -0,0 +1,1358 @@ | |||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also here, after the initial example and this explanation, I would show that the conditional statement gives a boolean series, which is then used to filter the dataframe?
Reply via ReviewNB
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with @jorisvandenbossche. This is very confusing for beginners, and so it would be helpful to show the mechanics as he described.
@@ -0,0 +1,1358 @@ | |||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -0,0 +1,1358 @@ | |||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should give a bit more guidance on why or when to use this (of course can't go in detail, that is for the user guide), but now both the "column names" and "condition expression" have already been used above without using loc. So why use loc here?
The main difference is maybe that here you are selecting rows and columns in one go?
Reply via ReviewNB
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with @jorisvandenbossche. At minimum, I would explain that the part before the comma is the rows you want, and the part after the comma is the columns you want.
I might also mention the options for what you can pass it: a single label, a list of labels, a slice of labels, a conditional expression, or a colon.
@@ -0,0 +1,1358 @@ | |||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a space between the "I" and the apostrophe. (This same typo is a few other places in this lesson.)
Reply via ReviewNB
@@ -0,0 +1,1358 @@ | |||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's worth mentioning explicitly that each condition must be surrounded by parentheses, and you have to use |
and &
(you can't use or
or and
).
Reply via ReviewNB
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is indeed a typical problem. Added as a note to the tutorial
@@ -0,0 +1,1358 @@ | |||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the difference between a "conditional expression" and a "conditional statement"?
Reply via ReviewNB
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was to define the difference between an conditional expression (>,...) and a conditional function (isin, notna,...). This is not very clear and provides no added value. I changed to your suggestion earlier: 'single column label, a list of column labels, a slice of labels, a conditional expression, or a colon.'
@@ -0,0 +1,1358 @@ | |||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems like a great place to introduce the shape attribute. You can show here that a Series is 1-dimensional, and you can contrast that below with the DataFrame being 2-dimensional. Introducing shape here will also help later in this lesson.
Reply via ReviewNB
@@ -0,0 +1,1358 @@ | |||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I recommend that you either choose a different number (perhaps Age > 35) or flip the sign (Age < 18). As is, the filter Age > 18 doesn't change the head of the DataFrame, and thus the reader can't see what has changed.
Also, if you introduce the shape attribute (above), you can also show that the shape has changed after the filter.
Reply via ReviewNB
@@ -0,0 +1,1358 @@ | |||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Assuming you introduce the shape attribute (above), you can add something here about how the shape has changed, even though the head hasn't changed.
Reply via ReviewNB
@justmarkham and @jorisvandenbossche thanks a lot for the valuable comments and review. It is not easy to keep this notebook short while providing sufficient. Feel free to add further comments. |
@@ -0,0 +1,1693 @@ | |||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would somehow try to summarize here also the cases when to using plain [] (which is: select one or more columns, filtering of the rows)
Reply via ReviewNB
No description provided.