Skip to content

Review notebook 3 "Subsetting data" #5

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: review
Choose a base branch
from

Conversation

jorisvandenbossche
Copy link
Collaborator

No description provided.

@review-notebook-app
Copy link

Check out this pull request on  ReviewNB

You'll be able to see Jupyter notebook diff and discuss changes. Powered by ReviewNB.

@@ -0,0 +1,1358 @@
{
Copy link
Collaborator Author

@jorisvandenbossche jorisvandenbossche Sep 21, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The "assign new value to selection" is not done? (which might be good, there is already a lot here)


Reply via ReviewNB

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll add a short example to the tutorial to cover this as well.

@@ -0,0 +1,1358 @@
{
Copy link
Collaborator Author

@jorisvandenbossche jorisvandenbossche Sep 21, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe actually show this in code in two steps? (after this explanation)


Reply via ReviewNB

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, I added this for the first example .

@@ -0,0 +1,1358 @@
{
Copy link
Collaborator Author

@jorisvandenbossche jorisvandenbossche Sep 21, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also here, after the initial example and this explanation, I would show that the conditional statement gives a boolean series, which is then used to filter the dataframe?


Reply via ReviewNB

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with @jorisvandenbossche. This is very confusing for beginners, and so it would be helpful to show the mechanics as he described.

@@ -0,0 +1,1358 @@
{
Copy link
Collaborator Author

@jorisvandenbossche jorisvandenbossche Sep 21, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

notnull -> notna


Reply via ReviewNB

@@ -0,0 +1,1358 @@
{
Copy link
Collaborator Author

@jorisvandenbossche jorisvandenbossche Sep 21, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should give a bit more guidance on why or when to use this (of course can't go in detail, that is for the user guide), but now both the "column names" and "condition expression" have already been used above without using loc. So why use loc here?

The main difference is maybe that here you are selecting rows and columns in one go?


Reply via ReviewNB

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with @jorisvandenbossche. At minimum, I would explain that the part before the comma is the rows you want, and the part after the comma is the columns you want.

I might also mention the options for what you can pass it: a single label, a list of labels, a slice of labels, a conditional expression, or a colon.

@@ -0,0 +1,1358 @@
{
Copy link

@justmarkham justmarkham Sep 27, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a space between the "I" and the apostrophe. (This same typo is a few other places in this lesson.)


Reply via ReviewNB

@@ -0,0 +1,1358 @@
{
Copy link

@justmarkham justmarkham Sep 27, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's worth mentioning explicitly that each condition must be surrounded by parentheses, and you have to use | and & (you can't use or or and).


Reply via ReviewNB

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is indeed a typical problem. Added as a note to the tutorial

@@ -0,0 +1,1358 @@
{
Copy link

@justmarkham justmarkham Sep 27, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the difference between a "conditional expression" and a "conditional statement"?


Reply via ReviewNB

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was to define the difference between an conditional expression (>,...) and a conditional function (isin, notna,...). This is not very clear and provides no added value. I changed to your suggestion earlier: 'single column label, a list of column labels, a slice of labels, a conditional expression, or a colon.'

@@ -0,0 +1,1358 @@
{
Copy link

@justmarkham justmarkham Sep 27, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like a great place to introduce the shape attribute. You can show here that a Series is 1-dimensional, and you can contrast that below with the DataFrame being 2-dimensional. Introducing shape here will also help later in this lesson.


Reply via ReviewNB

@@ -0,0 +1,1358 @@
{
Copy link

@justmarkham justmarkham Sep 27, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I recommend that you either choose a different number (perhaps Age > 35) or flip the sign (Age < 18). As is, the filter Age > 18 doesn't change the head of the DataFrame, and thus the reader can't see what has changed.

Also, if you introduce the shape attribute (above), you can also show that the shape has changed after the filter.


Reply via ReviewNB

@@ -0,0 +1,1358 @@
{
Copy link

@justmarkham justmarkham Sep 27, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assuming you introduce the shape attribute (above), you can add something here about how the shape has changed, even though the head hasn't changed.


Reply via ReviewNB

@stijnvanhoey
Copy link
Owner

@justmarkham and @jorisvandenbossche thanks a lot for the valuable comments and review. It is not easy to keep this notebook short while providing sufficient. Feel free to add further comments.

@@ -0,0 +1,1693 @@
{
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would somehow try to summarize here also the cases when to using plain [] (which is: select one or more columns, filtering of the rows)


Reply via ReviewNB

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants