This repository has been archived by the owner on Jan 3, 2018. It is now read-only.

New Python beginner lessons #132

Merged
merged 17 commits into from
Nov 22, 2013

gvwilson
Contributor

@gvwilson gvwilson commented Nov 6, 2013

  1. Lay out directory structure for new novice/intermediate lessons (see #121-#130).
  2. Add notebooks for novice introduction to Python, plus utilities, data files, and one image.
  3. Modify Makefile to delete .pyc files when doing make clean.
  4. Add a NEW_MATERIAL.md file that will eventually become the new README.md.

@ahmadia
Contributor

ahmadia commented Nov 6, 2013

+1 on using a new development branch here (master) so that we can land our current pull requests into gh-pages while allowing straightforward work here.

@wking
Contributor

wking commented Nov 6, 2013

On Wed, Nov 06, 2013 at 08:11:51AM -0800, Greg Wilson wrote:

-- File Changes --


A python/novice/inflammation-01.csv (60)
A python/novice/inflammation-02.csv (60)

A python/novice/util/gen-inflammation.py (19)

You added gen-inflammation.py and inflammation-01.csv in 282934c
(Starting beginner's lessons on Python, 2013-11-03). More
inflammation-*.csv files enter in 459c91a (Second beginner's lesson on
Python, 2013-11-03). I'd prefer if the auto-generated
inflammation-*.csv files were not included in the development branch
at all, and were instead generated along with other pre-Jekyll
transitions (#92) like RMarkdown → Markdown (#119) and IPyNb splitting
(see the linked comment on #119) during the “build a per-boot-camp
branch for downstream consumption” step.

@wking
Contributor

wking commented Nov 6, 2013

On Wed, Nov 06, 2013 at 08:19:23AM -0800, Aron Ahmadia wrote:

+1 on using a new development branch here (master)

Also +1. And +1 for @gvwilson using a feature branch in his own
repository :).

On Wed, Nov 06, 2013 at 08:26:21AM -0800, W. Trevor King wrote:

transitions (#92) like RMarkdown → Markdown (#119) and IPyNb splitting

Oops, I flipped my references. Should be #119 and then #92.

@wking
Contributor

wking commented Nov 6, 2013

On Wed, Nov 06, 2013 at 08:29:52AM -0800, W. Trevor King wrote:

On Wed, Nov 06, 2013 at 08:19:23AM -0800, Aron Ahmadia wrote:

+1 on using a new development branch here (master)

Also +1. And +1 for @gvwilson using a feature branch in his own
repository :).

Although the branching-off point for bc/master from bc/gh-pages seems
somewhat arbitrary. I'd suggest either an orphan branch or the current
bc/gh-pages tip for these stand-alone new-format lessons.

@gvwilson
Contributor Author

gvwilson commented Nov 6, 2013

I've checked in the files because they're cat'd in notebooks, and I want
to ensure consistency at the point of checkout. I also tried to sync
addition of the .csv with commit of the corresponding notebook.

@ahmadia
Contributor

ahmadia commented Nov 6, 2013

Although the branching-off point for bc/master from bc/gh-pages seems
somewhat arbitrary.

@wking I think the plan is to back-tag that commit as a pre-release, land the current PRs, then move development over to this branch. It happens to be the place where @gvwilson started working on the new reorganization from the (then-tip) of bc/gh-pages.

@ahmadia
Contributor

ahmadia commented Nov 6, 2013

I don't have strong opinions on the generated CSV files, since they are so tiny. I think they do fall under the “we should eventually generate these instead of committing them” category, but we don't have that flow properly set up, so I'm +1 on leaving them as-is for now.

@wking
Contributor

wking commented Nov 6, 2013

On Wed, Nov 06, 2013 at 08:35:05AM -0800, Greg Wilson wrote:

I've checked in the files because they're cat'd in notebooks, and I
want to ensure consistency at the point of checkout.

Consistency as in “identical ‘random’ data” should be possible by
setting the seed explicitly in a Makefile rule building the files.

Consistency as in “ready for Jekyll and per-boot-camp branches” is not
possible as pointed out by #119 and #92.

For previewing the content in this PR, I understand that you want the
auto-generated CSV files around, but I don't think they belong in
the final development branch.

I also tried to sync addition of the .csv with commit of the
corresponding notebook.

If you're generating them with Makefile rules, you can put the new
rules with the commit of the corresponding notebook. For example,
459c91a (Second beginner's lesson on Python, 2013-11-03) could add
something like (untested):

PYTHON = python2.7
-IMMUNIZATION_DATA_INDEXES = 01
+# seq -w pads to equal width (01 02 ... 12), matching inflammation-01.csv
+IMMUNIZATION_DATA_INDEXES = $(shell seq -w 12)
IMMUNIZATION_DATA = $(patsubst %,python/novice/inflammation-%.csv,$(IMMUNIZATION_DATA_INDEXES))

python/novice/inflammation-%.csv: python/novice/util/gen-inflammation.py
	$(PYTHON) "$<" > "$@"
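A minimal sketch of the seeded-generator idea, assuming a hypothetical gen-inflammation.py that takes the seed as its first command-line argument (the real script's model and interface are not shown in this thread). Using a private random.Random(seed) means every rebuild emits identical “random” data, so the CSV files would not need to be committed.

```python
import random
import sys


def generate(seed, rows=60, columns=40):
    """Return CSV text of made-up inflammation counts.

    Hypothetical stand-in for python/novice/util/gen-inflammation.py;
    the real script's model and interface may differ.
    """
    rng = random.Random(seed)  # private generator: same seed, same data
    lines = []
    for _ in range(rows):
        # Counts rise and then fall over the days, plus a little noise,
        # clamped so they never go below zero.
        values = [max(0, min(day, columns - day) // 2 + rng.randint(-1, 1))
                  for day in range(columns)]
        lines.append(','.join(str(v) for v in values))
    return '\n'.join(lines) + '\n'


if __name__ == '__main__':
    # Hypothetical usage: python gen-inflammation.py 1 > inflammation-01.csv
    seed = int(sys.argv[1]) if len(sys.argv) > 1 else 0
    sys.stdout.write(generate(seed))
```

With that (assumed) interface, the pattern rule's recipe would pass the stem as the seed, e.g. `$(PYTHON) "$<" $* > "$@"`, so inflammation-01.csv is always rebuilt from seed 1.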

@wking
Contributor

wking commented Nov 6, 2013

On Wed, Nov 06, 2013 at 08:50:04AM -0800, Aron Ahmadia wrote:

I don't have strong opinions on the generated CSV files, since they
are so tiny. I think they do fall under the “we should eventually
generate these instead of committing them” category, but we don't
have that flow properly set up, so I'm +1 on leaving them as-is for
now.

Ok, I'm just trying to:

  1. set a good precedent for future auto-generated content commits, and
  2. get us to address this before adopting the new restructuring, to
    avoid another restructuring after we do decide to tackle #119.

@ahmadia
Contributor

ahmadia commented Nov 6, 2013

@wking - I think another restructuring after our current round of restructuring appears to be inevitable :)

Your points are absolutely valid, and I really appreciate your close eye on what's entering the repository, because even a 50 KB generated file would be a bad idea in this context.

I'd love to have a flow in place that includes a content generation stage, but I don't think we're going to be able to really seriously discuss that until January. Until then, I propose we disallow any big generated content into the repositories, and work with the R and IPython Notebook files in an effort to get those ready for generating as well.

I agree that it's a compromise, but as @gvwilson says, let's focus on getting the content in first, and we can defer cleaning up while we're still sorting out our development strategy.

@gvwilson
Contributor Author

gvwilson commented Nov 6, 2013

Comments on python/novice/01-numpy.ipynb sent by @jdblischak by email before this PR landed:

  • NumPy is automatically loaded by Canopy
  • Like the explicit explanation of dot notation
  • typos: numpy.loaded --> numpy.loadtxt
  • I like that you always use print. At our past boot camp, the students were really confused by the fact that the last line of a cell would print automatically (the question was asked multiple times).
  • hyperlinks for functions do not work
  • You don't explain that ":" includes everything when slicing. You should introduce that concept before using it to extract data. Show them that data[:4, 10:] is the same as data[0:4, 10:40] (when data has 40 columns).
  • Should explicitly state that axis=0 gives one result per column and axis=1 gives one result per row. And maybe give some intuition for this. I would have thought 0 would be rows and 1 columns since the shape of an array is always listed rows and then columns. I miss R already...
  • %matplotlib inline does not work on Windows: "ERROR: Line magic function %matplotlib not found". But the figures still appear inline.
  • "Why do all of our plots stop just short of the upper end of our graph?" Don't know. Are you expecting students to search the internet or am I missing something obvious?
  • If you refer to a line number in a code cell, you should tell them how to show line numbers (Ctrl-m l)
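A small sketch of the slicing and axis points above, using a hypothetical 60 × 40 array of consecutive integers in place of the lesson's inflammation data (treating rows as patients and columns as days is an assumption here):

```python
import numpy

# Hypothetical stand-in for the inflammation data: 60 rows by 40 columns.
data = numpy.arange(60 * 40).reshape(60, 40)

# An omitted bound in a slice means "from the start" or "to the end".
assert numpy.array_equal(data[:4, 10:], data[0:4, 10:40])

# axis=0 collapses the rows, giving one value per column.
per_day = data.mean(axis=0)
# axis=1 collapses the columns, giving one value per row.
per_patient = data.mean(axis=1)
print(per_day.shape, per_patient.shape)  # (40,) (60,)
```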

@gvwilson
Contributor Author

gvwilson commented Nov 6, 2013

Comments on python/novice/02-func.ipynb by @jdblischak sent by email before this PR landed:

I did not like the normalize function example for multiple reasons:

  1. To test the function requires that the students remember the numpy.arange function or at minimum remember that they had learned it before and look it up in the last lesson. In my frustrating experience, I have had to help many students that just stare at the screen even though the code they need to get started could easily be copy-pasted from the current or previous lesson. I'd suggest reminding them how to create a series of integers so that they can focus on the new task.

  2. Completing this task not only requires utilizing the new Python syntax that they just learned, but also some mathematical reasoning. Students that show up to a beginner's programming workshop as graduate students or postdocs are unlikely to be confident in their math skills. I see this exercise getting bogged down more by the math than the programming. It is similar to using the modulo to find out if a number is even or odd. Even when we tried explicitly explaining that the modulo returns a remainder and thus an even number will have remainder zero, there was still a significant number of students that could not get the right answer.

  3. While the first exercise with the normalize function did not take me long to complete, I can't say the same for the second challenge. It took me a few minutes to figure out the math, so I can only imagine how long this would take for our students to complete. I'd prefer that we not use precious boot camp time testing the students' math comprehension skills.

  4. I don't like the name normalize because that term is overloaded. In traditional statistics, it refers to transforming data to a normal distribution. In genomics and other fields, it can be used to refer to transformation of data to any other distribution. It can also refer to rescaling, which is what your example is. Since these lessons will be used for various audiences, how about using the name rescale instead?
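For reference, a sketch of what the suggested `rescale` might look like, assuming the exercise asks for a linear mapping onto the range [0, 1] (the notebook's exact specification is not quoted here). The test input uses numpy.arange, the reminder suggested in point 1:

```python
import numpy


def rescale(values):
    """Linearly map an array onto the range [0.0, 1.0].

    Sketch only; assumes the values are not all equal, otherwise the
    denominator below would be zero.
    """
    lowest = numpy.min(values)
    highest = numpy.max(values)
    return (values - lowest) / (highest - lowest)


# numpy.arange(n) creates the integers 0, 1, ..., n-1 to test with.
print(rescale(numpy.arange(5)))  # values 0.0, 0.25, 0.5, 0.75, 1.0
```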

@gvwilson
Contributor Author

gvwilson commented Nov 6, 2013

Comment on python/novice/03-loop.ipynb sent by @jdblischak before this PR landed:

How did you imagine the students solving the function to reverse a string? I came up with two solutions, but I think both are somewhat advanced. This is the first for loop they are going to have ever written. My first requires them to remember how to index from the back of a list, initialize a string and an integer variable, and update both of those variables during each iteration of the loop. The second one I doubt the students would ever come up with since the lesson is about loops and you only briefly introduced specifying a step in a slice in the first lesson. Perhaps you could have an exercise before this one that is super simple. One where they can struggle to remember to put a colon and indent the body of the loop. Then once they have gained some confidence and familiarity with the for loop, they could move on to this exercise.

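def rev(s):
    # Walk backwards through s with a negative index, one step per character.
    x = -1
    new_s = ''
    for character in s:  # the loop variable itself is never used
        new_s = new_s + s[x]
        x = x - 1
    print(new_s)

def rev(s):
    # Start at the last character and step backwards by -1.
    print(s[-1::-1])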

And I am stumped on the second one. To solve this problem I would either use the range function in conjunction with a for loop or use a while loop. Since you have not introduced the range function or while loops, how did you envision them solving this? My solutions are below:

def expo(x, n):
    answer = 1
    for i in range(n):
        answer = answer * x
    return answer

def expo(x, n):
    answer = 1
    counter = 0
    while counter < n:
        answer = answer * x
        counter = counter + 1
    return answer
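One further approach to the string-reversal exercise, offered as a sketch rather than as the lesson's intended answer: it needs only a for loop and string concatenation, with no indexing or step slices, because prepending each character builds the result back to front.

```python
def rev(s):
    """Return s reversed, using only a for loop and concatenation."""
    result = ''
    for character in s:
        # Prepending each character puts the last one read at the front.
        result = character + result
    return result


print(rev('Newton'))  # notweN
```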

@gvwilson
Contributor Author

gvwilson commented Nov 6, 2013

Comment sent by @wking before this PR landed:

genfromtxt is much nicer than loadtxt. My favorite genfromtxt feature is its ability to read column names from a header line, which means you can avoid problems due to column ordering inconsistencies between the generator and consumer. When you don't need its fanciness, genfromtxt is basically a drop-in loadtxt replacement, so I'd recommend it for starting students off.
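A sketch of the header-reading feature described above, using two hypothetical CSV sources whose columns appear in different orders. With names=True, numpy.genfromtxt takes field names from the first line, so columns can be fetched by name:

```python
from io import StringIO

import numpy

# Hypothetical data: the same columns, written in different orders.
first = StringIO("day,patient,inflammation\n1,7,0\n2,7,3\n")
second = StringIO("patient,inflammation,day\n7,0,1\n7,3,2\n")

a = numpy.genfromtxt(first, delimiter=',', names=True)
b = numpy.genfromtxt(second, delimiter=',', names=True)

# Fields are looked up by name, so column order in the file is irrelevant.
print(a['inflammation'], b['inflammation'])
```

Without a header, numpy.genfromtxt(filename, delimiter=',') reads a plain numeric CSV much as numpy.loadtxt(filename, delimiter=',') does.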

@gvwilson
Contributor Author

gvwilson commented Nov 6, 2013

Comment by @wking sent before this PR landed:

There's some indentation trouble around your second challenge, where you also introduce tuple assignment:

first, second = 'Grace', 'Hopper'

without having covered it in the text. Maybe the goal of the challenge is to have them try that for themselves, and you step in and explain it afterward?
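A short illustration of the tuple assignment in question; the swap at the end is an extra example, not something taken from the notebook:

```python
# The right-hand side builds a tuple, which is then unpacked
# element by element into the names on the left.
first, second = 'Grace', 'Hopper'
print(first)   # Grace
print(second)  # Hopper

# A common follow-up: swapping two values without a temporary variable.
first, second = second, first
print(first, second)  # Hopper Grace
```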

@gvwilson
Contributor Author

gvwilson commented Nov 6, 2013

Comment sent by @jiffyclub before this PR landed:

The first challenge set in 01-numpy.ipynb seems utterly unrelated to the preceding material.

@wking
Contributor

wking commented Nov 6, 2013

On Wed, Nov 06, 2013 at 08:11:51AM -0800, Greg Wilson wrote:

  1. Lay out directory structure for new novice/intermediate lessons (see #121-#130).
  2. Add notebooks for novice introduction to Python, plus utilities, data files, and one image.

Are we floating this as an example to decide how the new restructured
content will work (#118, #119, #120), or are we assuming that the
existing Python content (6e7b321, #24, #27, #28, #30, #43, #57, #60,
#62, #77, #85, #86, #104, +swcarpentry/boot-camps and
swcarpentry/website PRs) is not cut out for the new beginner lessons
(#123) and that we want to start over from scratch? I think detailed
comments about the content of this branch distract from the former
goal, but maybe we've already put the nail in the coffin of our
existing IPyNb content for the novice-Python lessons?

@gvwilson
Contributor Author

gvwilson commented Nov 6, 2013

I hope most of the existing content under lessons can be recycled for
intermediates (though that's @ethanwhite's call). Our existing material
is clearly not suitable for complete beginners (cc @jdblischak and
others); this stuff has been field-tested, and seems to work much better
for people who've never programmed.

@wking
Copy link
Contributor

wking commented Nov 6, 2013

On Wed, Nov 06, 2013 at 10:52:00AM -0800, Greg Wilson wrote:

I hope most of the existing content under lessons can be recycled for
intermediates (though that's @ethanwhite's call). Our existing material
is clearly not suitable for complete beginners (cc @jdblischak and
others);

Agreed, just making sure we were all on the same page.

this stuff has been field-tested, and seems to work much better for
people who've never programmed.

This stuff as in “PR #132”? And “field-tested” in which boot camps?
I don't see “inflammation” in any pre-#118 commits for the boot camps
I have tagged [1] (pointers to missing repositories welcome). I
certainly think #132 reads better for novice programmers than our
existing stuff. I'd just like to have a better feeling for where this
stuff came from and what the earlier trials looked like. Feedback
from previous field-testing would help resolve questions like
@jdblischak's confusion over student solutions to the
string-manipulation exercises [2].

@gvwilson
Contributor Author

gvwilson commented Nov 6, 2013

And “field-tested” in which boot camps?
Most recently Greenwich (worked very well); before that, here in Toronto.

@wking
Contributor

wking commented Nov 6, 2013

On Wed, Nov 06, 2013 at 11:47:03AM -0800, Greg Wilson wrote:

And “field-tested” in which boot camps?
Most recently Greenwich (worked very well); before that, here in Toronto.

Thanks. I've tagged 2013-10-greenwich
(https://github.com/swcarpentry/2013-10-24-greenwich) but I'm having
trouble finding the Toronto repository. It looks like Greenwich has
the sample inflammation data, but you live-coded the notebooks without
instructor notes?

@gvwilson
Contributor Author

On 2013-11-10 9:19 PM, Aron Ahmadia wrote:

@gvwilson https://github.com/gvwilson - Does tomorrow still count as
weekend? I don't think I'm going to be able to get to this one tonight :(

No worries - I distracted you with my branching mistakes.

@DamienIrving
Contributor

Just a couple of comments on the testing content in 05-qa.ipynb.

  • In the "limits to testing" section, is_all_bases seems like an odd choice for a function that is supposed to check whether a character string contains only the letters A, C, G, and T. Would check_ACGT be a better choice?
  • At the beginning of the unit testing section, you explain what makes a good unit testing tool (it must be easy to add or change tests, understand the previous tests, etc.). Since the audience is unlikely to ever have to design their own unit testing library like unittest or nose, I'm wondering whether this discussion is relevant. It might be better to simply remove it and begin the unit testing section with the following paragraph ("The simplest kind of test...").
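As an illustration of "the simplest kind of test", here is a hypothetical implementation of is_all_bases (the notebook's own version is not shown here) with a few assert-based checks, including an empty-string edge case whose expected answer a test author has to decide on explicitly:

```python
def is_all_bases(sequence):
    """Return True if sequence contains only the letters A, C, G and T.

    Hypothetical implementation for illustration; the lesson's own
    version may differ.
    """
    return all(base in 'ACGT' for base in sequence)


# The simplest kind of test: call the function on a known input and
# assert that the result matches what we expect.
assert is_all_bases('ACGTTGCA')
assert not is_all_bases('ACGN')
assert is_all_bases('')  # all() of nothing is True: is that what we want?
```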

"and most importantly,\n",
"functions.\n",
"What they haven't done is show us how to tell if a program is getting the right answer.\n",
"If each line we write has a 99% chance of being right,\n",
Contributor


Is 99% an empirically derived estimate, or is this simply a thought experiment to justify testing?
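Whatever the origin of the 99% figure, the arithmetic behind it is easy to make concrete. Assuming (as a simplification) that each line is independently correct with probability p, a program of n lines is entirely correct with probability p ** n:

```python
def chance_all_correct(lines, per_line=0.99):
    """Probability that every line is correct, assuming independent errors."""
    return per_line ** lines


for n in (10, 100, 1000):
    print(n, chance_all_correct(n))
```

With p = 0.99 that is about 0.90 for 10 lines, 0.37 for 100 lines, and under 0.0001 for 1,000 lines.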

@jdblischak
Contributor

I really like the last lesson on taking command line arguments! I wish I had come across a similar lesson when I was first learning Python. I think novices will really benefit from this material.

@ethanwhite
Contributor

I really like the last lesson on taking command line arguments! I wish I had come across a similar lesson when I was first learning Python. I think novices will really benefit from this material.

I really like the command line lesson as well, but it's actually material that I think of as being more intermediate. @gvwilson - you've gotten all the way through the command line material with complete beginners?

@gvwilson
Contributor Author

On 2013-11-14 3:52 PM, Ethan White wrote:

I really like the command line lesson as well, but it's actually
material that I think of as being more intermediate. @gvwilson
https://github.com/gvwilson - you've gotten all the way through the
command line material with complete beginners?
About one time in three, and only after they had seen the shell. I
would drop something else from Python in order to include this if
necessary: many people have said it's really important to show them that
Python isn't just a notebook thing.
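A minimal sketch of the kind of command-line script meant here, assuming a hypothetical file name readings.py and the comma-separated inflammation files used in the notebooks; it prints the mean of each row of every file named on the command line:

```python
import sys

import numpy


def row_means(source):
    """Return the mean of each row in one comma-separated data source."""
    # ndmin=2 keeps a one-row file two-dimensional.
    data = numpy.loadtxt(source, delimiter=',', ndmin=2)
    return data.mean(axis=1)


def main():
    # sys.argv[0] is the script's own name; the file names follow it.
    for filename in sys.argv[1:]:
        for value in row_means(filename):
            print(value)


if __name__ == '__main__':
    main()
```

Run as, for example, `python readings.py inflammation-01.csv inflammation-02.csv`. Because of ndmin=2, mean(axis=1) still yields one value per row even for a single-row file.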

@ahmadia
Contributor

ahmadia commented Nov 14, 2013

many people have said it's really important to show them that Python isn't just a notebook thing.

+1

@wking wking mentioned this pull request Nov 15, 2013
@ahmadia
Contributor

ahmadia commented Nov 21, 2013

@gvwilson - This PR has gotten too big for me to casually review. I'd suggest you clean up behind you by deleting the old Python lesson material you've reused, and merge this when you're ready.

gvwilson pushed a commit that referenced this pull request Nov 22, 2013
@gvwilson gvwilson merged commit 42b9b82 into swcarpentry:master Nov 22, 2013
@gvwilson gvwilson deleted the new-python-beginner-lessons branch November 26, 2013 16:24
@dmj111 dmj111 mentioned this pull request Dec 10, 2013
7 participants