- Course outline on GBC website (older outline draft as md file)
- Slack signup page - https://join.slack.com/t/georgebrowntech/signup
Python Data Science Handbook: Essential Tools for Working with Data (by Jake VanderPlas)
- Read online
- GitHub repository: https://github.com/jakevdp/PythonDataScienceHandbook
- Anaconda download page: https://www.anaconda.com/download/ (Use the Python 3.6 version)
- Basic tutorial about git and GitHub: video (skip it if you have used git before)
This course assumes a reasonable knowledge of Python. If you haven't used Python before, consider one of the following resources:
- Codecademy's Python course - browser-based, tons of exercises
- DataQuest - browser-based, teaches Python in the context of data science
- CheckIO - a good collection of exercises to try when you are comfortable with the basics
- Lab notebook - Sept 4th
- Lecture notebook - Sept 6th
- Recommended Video: Making data mean more through storytelling by Ben Wellington at TEDxBroadway (15 min)
- Lab - Sept 11
- Lab exercise - exercises/numpy_basics.ipynb
- Slides
- Lecture - Sept 13
- Slides (not too many of them, most of the lecture is live / whiteboard)
- Whiteboard screenshot (PNG image; for full resolution, click "Download" above the image)
- IPython transcript (what the instructor typed, without output; not usefully runnable as a single Python file)
- Links
- NumPy
- Quick reference: https://github.com/juliangaal/python-cheat-sheet
- Videos: video 1, video 2 - from the Python Programmer YouTube channel
- Matplotlib
- A playlist of several good short videos introducing different types of plots with matplotlib. This link skips the first video in the playlist because it covers installation, and you already have matplotlib installed via Anaconda.
- Basic plotting tutorial on matplotlib website
- Facts and Myths about Python names and values - by Ned Batchelder at PyCon 2015 (video, 25min)
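A minimal sketch of the central idea behind "names and values": assignment binds another name to the same object and never copies it. The variable names below are arbitrary and chosen only for illustration.

```python
# Assignment makes a second name refer to the SAME object; nothing is copied.
nums = [1, 2, 3]
alias = nums
alias.append(4)          # mutating the list through one name...
print(nums)              # ...is visible through the other: [1, 2, 3, 4]
print(alias is nums)     # True: both names refer to one list object

# Rebinding a name does not affect other names bound to the old object.
alias = [9]
print(nums)              # still [1, 2, 3, 4]

# To get an independent list, copy explicitly.
copied = list(nums)
copied.append(5)
print(nums, copied)      # [1, 2, 3, 4] [1, 2, 3, 4, 5]
```

The same rule applies to NumPy arrays: `b = a` creates a second name for one array, not a second array.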
- NumPy
- Lab - Sept 18
- Slides
- Lab exercise - exercises/plotting_basics.ipynb (it uses exercises/OshawaWeather2016.csv)
- Lecture - Sept 20
- Home reading & videos (important!)
- Video playlist about Pandas (watch the first 10 videos - about 1 hour 33 min total)
- Notebook & data used in the videos (GitHub) - the notebook has sufficient comments to be very useful without videos as well
- Home assignment - exercises/numpy_assignment.py (instructions inside)
- Lab - Sept 25
- Slides
- Lab exercise - exercises/olympic_history.ipynb
- Lecture - Sept 27
- More videos about Pandas
- Pandas best practices - video playlist
- Lab - Oct 2
- Lecture - Oct 4
- Home Assignment 1 is due Tuesday Oct 9, 23:59
- Lab - Oct 9
- Slides
- Exercise: exercises/week6_lab.ipynb
- Lecture - Oct 11
- Lab - Oct 16
- Lecture - Oct 18, MID-TERM
- scikit-learn is already installed if you use Anaconda Python
- Video playlist and corresponding notebooks (highly recommended)
- Skim through this post about changes in scikit-learn since the videos were recorded
- Book chapter 5
- Lab - Oct 30
- Slides
- Exercise: exercises/mpg_regression.ipynb
- Lecture - Nov 1
- Slides (alternative link via nbviewer)
- Whiteboard screenshot
- Links
- So Why Is It Called "Regression," Anyway?
- Linear regression example - Boston housing dataset (video)
- If you are curious about the math behind linear regression (slow-paced): Khan Academy - least squares fit
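For a quick hands-on version of the least squares math, here is a small sketch with NumPy. The data points are made up for illustration (roughly y = 2x + 1 plus noise). It solves the same ordinary least squares problem twice: once via the normal equations and once with `np.linalg.lstsq`.

```python
import numpy as np

# Made-up toy data: y is roughly 2*x + 1 plus a little noise.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.1])

# Design matrix: a column of ones (intercept) next to the x values (slope).
X = np.column_stack([np.ones_like(x), x])

# Least squares picks beta to minimize ||X @ beta - y||^2.
# Closed form via the normal equations: (X^T X) beta = X^T y.
beta_normal = np.linalg.solve(X.T @ X, X.T @ y)

# np.linalg.lstsq solves the same problem in a numerically more stable way.
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

intercept, slope = beta_lstsq
print(f"intercept={intercept:.3f}, slope={slope:.3f}")  # intercept=1.040, slope=1.990
```

Both approaches give the same coefficients here. scikit-learn's `LinearRegression` fits this same ordinary least squares model.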
- Lab - Nov 6
- Slides
- Exercise: exercises/glass_identification.ipynb and exercises/glass.csv
- Lecture - Nov 8
- Slides (alternative link via nbviewer)
- Intro to NLP - guest lecture by Sahand Saba - notebook
- exercises/home_assignment2.py (all instructions inside)
- Lab - Nov 13
- Exercise: exercises/yelp.ipynb and exercises/yelp.csv
- Lecture - Nov 15
- Slides (alternative link via nbviewer)
- Intuitive Sensitivity & Specificity (video, 9min)
- The tradeoff between sensitivity and specificity (video, 12min)
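To make the tradeoff concrete, the sketch below applies several decision thresholds to a set of predicted probabilities. The labels, the probabilities and the helper `sensitivity_specificity` are all hypothetical, made up for illustration. Raising the threshold trades sensitivity for specificity.

```python
# Hypothetical true labels (1 = positive) and predicted probabilities.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_prob = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.35, 0.2, 0.1, 0.05]


def sensitivity_specificity(y_true, y_prob, threshold):
    """Predict positive when prob >= threshold, then compute both rates."""
    y_pred = [1 if p >= threshold else 0 for p in y_prob]
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    sensitivity = tp / (tp + fn)  # true positive rate: actual positives caught
    specificity = tn / (tn + fp)  # true negative rate: actual negatives rejected
    return sensitivity, specificity


for threshold in (0.25, 0.5, 0.75):
    sens, spec = sensitivity_specificity(y_true, y_prob, threshold)
    print(f"threshold={threshold:.2f}  sensitivity={sens:.2f}  specificity={spec:.2f}")
```

With this data, sensitivity falls from 1.00 to 0.50 as the threshold rises from 0.25 to 0.75, and specificity climbs from 0.50 to 1.00.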
- Lab - Nov 20
- Exercise: exercises/credit_score.ipynb (the data is not in this repo)
- Lecture - Nov 22
- Slides (alternative link via nbviewer)
- Making sense of the confusion matrix (video, 35min)
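A minimal sketch of a binary confusion matrix and the metrics usually read off it. The helper `confusion_matrix_2x2` and the data are hypothetical. The layout (rows are actual classes, columns are predictions, i.e. `[[TN, FP], [FN, TP]]`) follows the convention of `sklearn.metrics.confusion_matrix` for labels 0 and 1.

```python
def confusion_matrix_2x2(y_true, y_pred):
    """Return [[TN, FP], [FN, TP]]; assumes labels are exactly 0 and 1."""
    matrix = [[0, 0], [0, 0]]
    for actual, predicted in zip(y_true, y_pred):
        matrix[actual][predicted] += 1  # row = actual class, column = prediction
    return matrix


# Hypothetical labels and predictions (1 = positive class).
y_true = [1, 0, 1, 1, 0, 0, 1, 0, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0, 1, 0]

(tn, fp), (fn, tp) = confusion_matrix_2x2(y_true, y_pred)
accuracy = (tp + tn) / len(y_true)  # share of all predictions that are correct
precision = tp / (tp + fp)          # of predicted positives, how many are real
recall = tp / (tp + fn)             # sensitivity: of real positives, how many were found
specificity = tn / (tn + fp)        # of real negatives, how many were rejected
print(confusion_matrix_2x2(y_true, y_pred))  # [[4, 2], [1, 3]]
```

Here accuracy is 0.7, precision 0.6, recall 0.75 and specificity about 0.67. This shows why a single accuracy number can hide very different error types.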
- See instructions in exercises/home_assignment3.md
- Same format as the mid-term
- The exam will include material from the entire semester - do not neglect NumPy and Pandas basics
- Focus on the lab exercises; exercises are always more important than reading
- Watch the videos linked from weekly sections above (or read the associated notebooks)
- By now we have touched on all five chapters of the book. If the book works well for you, it's a great source to study from, but the videos cover all of the material as well.
- Lab - Nov 27
- Exercise: exercises/mpg_revisited.ipynb
- Lecture - Nov 29
- Slides (alternative link via nbviewer)
- Lab - Dec 4
- Exercise: exercises/bikes.ipynb (CSV file in exercises folder)
- Or continue with home assignment 3
- Lecture - Dec 6
- Review
- Lab - Dec 11 - serves as office hours in c410
- You are allowed to bring one sheet of paper (up to Letter/A4 size) of reference notes you prepared yourself. Use it wisely: most people benefit from the process of preparing the page, but not so much from using it during the exam
- Otherwise same format as the mid-term
- Material from the entire semester will be covered including (but not limited to) NumPy, Pandas, plotting and sklearn
- Next semester course - https://github.com/kamrik/ML2
- Kevin's video - How do I stay up-to-date as a data scientist?
- The Map of Machine Learning - video
- If you are interested in a more theoretically minded treatment of statistical ML:
- Book: The Elements of Statistical Learning
- Learning From Data, a course taught by Caltech Professor Yaser Abu-Mostafa