Let's move our data analysis out of spreadsheet programs and learn Python and Pandas along the way.
Don't get me wrong, I use spreadsheets, but not for data analysis.
Also, the notes folder contains some notes from people I talked to during the conference.
Click a .md file, and GitHub will render the document on the website (like the README.md file you are reading now).
Material for the Pandas Tutorial at PyData Carolinas 2016
PyData Carolinas 2016
September 14-16, 2016
Hosted by IBM Emerging Technologies
Research Triangle Park, NC
IBM RTP Activity Center 3039 East Cornwallis Road, Building 400 Research Triangle, NC 27709
http://pydata.org/carolinas2016/schedule/
- Pandas DataFrame basics
- Data assembly
- Missing Data
- Plotting
The easiest way to get everything you need for the tutorial is to install Anaconda.
You can download and install it here: https://www.continuum.io/downloads
I will be using the Python 3 version during the tutorial.
I actually ended up using Python 2 because I had a last-minute computer change.
conda install seaborn
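After installing, it can help to confirm that the packages are importable before the tutorial starts. The snippet below is a minimal sketch of such a check, not part of the tutorial material: the helper name `check_packages` is hypothetical, and the package list (pandas, seaborn, matplotlib) is an assumption based on the topics covered.

```python
from importlib.util import find_spec


def check_packages(names):
    """Map each top-level package name to True if Python can find it.

    `check_packages` is a hypothetical helper for illustration only.
    """
    return {name: find_spec(name) is not None for name in names}


if __name__ == "__main__":
    # Assumed package list; adjust to whatever the tutorial notebooks import.
    for name, found in check_packages(["pandas", "seaborn", "matplotlib"]).items():
        print(name, "OK" if found else "MISSING")
```

Any package reported as missing can usually be installed with `conda install`, as with seaborn above.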
- Gapminder: https://github.com/jennybc/gapminder/raw/master/inst/gapminder.tsv
- Survey: Comes from the Software-Carpentry SQL lesson
- Ebola: www.github.com/cmrivers/ebola
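As a rough sketch of how one of these datasets might be loaded, the snippet below reads a tab-separated file into a DataFrame with `pandas.read_csv`. The helper name `load_tsv` and the constant `GAPMINDER_URL` are hypothetical names introduced here for illustration; the URL itself is the Gapminder link listed above.

```python
import pandas as pd

# URL taken from the dataset list above; the constant name is hypothetical.
GAPMINDER_URL = "https://github.com/jennybc/gapminder/raw/master/inst/gapminder.tsv"


def load_tsv(source):
    """Read a tab-separated file (path, URL, or file-like object) into a DataFrame.

    `load_tsv` is a hypothetical helper, not part of the tutorial code.
    """
    return pd.read_csv(source, sep="\t")
```

Calling `load_tsv(GAPMINDER_URL)` would download the file and return a DataFrame (this needs a network connection); `df.head()` then shows the first few rows.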