These are the slides and notebooks used during the meetup "Python for Data Scientists". The event took place at the European Data Innovation Hub in Brussels on Thu 17 Sep 2015.
Most of the material here is either taken directly from or closely adapted from other sources. In particular, the overview closely follows chapter 1 of "Python: Essential Reference" (4th edition) by David Beazley, and the Scikit-learn and Pandas notebooks owe a lot to Jake VanderPlas's tutorial notebooks on GitHub.
In the past few years, Python has emerged as a solid platform for data
science. Couple a mature, clean and expressive language with powerful,
fully-featured libraries for data wrangling and machine learning, and
you're set up for maximum productivity. Easily ingest your data from
practically anywhere using one of Python's thousands of free
libraries. Effortlessly turn hundreds of convoluted lines of obscure
model code into just a few lines of near-English prose. Add a few
annotations and get maximum performance without drowning in pools of
unnecessary boilerplate code. Present your results in beautiful living
notebooks that seamlessly mix text, code and graphs. Whether you do
all your modeling in R, you've written nothing but Matlab since
university, or you swear by C# or (gasp!) Java, discovering Python
will be a wonderful experience.
In detail, we plan to cover the following points:
- Quick history of Python and typical use cases
- Key advantages and disadvantages of Python for data science
- Ways to run Python and write code
- Quick tour of the language
- Showcase of useful packages for data science:
  - NumPy
  - SciPy
  - Matplotlib
  - Pandas
  - Scikit-learn
  - PySpark [omitted due to time constraints]
  - PyHive [omitted due to time constraints]
  - Accessing RDBMSs [omitted due to time constraints]
- Writing efficient Python:
  - Cython
  - Numba
  - SWIG [omitted due to time constraints]
- Pointers for further learning: follow links in the notebooks