LODA-Lecture Notes on Data Analysis

Lecture notes (in the form of slides) and exercises in Python, written as IPython notebooks, for teaching data and media analysis. They include introductions to Python, Numpy, Scipy, Scikit-Learn, and SimpleCV, and cover the topics Supervised/Unsupervised Learning, Signal Analysis, Image Analysis, and Text and Web-Media Analysis.

Presentation

The lecture notes are optimized for presentation. To present them, run

ipython nbconvert --to=slides --post serve path-to-lecture-notes

to start the presentation (a browser window should open automatically).

You can also try to get livereveal.js up and running in your IPython environment.
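If a part contains several notebooks, the command above has to be repeated for each of them. The following is a minimal sketch of how those invocations could be assembled in Python. The function name `slide_commands` and its `launcher` parameter are hypothetical and not part of this repository. It only builds the argument lists and leaves running them to the caller.

```python
from pathlib import Path


def slide_commands(notes_dir, launcher="ipython"):
    """Build one nbconvert command (as an argument list) per notebook.

    `launcher` is an assumption: the text above uses `ipython`; on newer
    installations the executable may be `jupyter` instead.
    """
    commands = []
    # Sort the notebooks so the slides are produced in a stable order.
    for notebook in sorted(Path(notes_dir).glob("*.ipynb")):
        commands.append(
            [launcher, "nbconvert", "--to=slides", "--post", "serve", str(notebook)]
        )
    return commands
```

Each returned list can be passed to `subprocess.run` one at a time. Because `--post serve` keeps a local server running, each presentation has to be stopped before the next one starts.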

Acknowledgment

This work is largely based on a number of great tutorials and resources from all over the web, compiled by great people from very different domains. Without their effort and their willingness to make their hard work open access, I would not have been able to compile this tutorial. The individual contributions are listed at the beginning of every part.

Outline

  1. Introduction - Why, What, Who, How

    1. The point of view of Web Mining (Course Web Mining Project@University Passau)
  2. Part I: Scientific Programming in Python

    1. Introduction
    2. Programming Basics
      1. Exercise 2.1.: Python Standard Data Structures
    3. Numpy in a Nutshell
      1. [Exercise 3.1. Data Structures and Operations in Numpy](http://nbviewer.ipython.org/urls/raw.github.com/mgrani/LODA-lecture-notes-on-data-analysis/master/I.Data-Science-in-Python/exercises/Exercise%20DSiP-3-1-Numpy.ipynb)
    4. Scipy in a Nutshell
    5. Matplotlib in a Nutshell
      1. [Exercise DSiP-5-1-Analysing the Iris Dataset with Matplotlib](http://nbviewer.ipython.org/urls/raw.github.com/mgrani/LODA-lecture-notes-on-data-analysis/master/I.Data-Science-in-Python/exercises/Exercise%20DSiP-5-1-Analysing%20the%20Iris%20Dataset%20with%20Mathplotlib-.ipynb)
    6. Pandas-based Data Analysis
      1. [Exercise 6.1. Analysing New York Open Data with Pandas](http://nbviewer.ipython.org/urls/raw.github.com/mgrani/LODA-lecture-notes-on-data-analysis/master/I.Data-Science-in-Python/exercises/Exercise%20DSiP-6-1-Pandas-NYC-Open-Data.ipynb)
  3. Part II: Machine Learning and Data Mining [in Python]

    1. Machine Learning in a Nutshell with scikit-learn

      1. Overview and Preprocessing
      2. Supervised Learning: Classification and Regression
      3. Unsupervised Learning: Clustering
      4. Unsupervised Learning: Projections and Manifolds
    2. Machine Learning Basics

      1. On the Data
      2. Regression
      3. Concept Learning
      4. Measuring Performance
    3. Decision Trees

      1. Decision Tree Basics
      2. Impurity Functions
      3. Decision Tree Algorithms
        1. ID3 in Python
      4. Decision Tree Pruning
    4. Statistical Learning

      1. Probability Basics
      2. Bayes Classification
      3. Graphical Models
    5. Linear Models

    6. Kernel Models

    7. Neural Networks

      1. Perceptron Learning
      2. Multilayer Perceptrons
      3. Deep Learning
    8. Ensemble Classifiers

    9. Cluster Analysis

      1. Clustering with scikit-learn
    10. Dimensionality Reduction and Manifold Learning

    11. Dimensionality Reduction and Manifold Learning with scikit-learn

    12. Association Rules

    13. Reinforcement Learning

    14. Deep Learning

    15. Single Layered Autoencoder

  4. Part III: Natural Language Processing [in Python]

    1. An Introduction to NLTK
      1. NLTK Exercise
  5. Part IV: Visual Analytics

    1. Information Visualisation with JavaScript and D3JS
  6. Part V: Social Network Analysis

  7. Part X: Web Mining Applications

    1. Crawling and Analysing Twitter
      1. Exercise: Crawling and Analysing Twitter
      2. Project: Shitstorm Detection
      3. Project: Topic Detection
      4. Project: Web Crawling - Genre Classification

Helpful Links

License

This work is licensed under a Creative Commons 3.0 license.

Citation

DOI

Powered by zenodo (Join and contribute your own open material)