The UBC Scientific Software Seminar is inspired by Software Carpentry and its goal is to help students, graduates, fellows and faculty at UBC develop software skills for science.
- What are the learning goals?
- To learn how to use scikit-learn to solve machine learning problems
- To master Python programming for scientific computing
- To learn mathematics and statistics applied to data science and machine learning
- To meet and collaborate with other students and faculty interested in scientific computing
- What software tools are we going to use?
- scikit-learn: machine learning in Python
- SciPy Stack: scientific computing with NumPy, SciPy, matplotlib and pandas
- Python
- Jupyter Notebooks: execute code with accompanying text, markdown and LaTeX all in the browser
- Git/GitHub: manage projects locally from the command line with Git and collaborate online with GitHub
- What scientific topics will we study?
- Machine learning fundamentals (following tutorials provided by scikit-learn.org):
- Regression, classification, clustering, dimensionality reduction
- Special topics:
- Where do we start? What are the prerequisites?
- UBCS3 Fall 2016 is a continuation of UBCS3 Summer 2016 which included:
- Bash shell
- Git/GitHub
- Python programming
- SciPy stack: NumPy, SciPy, matplotlib and pandas
- Basic examples using scikit-learn
- Calculus, linear algebra, probability and statistics
- Who is the target audience?
- Everyone is invited!
- If the outline above is at your level, perfect! Get ready to write a lot of code!
- If the outline above seems too intimidating, come anyway! You'll learn things just by being exposed to new tools and ideas, and meeting new people!
- If you have experience with all the topics outlined above, come anyway! You'll become more of an expert by participating as a helper/instructor!
Fall 2016 will consist of weekly 1-hour meetings held from October until mid-December. The regularly scheduled time is Friday 1-2pm (with an additional session 3-4pm for those who cannot attend 1-2pm).
- Week 1 - Friday October 7 - 1-2pm - LSK 121 [Notes]
- Overview of machine learning problems
- Exploring the scikit-learn documentation
- Getting to know the scikit-learn API
- First examples with built-in example datasets
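
As a rough illustration of the Week 1 topics, here is a minimal sketch of the scikit-learn estimator pattern (create a model, `fit`, `predict`, `score`) on one of the built-in datasets. The choice of the diabetes dataset and of `LinearRegression` is an assumption made for this example; the session notes may have used different datasets or models.

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression

# Built-in datasets come back as objects with .data, .target and a description.
diabetes = load_diabetes()
X, y = diabetes.data, diabetes.target
print(X.shape, y.shape)        # (442, 10) (442,)
print(diabetes.feature_names)  # names of the 10 input features

# Every scikit-learn estimator follows the same pattern:
model = LinearRegression()     # 1. create the estimator
model.fit(X, y)                # 2. fit it to data
y_pred = model.predict(X[:3])  # 3. predict for (new) samples
r2 = model.score(X, y)         # 4. evaluate (R^2 for regressors)
print(y_pred, r2)
```

The same four steps carry over to classifiers and clustering models; only the estimator class and the meaning of `score` change.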
- Week 2 - Friday October 14 - 1-2pm - LSK 121 [Notes]
- Regression Example: Diabetes dataset
- A closer look at least squares linear regression calculations
- Can we improve R²? Let's create more features
- Splitting the dataset: Training data and testing data
- Classification Example: Hand-written digits dataset
- K-nearest neighbors classifier
- Evaluating the model
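
The Week 2 topics can be sketched roughly as follows. This is an illustrative outline, not the code from the session: the squared features, the 25% test split, `random_state=0` and `n_neighbors=5` are all assumptions chosen for the example. It imports `train_test_split` from `sklearn.model_selection`, which is where current scikit-learn versions provide it.

```python
import numpy as np
from sklearn.datasets import load_diabetes, load_digits
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# --- Regression: diabetes dataset ---
X, y = load_diabetes(return_X_y=True)

# Least squares "by hand": add a column of ones for the intercept
# and solve min ||A c - y|| with NumPy.
A = np.column_stack([np.ones(len(X)), X])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)

# LinearRegression solves the same problem.
reg = LinearRegression().fit(X, y)
r2_base = reg.score(X, y)

# More features (here: squares of the originals, an arbitrary choice)
# can only raise R^2 on the data the model was fitted to ...
X_more = np.hstack([X, X**2])
r2_more = LinearRegression().fit(X_more, y).score(X_more, y)

# ... so hold out testing data to measure performance on unseen samples.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)
r2_test = LinearRegression().fit(X_train, y_train).score(X_test, y_test)

# --- Classification: hand-written digits dataset ---
digits = load_digits()
Xd_train, Xd_test, yd_train, yd_test = train_test_split(
    digits.data, digits.target, test_size=0.25, random_state=0)
knn = KNeighborsClassifier(n_neighbors=5).fit(Xd_train, yd_train)
accuracy = knn.score(Xd_test, yd_test)  # fraction of test digits classified correctly

print(r2_base, r2_more, r2_test, accuracy)
```

Comparing `r2_more` (computed on the fitting data) with `r2_test` (computed on held-out data) is the reason for splitting the dataset: a higher training score does not by itself mean a better model.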
- Week 3 - Friday October 21 - 1-2pm - LSK 121 [Notes]
- Dimensionality reduction
- Principal component analysis
- Visualizing the digits dataset
- Linear algebra behind principal component analysis
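
A minimal sketch of the Week 3 ideas, assuming a projection onto two components for plotting: it reduces the 64-pixel digit images with `PCA` and then checks the linear algebra directly, since the principal components are the right singular vectors of the centered data matrix. `svd_solver="full"` is set here only so that the comparison with NumPy's SVD is exact up to sign.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

digits = load_digits()
X = digits.data                      # shape (1797, 64): one row per 8x8 image

# Project the 64-dimensional images onto the first 2 principal components.
pca = PCA(n_components=2, svd_solver="full")
X_2d = pca.fit_transform(X)          # shape (1797, 2)

# The same result from linear algebra: center the data and take the SVD.
X_centered = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)

# Rows of Vt are the principal directions (up to sign);
# squared singular values give the variance along each direction.
components_by_hand = Vt[:2]
variance_by_hand = S[:2] ** 2 / (len(X) - 1)

print(np.allclose(np.abs(components_by_hand), np.abs(pca.components_)))
print(np.allclose(variance_by_hand, pca.explained_variance_))
```

To visualize the digits dataset, the two columns of `X_2d` can be passed to a scatter plot, for example `plt.scatter(X_2d[:, 0], X_2d[:, 1], c=digits.target)` with matplotlib.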
- Week 4 - Friday October 28 - 1-2pm - LSK 121 [Notes]
- PCA revisited
- Visualizing principal components
- Unsupervised learning
- Clustering with K-means
- Digits dataset: How many different kinds of 1s are there?
- Combining KMeans with PCA
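
One possible way to explore the "how many different kinds of 1s" question from Week 4 is sketched below: reduce the images of the digit 1 with PCA, cluster them with K-means, and map the cluster centers back to pixel space. The numbers used here (10 components, 3 clusters, `random_state=0`) are assumptions picked for illustration, not values from the session.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

digits = load_digits()
ones = digits.data[digits.target == 1]   # keep only the images labelled 1

# Step 1: reduce from 64 pixels to a handful of principal components.
pca = PCA(n_components=10, svd_solver="full")
ones_reduced = pca.fit_transform(ones)

# Step 2: cluster in the reduced space. No labels are used (unsupervised).
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(ones_reduced)

# Step 3: map each cluster center back to pixel space so it can be
# displayed as an 8x8 image of a "typical" 1 for that cluster.
center_images = pca.inverse_transform(kmeans.cluster_centers_).reshape(-1, 8, 8)

print(ones.shape, labels.shape, center_images.shape)
```

Each entry of `center_images` can be shown with matplotlib's `imshow`; trying different values of `n_clusters` is a simple way to see which writing styles of 1 the data contains.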
- Week 5 - Friday November 4 - 1-2pm - LSK 121 [Notes]
- Kernel density estimation and Gaussian processes - Presented by @sempwn
- Remembrance Day - No meeting November 11
- Week 6 - Friday November 18 - 1-2pm - UCLL 109
- Natural Language Processing with nltk: Movie Review Classification - Presented by @dbhaskar92
- Week 7 - Friday November 25 - 1-2pm - UCLL 109 [Notes]
- Natural Language Processing with nltk: Movie Review Classification (Continued)
- Working with nltk movie review dataset
- Using regular expressions to remove punctuation and stopwords
- Creating feature vectors from movie reviews
- Applying a Naive Bayes classifier
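
The Week 7 pipeline (clean the text, build feature vectors, train a Naive Bayes classifier) might look roughly like the sketch below. To keep it self-contained, it uses six made-up one-line reviews and a tiny hand-written stopword set instead of the nltk movie review dataset and nltk's stopword list, both of which require a separate download. The helper name `review_features` is hypothetical.

```python
import re
from nltk.classify import NaiveBayesClassifier

# Toy training data (invented for this example, not from the nltk corpus).
reviews = [
    ("A wonderful, moving film with brilliant acting!", "pos"),
    ("Brilliant story; I loved every minute.", "pos"),
    ("Wonderful cast and a moving story.", "pos"),
    ("A dull, boring film. I hated it.", "neg"),
    ("Boring plot and terrible acting...", "neg"),
    ("Terrible, dull, and far too long.", "neg"),
]

# Tiny stand-in for a real stopword list (an assumption for this sketch).
STOPWORDS = {"a", "and", "i", "it", "the", "with", "every", "too", "far"}

def review_features(text):
    """Lowercase, strip punctuation with a regex, drop stopwords,
    and return a {word: True} feature dictionary."""
    cleaned = re.sub(r"[^\w\s]", " ", text.lower())
    return {word: True for word in cleaned.split() if word not in STOPWORDS}

# nltk classifiers train on (feature_dict, label) pairs.
train_set = [(review_features(text), label) for text, label in reviews]
classifier = NaiveBayesClassifier.train(train_set)

prediction_pos = classifier.classify(review_features("What a brilliant, wonderful story!"))
prediction_neg = classifier.classify(review_features("Dull and boring, simply terrible."))
print(prediction_pos, prediction_neg)
```

With the real movie review corpus, the same structure applies: only the source of `reviews` and the stopword list change, and a held-out test set is needed to evaluate the classifier.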