
JoeJoe1313/MSc-Machine-Learning-and-Data-Science


MSc-Machine-Learning-and-Data-Science

Year 1 (2022-2023)

  • Applicable Maths
    • A Review of Calculus, Linear Algebra, Matrix Decomposition, Introduction to Probability Theory, Random Variables & Probability Distributions, Concentration Bounds & Convergence, Stochastic Processes, Sample-Based Statistics, Optimisation, Distance Functions
  • Programming for Data Science (R, Python)
    • Basic R and Python, namespaces, vectors and matrices, random number generation. File formats, data frames and APIs. Graphs and time series. Object-oriented programming. Version control with Git. Debugging, profiling and defensive programming. Unit testing. Packaging code for others. Generators. Awareness of challenges in software development for data science.
  • Ethics in Data Science and Artificial Intelligence (Part 1 and Part 2) (Python, R)
    • Part 1 begins by discussing the ethical use of data itself - the raw materials of data science pipelines. It then discusses sets of principles that tech leaders and international bodies are adopting to promote ethical use of data science and artificial intelligence algorithms, including a discussion of real-world examples of failings and adverse outcomes.
    • Part 2 revisits the issues explored in Part 1 in greater technical detail. This part introduces data science methodologies that provide novel solutions to ethical problems of old such as explainability, prejudice and bias.
    • Part 3 (Year 2)
  • Exploratory Data Analytics and Visualisation (R, tidyverse, ggplot2)
  • Supervised Learning (R)
  • Big Data: Statistical Scalability with PySpark (Python, Hadoop, MapReduce, PySpark)
    • The module consists of three components: statistical analysis at scale, distributed programming using MapReduce, and Big Data analysis using PySpark. The first component covers the theory of statistical scalability and discusses topics such as stochastic and distributed optimisation, stochastic variational inference, Markov Chain Monte Carlo methods for tall data, and statistical analysis of streaming data. The second and third components cover practical aspects of handling Big Data, introducing two frameworks for the analysis of large datasets: Hadoop and Spark.
  • Bayesian Methods and Computation (Python, PyStan)
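
As a rough illustration of the MapReduce programming model mentioned under the Big Data module, the sketch below runs a word count in plain Python. It is not taken from the course material: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical, and everything runs in a single process, whereas Hadoop or Spark would distribute the map and reduce steps across workers.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values seen for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Chain the three phases over an iterable of text lines."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())


counts = word_count(["big data", "Big Data analysis"])
print(counts)
```

Keeping the map and reduce functions free of shared state is what lets a real framework run them in parallel on separate partitions of the data.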

Year 2 (2023-2024)

  • Unsupervised Learning (R, Python)
    • The module introduces tools for solving different unsupervised learning challenges. The lectures focus on techniques for dimensionality reduction, density estimation and clustering. Anomaly and outlier detection algorithms will be discussed and developed. More specific contents include:
      1. Dimensionality reduction: principal component analysis (PCA), extensions of PCA, non-negative matrix factorisation, independent component analysis, non-linear dimensionality reduction methods.
      2. Density estimation: parametric density estimation, estimation of mixtures using the EM algorithm, non-parametric density estimation (kernel and histogram).
      3. Clustering: k-means clustering, k-medoid clustering, hierarchical clustering, evaluation measures.
      4. Anomaly and outlier detection: clustering-based methods, density-based methods.
  • Unstructured Data Analysis (Python, PyTorch, scikit-image)
    • Images: Basics of images; Convolutional neural networks for image classification; Edge detection; Image denoising; Image segmentation
    • Networks: Basics of networks; Graph embedding; Graph algorithms; Community detection; Graph kernels
    • Text: Basics of text data; Text vectorization; Natural language processing; Text clustering
  • Learning Agents (Python)
  • Deep Learning (Python, TensorFlow)
  • Research Project
    • Main topic area: Bayesian Computation for Matrix Factorisation
    • Thesis: Probabilistic Sequential Matrix Factorisation for 12-Lead ECG Data
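
To give a flavour of the clustering topic listed under Unsupervised Learning, here is a minimal sketch of k-means (Lloyd's algorithm) in plain Python. It is an illustrative assumption rather than course code: the names `kmeans`, `nearest` and `squared_distance` are hypothetical, the initial centroids are simply sampled from the data, and the toy points are made up for the example.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def nearest(point, centroids):
    """Index of the centroid closest to the point."""
    return min(range(len(centroids)),
               key=lambda i: squared_distance(point, centroids[i]))


def kmeans(points, k, iters=20, seed=0):
    """Alternate assignment and centroid updates until nothing changes."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # assumption: initialise from the data
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        updated = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else centroids[j]  # keep an empty cluster's centroid
            for j, cluster in enumerate(clusters)
        ]
        if updated == centroids:
            break
        centroids = updated
    labels = [nearest(p, centroids) for p in points]
    return centroids, labels


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, labels = kmeans(points, k=2)
print(centroids, labels)
```

On well-separated toy data like this the two groups are recovered regardless of the starting centroids; on real data the result depends on initialisation, which is why multiple restarts are common.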