This repository collects Jupyter notebooks that cover the fundamentals of machine learning in Python. Topics include linear and polynomial regression, autoregressive models, classification, dimensionality reduction, clustering, and ensemble models.
The notebooks start with linear and polynomial regression in notebooks 1 and 2, respectively. Linear regression is a supervised learning algorithm that predicts a continuous outcome (the dependent variable) from one or more predictors (the independent variables). Polynomial regression extends linear regression by modeling the relationship between the independent variable x and the dependent variable y as an nth-degree polynomial; the model is still linear in its coefficients.
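As a minimal sketch of the difference between the two models, the snippet below fits both to the same synthetic quadratic data using scikit-learn. The data, the variable names, and the choice of degree 2 are assumptions made for illustration and are not taken from the notebooks.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic, noise-free data (an assumption for illustration): y = 1 + 2x + 0.5x^2
x = np.linspace(-3, 3, 50).reshape(-1, 1)
y = 1.0 + 2.0 * x.ravel() + 0.5 * x.ravel() ** 2

# Linear regression: fits a straight line y = b0 + b1 * x
linear = LinearRegression().fit(x, y)

# Polynomial regression: expand x into [x, x^2], then fit an ordinary linear model
poly = make_pipeline(
    PolynomialFeatures(degree=2, include_bias=False),
    LinearRegression(),
).fit(x, y)

# R^2 on the training data; the quadratic model can match the curve, the line cannot
linear_r2 = linear.score(x, y)
poly_r2 = poly.score(x, y)
print(f"linear R^2: {linear_r2:.3f}, polynomial R^2: {poly_r2:.3f}")
```

Because the polynomial model is just a linear model on expanded features, the same `LinearRegression` estimator is reused in both cases.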
Notebook 3 uses an autoregressive (AR) model, a time series model that predicts future values of a variable from its own past values. It does so by fitting a linear equation whose predictors are lagged versions of the response variable.
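The idea of regressing a series on its own lags can be sketched with plain NumPy least squares. The helper `fit_ar`, the simulated series, and its parameters (0.5 and 0.8) are hypothetical and chosen only for illustration; the notebook may use a different library or estimation method.

```python
import numpy as np


def fit_ar(series, p):
    """Fit an AR(p) model by ordinary least squares.

    Returns (intercept, coefficients), where coefficients[k] multiplies
    the value lagged by k + 1 steps.
    """
    series = np.asarray(series, dtype=float)
    n = len(series)
    # Column k holds the series shifted back by k + 1 steps
    lags = np.column_stack([series[p - k - 1 : n - k - 1] for k in range(p)])
    X = np.column_stack([np.ones(n - p), lags])
    y = series[p:]
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[0], beta[1:]


# Simulate an AR(1) process (assumed for illustration): x_t = 0.5 + 0.8 * x_{t-1} + noise
rng = np.random.default_rng(0)
series = np.zeros(500)
for t in range(1, len(series)):
    series[t] = 0.5 + 0.8 * series[t - 1] + rng.normal(scale=0.1)

intercept, coefs = fit_ar(series, p=1)

# One-step-ahead forecast: most recent value first, matching the lag order of coefs
next_value = intercept + coefs @ series[-1 : -len(coefs) - 1 : -1]
print(f"estimated lag-1 coefficient: {coefs[0]:.2f}, forecast: {next_value:.2f}")
```

The estimated lag-1 coefficient should land close to the value used in the simulation, though the exact figure depends on the random noise.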
Notebooks 4 and 5 introduce classification techniques. Classification is a supervised learning task that predicts a categorical outcome (the dependent variable) from one or more predictors (the independent variables).
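A small classification workflow might look like the sketch below. Logistic regression and the iris dataset bundled with scikit-learn are assumptions chosen for brevity; the notebooks may use other classifiers and data.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Features are flower measurements; the target is one of three species (a categorical outcome)
X, y = load_iris(return_X_y=True)

# Hold out a quarter of the samples to estimate performance on unseen data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Fraction of held-out samples whose class is predicted correctly
accuracy = clf.score(X_test, y_test)
predicted = clf.predict(X_test[:1])
print(f"test accuracy: {accuracy:.2f}, first prediction: class {predicted[0]}")
```

Evaluating on a held-out split rather than the training data gives a more honest estimate of how the classifier generalizes.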
Notebook 6 covers dimensionality reduction, the process of reducing the number of features (predictor variables) in a dataset while retaining as much information as possible. This is useful for visualizing high-dimensional data and can improve the performance of machine learning algorithms.
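One common way to do this is principal component analysis (PCA), sketched below on the iris dataset. PCA and the choice of two components are assumptions for illustration; the notebook may cover other techniques.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

# Four features per sample; the labels are not needed for PCA
X, _ = load_iris(return_X_y=True)

# Project the data onto the two directions of greatest variance
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X)

# Share of the original variance kept by the two components
retained = pca.explained_variance_ratio_.sum()
print(f"shape: {X.shape} -> {X_2d.shape}, variance retained: {retained:.1%}")
```

The two-column result can be passed straight to a scatter plot, which is the usual route to visualizing data with more than three features.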
Notebook 7 introduces clustering, an unsupervised learning task that groups similar data points together. Clustering is used to explore and understand unlabelled data.
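As a rough illustration, the sketch below runs k-means on synthetic two-dimensional blobs. The algorithm, the blob centers, and the choice of three clusters are assumptions; the generated group labels are used only to check the result and are never shown to the model.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score

# Three well-separated synthetic groups (assumed for illustration)
X, true_groups = make_blobs(
    n_samples=300,
    centers=[[-5, -5], [0, 5], [5, -5]],
    cluster_std=1.0,
    random_state=0,
)

# k-means assigns each point to the nearest of k learned centroids
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)

# Adjusted Rand index: 1.0 means the clusters match the generating groups exactly
ari = adjusted_rand_score(true_groups, labels)
print(f"clusters found: {len(set(labels))}, adjusted Rand index: {ari:.2f}")
```

On real unlabelled data there is no ground truth to compare against, so the number of clusters usually has to be chosen by inspection or with a heuristic.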
Finally, notebook 9 introduces ensemble models, which combine the predictions of multiple models to improve overall predictive performance.
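A simple way to combine models is majority voting, sketched below with scikit-learn's `VotingClassifier`. The three base models and the iris dataset are assumptions for illustration; the notebook may instead cover bagging, boosting, or random forests.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Three different base models; each casts one vote per sample
ensemble = VotingClassifier(
    estimators=[
        ("logreg", LogisticRegression(max_iter=1000)),
        ("tree", DecisionTreeClassifier(random_state=0)),
        ("knn", KNeighborsClassifier(n_neighbors=5)),
    ],
    voting="hard",  # predict the class chosen by the majority of base models
)
ensemble.fit(X_train, y_train)

ensemble_accuracy = ensemble.score(X_test, y_test)
print(f"ensemble test accuracy: {ensemble_accuracy:.2f}")
```

Voting tends to help most when the base models make different kinds of errors; on an easy dataset like this one, the gain over a single model may be small or absent.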
The notebooks use the following Python libraries:
- pandas
- scikit-learn (imported as `sklearn`)
- numpy
- matplotlib