Study of Dimensionality Reduction Techniques and Interpretation of their Coefficients, and Influence on Learned Models

Code used for my Master's thesis "Study of Dimensionality Reduction Techniques and Interpretation of their Coefficients, and Influence on Learned Models", which can be accessed here. The thesis obtained the maximum grade (10/10).

Running main.ipynb reproduces the results shown in the thesis.

What was done

First, the dimensionality of the data was reduced using state-of-the-art dimensionality reduction techniques such as SLMVP. Each technique was combined with different machine learning classifiers, and their parameters were fine-tuned to find the configuration that achieves the highest accuracy on the given data. The accuracy obtained with only the first k components was also measured for different values of k.
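The measurement of accuracy against the number of components k can be sketched as follows. This is a minimal illustration, not the thesis pipeline: it assumes scikit-learn is installed, uses PCA as a stand-in for SLMVP, uses logistic regression as an arbitrary classifier, and runs on the wine dataset bundled with scikit-learn. The function name accuracy_per_k is hypothetical, and no parameter tuning is performed.

```python
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def accuracy_per_k(X, y, ks, cv=5):
    """Return {k: mean cross-validated accuracy using only the first k components}."""
    scores = {}
    for k in ks:
        # Scaling and projection are refit inside every fold to avoid leakage.
        pipeline = make_pipeline(
            StandardScaler(),
            PCA(n_components=k),  # assumption: PCA stands in for SLMVP
            LogisticRegression(max_iter=1000),
        )
        fold_scores = cross_val_score(pipeline, X, y, cv=cv, scoring="accuracy")
        scores[k] = fold_scores.mean()
    return scores


X, y = load_wine(return_X_y=True)
results = accuracy_per_k(X, y, ks=[1, 2, 3, 5])
for k, acc in results.items():
    print(f"k={k}: accuracy={acc:.3f}")
```

Wrapping the reduction step and the classifier in one pipeline keeps the comparison across values of k fair, since each fold learns its projection from the training split only.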

Second, the ability of each technique to capture and preserve the structure of the original dataset was analyzed by plotting its projections in two and three dimensions. Two properties are examined: whether the data points are evenly distributed, which shows how effectively the technique has captured the overall variance of the dataset, and whether the plot exhibits a clear separation of the different classes. Together with the accuracy obtained in the previous classification task, this indicates the quality of the technique.

Finally, the correlations between the original data and each of the components obtained through dimensionality reduction are used to extract meaningful qualitative information. This rests on the premise that the components are the directions of maximum variability of the data, so it is fair to assume that variables with a high absolute correlation with a component are given high significance by the dimensionality reduction technique. A recommendation is then given as to which features should be selected for a subsequent machine learning task, based on their absolute correlation with the components.
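The correlation-based feature ranking can be illustrated with a small sketch. It assumes NumPy only, uses PCA computed via SVD as a stand-in for SLMVP, and runs on synthetic data. The helper names feature_component_correlations and rank_features are hypothetical, and scoring a feature by its largest absolute correlation with any component is one possible selection rule, not necessarily the one used in the thesis.

```python
import numpy as np


def feature_component_correlations(X, Z):
    """Pearson correlation between every column of X and every column of Z.

    Returns an array of shape (n_features, n_components).
    """
    Xc = X - X.mean(axis=0)
    Zc = Z - Z.mean(axis=0)
    cov = Xc.T @ Zc
    norms = np.outer(np.linalg.norm(Xc, axis=0), np.linalg.norm(Zc, axis=0))
    return cov / norms


def rank_features(corr, top=3):
    """Rank features by their largest absolute correlation with any component."""
    score = np.abs(corr).max(axis=1)
    order = np.argsort(score)[::-1][:top]
    return order, score[order]


# Synthetic data: two features driven by a shared signal, one noise feature.
rng = np.random.default_rng(0)
n = 500
signal = rng.normal(size=n)
X = np.column_stack([
    signal + 0.1 * rng.normal(size=n),   # feature 0: follows the signal
    -signal + 0.1 * rng.normal(size=n),  # feature 1: follows it with opposite sign
    0.05 * rng.normal(size=n),           # feature 2: low-variance noise
])

# Assumption: PCA via SVD stands in for SLMVP; keep the first two components.
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
Z = Xc @ Vt[:2].T

corr = feature_component_correlations(X, Z)
order, score = rank_features(corr, top=2)
print("feature/component correlations:\n", np.round(corr, 3))
print("recommended features:", order, "scores:", np.round(score, 3))
```

Taking the absolute value matters here: feature 1 is negatively correlated with the signal but is just as informative as feature 0, so both should be recommended ahead of the noise feature.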

In addition, the correlations are used to compare how similar the components produced by different techniques are. For two components, the Spearman correlation coefficient is computed between their absolute correlations with the original variables, which yields a similarity score.
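A minimal sketch of this similarity score is given below, assuming SciPy is available. The function name component_similarity and the correlation vectors are hypothetical, chosen only to illustrate the computation. Each vector holds one correlation per original variable for a single component.

```python
import numpy as np
from scipy.stats import spearmanr


def component_similarity(corr_a, corr_b):
    """Spearman correlation between the absolute feature correlations of two components."""
    rho, _ = spearmanr(np.abs(corr_a), np.abs(corr_b))
    return rho


# Hypothetical feature/component correlations, one value per original variable.
comp_a = np.array([0.9, -0.7, 0.2, 0.05, -0.4])
comp_b = np.array([-0.8, 0.6, -0.3, 0.1, 0.5])  # same ordering of |corr| as comp_a, signs flipped
comp_c = np.array([0.05, 0.2, -0.4, 0.9, 0.7])  # emphasizes different variables

sim_ab = component_similarity(comp_a, comp_b)
sim_ac = component_similarity(comp_a, comp_c)
print(f"similarity(a, b) = {sim_ab:.2f}")
print(f"similarity(a, c) = {sim_ac:.2f}")
```

Because the score is rank-based and uses absolute values, two components that weight the variables in the same order are scored as similar even when their signs or scales differ, which is convenient when comparing techniques whose components are only defined up to sign.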

Theoretical principles

This work draws inspiration mainly from the following papers:

Esteban García-Cuesta, José Antonio Iglesias. "User modeling: Through statistical analysis and subspace learning."

Jolliffe, I. T. "Discarding Variables in a Principal Component Analysis."

File structure

main.ipynb: IPython notebook showing the pipeline and the results.
models.py: Contains packaged code to train and test the models, as well as generate graphs and tables.
slmvp.py: Dimensionality Reduction Technique SLMVP.
datasets.py: Code to load and prepare the data for the pipeline.
requirements.txt
TFM-Doc/: LaTeX documents used to create the thesis document.

miguelangel43/Dimensionality-Reduction-Masters-Thesis