tianliu9/Comp_Med_Project

 
 

Feature Selection and Classification on High-dimensional Brain Cancer Microarray Data

Techniques such as microarrays measure the expression of a very large number of genes, but usually for only a limited number of samples. We study brain cancer because of its low incidence (6.3 per 100,000 men and women per year), and use feature selection to find an informative subset of features for multiclass classification.

Pipeline

Dataset: "Brain_GSE50161.csv"
Feature selection:

  1. our pipeline with variance: "feature_selection_with_variance.ipynb"
    • input: "Brain_GSE50161.csv"
    • output: "df_w_var.csv"
  2. our pipeline without variance: "feature_selection_with_variance.ipynb"
    • input: "Brain_GSE50161.csv"
    • output: "df_wo_var.csv"
  3. LASSO: "feature_selection_with_lasso.ipynb"
    • input: "Brain_GSE50161.csv"
    • output: "df_lasso.csv"
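
The notebooks themselves are not reproduced here. As a rough illustration of what a variance-based filter might look like, the sketch below keeps the `k` features with the largest variance across samples. The function name `top_variance_features`, the probe names, and the toy values are all hypothetical, and the actual notebook may apply a different criterion or threshold.

```python
from statistics import pvariance


def top_variance_features(rows, feature_names, k):
    """Return the names of the k features with the largest variance across samples.

    rows: list of samples, each a list of expression values (one per feature).
    This is an assumed, simplified criterion, not the notebook's exact procedure.
    """
    columns = list(zip(*rows))  # transpose: one tuple of values per feature
    variances = {
        name: pvariance(col) for name, col in zip(feature_names, columns)
    }
    ranked = sorted(feature_names, key=lambda name: variances[name], reverse=True)
    return ranked[:k]


# Toy expression matrix: 4 samples x 3 hypothetical probes.
feature_names = ["probe_a", "probe_b", "probe_c"]
rows = [
    [5.0, 1.0, 2.0],
    [5.1, 3.0, 2.5],
    [4.9, 5.0, 1.5],
    [5.0, 7.0, 2.0],
]

selected = top_variance_features(rows, feature_names, k=2)
print(selected)
```

Near-constant features such as `probe_a` carry little information for separating classes, which is the usual motivation for dropping them before classification.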

Classification:

  1. Run multiclass classification on the dataset generated by any of the three feature selection notebooks: "Classification.ipynb"
    • input: "df_w_var.csv" or "df_wo_var.csv" or "df_lasso.csv"
    • output: accuracy, F1 score, confusion matrices
  2. Perform PCA and then run multiclass classification: "PCA.ipynb"
    • input: "Brain_GSE50161.csv"
    • output: accuracy, F1 score, confusion matrices
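
To make the reported outputs concrete, here is a minimal, dependency-free sketch of how a confusion matrix, accuracy, and F1 score relate to each other for a multiclass problem. The class labels and predictions are made up, and macro-averaging the F1 score is an assumption; the notebooks may use a different averaging scheme.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred, labels):
    """Rows are true classes, columns are predicted classes."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]


def accuracy(matrix):
    """Fraction of samples on the diagonal of the confusion matrix."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


def macro_f1(matrix):
    """Unweighted mean of the per-class F1 scores (assumed averaging)."""
    n = len(matrix)
    scores = []
    for i in range(n):
        tp = matrix[i][i]
        fp = sum(matrix[r][i] for r in range(n)) - tp  # predicted i, true other
        fn = sum(matrix[i]) - tp                       # true i, predicted other
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / n


# Hypothetical labels and predictions for 8 samples.
labels = ["class_a", "class_b", "class_c"]
y_true = ["class_a", "class_a", "class_a", "class_b", "class_b",
          "class_c", "class_c", "class_c"]
y_pred = ["class_a", "class_a", "class_b", "class_b", "class_b",
          "class_c", "class_a", "class_c"]

matrix = confusion_matrix(y_true, y_pred, labels)
acc = accuracy(matrix)
f1 = macro_f1(matrix)
print(matrix, acc, round(f1, 4))
```

With few samples per class, accuracy alone can hide poor performance on a rare class, which is one reason to report F1 and the full confusion matrix alongside it.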
