Code and results accompanying our paper, "A Rigorous Machine Learning Analysis Pipeline for Biomedical Binary Classification: Application in Pancreatic Cancer Nested Case-control Studies with Implications for Bias Assessments". The repository includes the analysis and comparison of three pancreatic cancer datasets, as well as a simulated GAMETES dataset.
The zipped folders contain the dataset-specific notebook code, an HTML snapshot of the analysis, and the output figures and files for each analysis.
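For readers who prefer to unpack the archives programmatically, the following is a minimal sketch using only the Python standard library. It assumes the zipped folders sit at the top level of a local clone; the function name `extract_archives` and the output directory name are hypothetical and are not part of this repository.

```python
import zipfile
from pathlib import Path


def extract_archives(repo_dir, out_dir):
    """Extract every .zip file found directly in repo_dir.

    Each archive is unpacked into its own subfolder of out_dir, named
    after the archive (without the .zip suffix). Returns the list of
    folders that were created or reused.
    """
    repo_dir, out_dir = Path(repo_dir), Path(out_dir)
    extracted = []
    for archive in sorted(repo_dir.glob("*.zip")):
        target = out_dir / archive.stem
        target.mkdir(parents=True, exist_ok=True)
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(target)
        extracted.append(target)
    return extracted


# Example call (paths are assumptions; adjust to your local clone):
# extract_archives(".", "extracted")
```

Keeping each archive in its own subfolder avoids collisions between output files that share a name across the dataset analyses.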
The original datasets and the cross-validation (CV) datasets generated by the notebook have been excluded here. Individual LCS output .txt files have also been excluded to save space; these are available upon request.
The Python code common to each of the three dataset analyses is contained in common_ml_pipe_code.
Code and results for comparing the three dataset analyses are included in ml_compare_datasetes.
This repository is primarily meant to share, for reproducibility, the code as formatted for each dataset analysis, along with the results files from those analyses.
If you wish to run this binary classification ML notebook on your own data, please use our other repository: https://github.com/UrbsLab/ExSTraCS_ML_Pipeline_Binary_Notebook. It offers the same pipeline used in this pancreatic cancer study, for application to any binary classification dataset.