Predicting 30-day ED representation using Machine Learning

Dataset: MIMIC-IV

Process diagnosis.csv file

Run the diagnosis_processing.py script to convert all diagnosis codes to ICD-10 (the raw data includes a mixture of ICD-9 and ICD-10 codes) and to classify all diagnoses into categories.

python3 diagnosis_processing.py
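
The conversion and categorisation step can be pictured roughly as follows. This is a minimal sketch, not the code in diagnosis_processing.py: the three-entry crosswalk, the chapter-letter categories, and the function names `to_icd10` and `categorise` are all assumptions made for illustration. A real ICD-9 to ICD-10 crosswalk is far larger, and the categories the script actually uses are not listed in this README.

```python
# Tiny illustrative ICD-9 -> ICD-10 crosswalk (assumption: the real script
# uses a much larger mapping table).
ICD9_TO_ICD10 = {
    "4019": "I10",    # essential hypertension, unspecified
    "25000": "E119",  # type 2 diabetes without complications
    "486": "J189",    # pneumonia, unspecified organism
}

# Coarse categories keyed by the first letter of the ICD-10 code
# (hypothetical grouping, only three chapters shown).
CATEGORY_BY_LETTER = {
    "E": "endocrine",
    "I": "circulatory",
    "J": "respiratory",
}


def to_icd10(code, version):
    """Return the ICD-10 code, or None when an ICD-9 code has no mapping."""
    code = code.strip().upper().replace(".", "")
    if version == 10:
        return code
    return ICD9_TO_ICD10.get(code)


def categorise(icd10_code):
    """Map an ICD-10 code to a coarse diagnosis category."""
    if not icd10_code:
        return "unknown"
    return CATEGORY_BY_LETTER.get(icd10_code[0], "other")


if __name__ == "__main__":
    for code, version in [("4019", 9), ("J18.9", 10), ("99999", 9)]:
        print(code, version, categorise(to_icd10(code, version)))
```

Codes with no mapping are kept as "unknown" here rather than dropped, so that the loss from the conversion stays visible.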

Generate the ED dataset using mainly the files from MIMIC-IV-ED

python3 ED_preprocessing.py
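
The target of this project is whether a patient returns to the ED within 30 days. The sketch below shows one way such a label could be derived from ED stay records. It is an assumption-laden illustration, not the logic of ED_preprocessing.py: the function name `label_revisits` is hypothetical, the window is measured from discharge time to the next arrival time, and `n_ed_visits` is taken here to mean the number of earlier ED visits by the same patient. The field names `subject_id`, `stay_id`, `intime`, and `outtime` follow the MIMIC-IV-ED `edstays` table.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def label_revisits(stays, window_days=30):
    """Add 'n_ed_visits' and 'revisited' to each stay (dicts are copied)."""
    window = timedelta(days=window_days)

    # Group stays by patient.
    by_subject = defaultdict(list)
    for stay in stays:
        by_subject[stay["subject_id"]].append(stay)

    labelled = []
    for visits in by_subject.values():
        visits.sort(key=lambda s: s["intime"])
        for i, stay in enumerate(visits):
            nxt = visits[i + 1] if i + 1 < len(visits) else None
            # A stay counts as revisited when the next arrival falls
            # within the window after this stay's discharge.
            revisited = nxt is not None and nxt["intime"] - stay["outtime"] <= window
            labelled.append({**stay, "n_ed_visits": i, "revisited": revisited})
    return labelled


if __name__ == "__main__":
    demo = [
        {"subject_id": 1, "stay_id": 10,
         "intime": datetime(2020, 1, 1, 8), "outtime": datetime(2020, 1, 1, 12)},
        {"subject_id": 1, "stay_id": 11,
         "intime": datetime(2020, 1, 20, 9), "outtime": datetime(2020, 1, 20, 15)},
    ]
    for row in label_revisits(demo):
        print(row["stay_id"], row["n_ed_visits"], row["revisited"])
```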

Discretise and normalise the dataset

python3 discretise_normalise.py

The resulting dataset is fully_processed_ED.csv.
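
As a rough picture of what discretising and normalising mean here, the sketch below bins a continuous value into labelled ranges and rescales a numeric column to [0, 1]. The bin edges, labels, and function names are assumptions chosen for the example. This README does not state which columns the script bins or which scaling it applies.

```python
import bisect


def discretise(value, edges, labels):
    """Return the label of the bin that value falls into.

    edges must be sorted; labels needs one more entry than edges.
    A value equal to an edge goes into the bin that starts at that edge.
    """
    return labels[bisect.bisect_right(edges, value)]


def min_max_normalise(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


if __name__ == "__main__":
    age_edges = [18, 40, 65]                       # hypothetical cut points
    age_labels = ["<18", "18-39", "40-64", "65+"]  # hypothetical bin names
    print([discretise(a, age_edges, age_labels) for a in (5, 18, 64, 90)])
    print(min_max_normalise([10, 20, 30]))
```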

Generate the full-feature train and test sets

This splits fully_processed_ED.csv into train and test sets (80/20).

python3 get_train_test.py
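
An 80/20 split can be sketched with the standard library alone. This is an illustration under assumptions: the function name `split_rows` and the fixed seed are hypothetical, and the README does not say whether get_train_test.py shuffles, stratifies by class, or splits by patient.

```python
import random


def split_rows(rows, test_fraction=0.2, seed=42):
    """Shuffle rows reproducibly and return (train, test)."""
    shuffled = list(rows)  # copy so the caller's list is left untouched
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


if __name__ == "__main__":
    train, test = split_rows(range(10))
    print(len(train), len(test))
```

When a patient has several ED stays, splitting by patient instead of by row keeps stays from the same person out of both sets. Whether that applies here depends on how the dataset is built.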

Perform feature selection on the train set

CFS: "separation_mode", "n_ed_visits"
Information Gain: "separation_mode", "n_ed_visits", "diagnosis_category", "n_ed_admissions", "triage_category", "revisited"
Manual Feature Selection: "gender", "separation_mode", "diagnosis_category", "age", "triage_category"

CFS and Information Gain can be run in Weka or in code. Once you have the list of selected features, run the script below. The <FS method>_col_list arrays in this script hold the selected features for each FS method.

python3 make_FS_set.py
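
For readers who want to reproduce the Information Gain ranking in code instead of Weka, the quantity can be computed as the entropy of the class minus the entropy of the class after splitting on a feature. The sketch below assumes nominal (already discretised) features. The function names are illustrative and are not taken from this repository.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(feature_values, labels):
    """H(class) - H(class | feature) for one nominal feature."""
    groups = defaultdict(list)
    for value, label in zip(feature_values, labels):
        groups[value].append(label)
    total = len(labels)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


if __name__ == "__main__":
    revisited = ["yes", "yes", "no", "no"]
    separation_mode = ["left", "left", "home", "home"]  # toy values
    gender = ["F", "M", "F", "M"]
    print(information_gain(separation_mode, revisited))
    print(information_gain(gender, revisited))
```

Ranking every feature by this score and keeping the top ones gives a list that can be placed in the corresponding `_col_list` array.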

Generate the balanced train sets using SMOTE-NC or SMOTE-N

SMOTE-NC is for datasets containing both nominal and continuous features; SMOTE-N is for datasets containing only nominal features.

python3 apply_smotenc.py
python3 apply_smoten.py
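
To give a feel for how oversampling nominal data works, here is a deliberately simplified, dependency-free sketch in the spirit of SMOTE-N. It is not the algorithm as published and not the code in apply_smoten.py: it uses plain Hamming distance where SMOTE-N uses the Value Difference Metric, and the function names are hypothetical. The imbalanced-learn package provides full implementations as the `SMOTENC` and `SMOTEN` classes; whether the scripts here rely on it is not stated in this README.

```python
import random
from collections import Counter


def hamming(a, b):
    """Number of positions at which two rows differ."""
    return sum(x != y for x, y in zip(a, b))


def oversample_nominal(minority, n_new, k=3, seed=0):
    """Create n_new synthetic rows from minority-class rows (tuples).

    Each synthetic row takes, per feature, the most common value among a
    randomly chosen base row and its k nearest minority neighbours.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base_index = rng.randrange(len(minority))
        base = minority[base_index]
        others = [row for i, row in enumerate(minority) if i != base_index]
        neighbours = sorted(others, key=lambda row: hamming(base, row))[:k]
        group = [base] + neighbours
        # Majority vote per column; ties go to the value seen first.
        new_row = tuple(Counter(col).most_common(1)[0][0] for col in zip(*group))
        synthetic.append(new_row)
    return synthetic


if __name__ == "__main__":
    minority_rows = [("home", "3"), ("home", "3"), ("home", "2"), ("transfer", "3")]
    print(oversample_nominal(minority_rows, n_new=2))
```

Oversampling is applied to the train sets only, as the heading above says, so the test sets keep the original class distribution.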

Convert CSV files to ARFF files

If the files stayed in CSV format, there would be a compatibility issue between the train sets and the test sets when evaluating with Weka, so we converted all files to ARFF format.

python3 csv_to_arff.py <config file>

The config files can be found in ARFF_converter.
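
One likely cause of the train/test mismatch is that Weka infers the set of nominal values from each CSV file separately, so two files can end up with different attribute definitions. An ARFF header declares those values explicitly. The sketch below shows the idea with the standard library only. It is not csv_to_arff.py: the function signature and the `nominal_values` dictionary stand in for whatever the config files contain, whose format is not shown in this README. Values containing spaces or commas would also need quoting, which this sketch omits.

```python
import csv
import io


def csv_to_arff(csv_text, relation, nominal_values):
    """Convert CSV text to ARFF text.

    nominal_values maps a column name to the full list of values it may
    take; columns not listed are declared numeric.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    lines = [f"@relation {relation}", ""]
    for name in reader.fieldnames:
        if name in nominal_values:
            values = ",".join(nominal_values[name])
            lines.append(f"@attribute {name} {{{values}}}")
        else:
            lines.append(f"@attribute {name} numeric")
    lines += ["", "@data"]
    for row in reader:
        lines.append(",".join(row[name] for name in reader.fieldnames))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    sample = "age,triage_category,revisited\n0.5,2,yes\n0.1,3,no\n"
    spec = {"triage_category": ["1", "2", "3", "4", "5"], "revisited": ["no", "yes"]}
    print(csv_to_arff(sample, "ed", spec))
```

Passing the same `nominal_values` when converting the train and test files gives both the same header, even if some value never occurs in one of them.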

TinaNgo/MIMIC_72-hour-representation
