Floodlight Open is an open study by Genentech for healthy controls and patients with multiple sclerosis (MS). Its goal is to monitor a patient's disease progression over time using a series of tests administered through a smartphone app, including mood assessments, hand strength, balance, and general mobility. Several of these smartphone tests mirror key assessments that neurologists perform on MS patients in the clinic.
The aim of this project is to use the data collected from MS patients and healthy controls to see whether the results of the Floodlight Open tests can be used to predict which participants have multiple sclerosis.
About 85% of people with MS are initially diagnosed with relapsing-remitting multiple sclerosis (RRMS), the most common disease course of MS. RRMS is characterized by episodes of new or worsening symptoms (relapses) separated by periods of partial or complete recovery (remission), with relatively slow progression over time.
Many participants completed the same tests multiple times, so I calculated each participant's mean score for each test. However, averaging alone would hide the relapse-and-remission pattern of RRMS, so I also calculated the variance of each participant's scores on each test and included each per-test variance as a separate feature.
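A minimal pandas sketch of this aggregation, assuming the raw results sit in a long-format table with hypothetical columns `participant_id`, `test_name`, and `value` (one row per completed test). The toy values are made up purely for illustration:

```python
import pandas as pd

# Hypothetical long-format table: one row per completed test.
# Column names (participant_id, test_name, value) are assumptions,
# not the actual Floodlight Open schema.
long_df = pd.DataFrame({
    "participant_id": ["p1", "p1", "p1", "p2", "p2", "p2"],
    "test_name": ["mobility", "mobility", "balance", "mobility", "balance", "balance"],
    "value": [100.0, 140.0, 0.8, 120.0, 0.5, 0.7],
})

def aggregate_tests(df: pd.DataFrame) -> pd.DataFrame:
    """Collapse repeated tests into one row per participant with mean and variance features."""
    stats = (
        df.groupby(["participant_id", "test_name"])["value"]
        .agg(["mean", "var"])     # sample variance (ddof=1); a single attempt gives NaN
        .unstack("test_name")     # columns become a (statistic, test) MultiIndex
    )
    # Flatten the MultiIndex into names like "mobility_mean" and "mobility_var"
    stats.columns = [f"{test}_{stat}" for stat, test in stats.columns]
    return stats.sort_index(axis=1)

features = aggregate_tests(long_df)
print(features)
```

Because `var` is the sample variance, a participant who took a test only once gets `NaN` for that test's variance; whether to drop, impute, or otherwise handle those rows is a separate modeling decision.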
I also derived age and body mass index (BMI) from the provided time, height, and weight data.
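A sketch of these two derived features, assuming hypothetical columns for birth year, the timestamp of a participant's first test, height in centimetres, and weight in kilograms (the real dataset's field names and units may differ):

```python
import pandas as pd

# Hypothetical participant table; column names and units (cm, kg, birth year)
# are assumptions, not the actual Floodlight Open fields.
participants = pd.DataFrame({
    "participant_id": ["p1", "p2"],
    "birth_year": [1980, 1995],
    "first_test_time": pd.to_datetime(["2019-03-01", "2020-07-15"]),
    "height_cm": [170.0, 180.0],
    "weight_kg": [68.0, 81.0],
})

def add_age_and_bmi(df: pd.DataFrame) -> pd.DataFrame:
    out = df.copy()
    # Approximate age in years at the first test (year difference only,
    # so it can be off by one depending on the birthday)
    out["age"] = out["first_test_time"].dt.year - out["birth_year"]
    # BMI = weight (kg) / height (m) squared
    height_m = out["height_cm"] / 100.0
    out["bmi"] = out["weight_kg"] / height_m ** 2
    return out

participants = add_age_and_bmi(participants)
print(participants[["participant_id", "age", "bmi"]])
```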
Because many of these tests are correlated with one another, I first ran a principal component analysis (PCA) and then trained logistic regression, random forest, and XGBoost models on the principal components. None of these models performed well. I then trained a random forest with tuned hyperparameters on the cleaned and engineered features, which achieved an accuracy of 85.47%.
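The two approaches can be sketched with scikit-learn as follows. This is not the project's code: it runs on synthetic data from `make_classification` instead of the study data, standardizes features before PCA and keeps enough components to explain 95% of the variance (both assumptions), and shows only logistic regression on the components (a random forest or an XGBoost classifier, which lives in the separate `xgboost` package, would replace the last pipeline step). The hyperparameter grid is illustrative, not the one behind the 85.47% figure.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the engineered feature table (means, variances, age, BMI);
# y = 1 for MS, 0 for healthy control. These are NOT the Floodlight Open data.
X, y = make_classification(n_samples=400, n_features=20, n_informative=6,
                           n_redundant=6, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# Approach 1: scale, project onto the principal components that explain 95%
# of the variance, then fit a logistic regression on those components.
pca_logreg = make_pipeline(StandardScaler(), PCA(n_components=0.95),
                           LogisticRegression(max_iter=1000))
pca_logreg.fit(X_train, y_train)
pca_acc = accuracy_score(y_test, pca_logreg.predict(X_test))

# Approach 2: random forest on the original features, tuned with a small
# 5-fold cross-validated grid search (grid values are illustrative).
param_grid = {
    "n_estimators": [100, 300],
    "max_depth": [None, 10],
    "min_samples_leaf": [1, 3],
}
search = GridSearchCV(RandomForestClassifier(random_state=0), param_grid,
                      cv=5, scoring="accuracy")
search.fit(X_train, y_train)
rf_acc = accuracy_score(y_test, search.best_estimator_.predict(X_test))

print(f"PCA + logistic regression accuracy: {pca_acc:.3f}")
print(f"Tuned random forest accuracy: {rf_acc:.3f} with {search.best_params_}")
```

Scoring both models on the same held-out split keeps the comparison fair; the grid search only ever sees the training portion.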
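I also plotted the model's feature importances and found that the variance in mobility scores was the most important feature. This is consistent with the course of RRMS: because patients alternate between relapses and remission, their mobility plausibly fluctuates across repeated tests, and a high variance captures that fluctuation.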
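A sketch of how such a ranking can be read off a fitted scikit-learn random forest. The feature names are hypothetical, and the label is deliberately constructed from `mobility_var` alone so the output is easy to sanity-check; this toy ranking says nothing about the real data.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n = 300
# Synthetic features with hypothetical names; all are random noise except that
# the label below is built from mobility_var, so it should rank first.
X = pd.DataFrame({
    "mobility_mean": rng.normal(size=n),
    "mobility_var": rng.normal(size=n),
    "balance_mean": rng.normal(size=n),
    "hand_strength_var": rng.normal(size=n),
    "age": rng.integers(18, 80, size=n),
    "bmi": rng.normal(25, 4, size=n),
})
y = (X["mobility_var"] > 0).astype(int)

model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

# Impurity-based importances: one value per column, normalized to sum to 1
importances = pd.Series(model.feature_importances_, index=X.columns)
importances = importances.sort_values(ascending=False)
print(importances)
```

With matplotlib installed, `importances.plot.barh()` draws the bar chart. Impurity-based importances can favor high-cardinality features, so `sklearn.inspection.permutation_importance` on held-out data is a useful cross-check.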
The ability to predict whether someone has multiple sclerosis with these tools may help in the initial diagnosis process, particularly for patients whose charts and medical history may point to similar conditions such as clinically isolated syndrome (CIS) or neuromyelitis optica (NMO). Further, if each patient's disease course (RRMS, secondary progressive MS (SPMS), or primary progressive MS (PPMS)) were included as a label, a similar model might help predict which disease course a patient is likely to follow.
This kind of model could also be extended with additional patient data, such as more robust depression scores, strength tests, and vision tests. With an extended dataset, a model might even predict time-specific information for each disease course, for example when a patient's RRMS is likely to progress to SPMS.