Refactor: Looking for implementation strategies to improve run time efficiency of all algorithms regardless of data type (i.e. discrete/continuous, missing data)
#39
Open
ryanurbs opened this issue
Apr 5, 2018
· 2 comments
One of the major challenges in making the Relief-based algorithms of ReBATE flexible enough to handle different dataset types, i.e. (1) continuous, discrete, or mixed feature types, (2) binary, multiclass, or continuous outcomes, and (3) the presence of missing data, is doing so in a way that preserves computational efficiency. scikit-rebate is presently implemented in a fairly compact manner; however, this may not ultimately be the most efficient implementation. This issue seeks enhancements to ReBATE and its underlying algorithms (i.e. ReliefF, SURF, SURF*, MultiSURF, MultiSURF*, TuRF) that make the respective algorithms run faster and use less memory.
Hello, I'd like to help with this issue. First of all, I will write some tests to guarantee that the code produces the same results after optimization as it did before.
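A minimal sketch of what such an equivalence test could look like, using only the Python standard library. The two scoring functions below are hypothetical stand-ins, not ReBATE code: `baseline_scores` plays the role of the existing implementation and `optimized_scores` the role of a refactored one. The idea is simply to run both on the same seeded dataset and compare the per-feature scores within a tolerance.

```python
import math
import random


def make_binary_dataset(n_samples, n_features, seed=0):
    """Build a small reproducible dataset of 0/1 features and 0/1 labels."""
    rng = random.Random(seed)
    X = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(n_samples)]
    y = [rng.randint(0, 1) for _ in range(n_samples)]
    return X, y


def baseline_scores(X, y):
    """Hypothetical stand-in for the pre-optimization code: explicit nested loops."""
    n_features = len(X[0])
    scores = []
    for j in range(n_features):
        agree = 0
        for i in range(len(X)):
            if X[i][j] == y[i]:
                agree += 1
        scores.append(agree / len(X))
    return scores


def optimized_scores(X, y):
    """Hypothetical stand-in for the post-optimization code: column-wise pass."""
    n = len(X)
    return [sum(xj == yi for xj, yi in zip(col, y)) / n for col in zip(*X)]


def scores_match(reference, candidate, rel_tol=1e-9, abs_tol=1e-12):
    """True when both score vectors have equal length and near-equal entries."""
    return len(reference) == len(candidate) and all(
        math.isclose(r, c, rel_tol=rel_tol, abs_tol=abs_tol)
        for r, c in zip(reference, candidate)
    )


# Same seeded data goes to both implementations, so any mismatch is a regression.
X, y = make_binary_dataset(n_samples=50, n_features=8, seed=42)
same = scores_match(baseline_scores(X, y), optimized_scores(X, y))
```

In a real test suite the stand-ins would be replaced by the scores from the current release and from the optimized branch, with one such check per algorithm and data type (discrete, continuous, mixed, missing values).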
Hi folks - I took an initial pass at this to see if I could build a proof of concept for some changes. I also implemented a benchmarking tool so folks can see how any branch is performing.
See here for my draft PR. It's not quite ready yet, as I need to rerun my performance benchmarks. It provides a pattern for one case (ReliefF, binary features, discrete data) that I believe could be implemented across all cases to give clearer code and much better performance. I'm working on a full benchmark run, but initial results for the current parallel ReliefF test on binary/discrete data show runtime dropping from ~1.85 seconds to ~0.6 seconds on the small testing dataset.
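For readers who want to reproduce this kind of comparison, here is a rough sketch of a timing helper. It is not the benchmarking tool from the draft PR; `benchmark` and `speedup` are hypothetical names, and the sketch relies only on `time.perf_counter` and `statistics.median` from the standard library.

```python
import statistics
import time


def benchmark(func, *args, repeats=5, **kwargs):
    """Call func(*args, **kwargs) `repeats` times and summarize wall-clock seconds."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        func(*args, **kwargs)
        timings.append(time.perf_counter() - start)
    return {
        "min": min(timings),
        "median": statistics.median(timings),
        "runs": repeats,
    }


def speedup(before_seconds, after_seconds):
    """Ratio of old runtime to new runtime; values above 1 mean the new code is faster."""
    return before_seconds / after_seconds


# Approximate figures quoted above: ~1.85 s before, ~0.6 s after.
ratio = speedup(1.85, 0.6)

# Example call on a trivial workload; a real run would pass the fit call
# of the algorithm under test instead of `sum`.
result = benchmark(sum, range(1000), repeats=3)
```

Taking the quoted figures at face value, the ratio works out to roughly a threefold speedup. Reporting the median over several repeats rather than a single run helps smooth out noise from other processes on the machine.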