A Novel Decomposing Model with Evolutionary Algorithms for Feature Selection in Long Non-Coding RNAs
-
Robson P. Bonidia, Jaqueline Sayuri Machida, Tatianne C. Negri, Wonder A. L. Alves, André Y. Kashiwabara, Douglas S. Domingues, André C.P.L.F. de Carvalho, Alexandre R. Paschoal, Danilo S. Sanches
-
Correspondence: rpbonidia@gmail.com or bonidia@usp.br or danilosanches@utfpr.edu.br
If you use this code in a scientific publication, we would appreciate citations to the following paper:
R. P. Bonidia et al., "A Novel Decomposing Model With Evolutionary Algorithms for Feature Selection in Long Non-Coding RNAs," in IEEE Access, vol. 8, pp. 181683-181697, 2020, doi: 10.1109/ACCESS.2020.3028039.
@ARTICLE{9210051,
author={R. P. {Bonidia} and J. S. {Machida} and T. C. {Negri} and W. A. L. {Alves} and A. Y. {Kashiwabara} and D. S. {Domingues} and A. {De Carvalho} and A. R. {Paschoal} and D. S. {Sanches}},
journal={IEEE Access},
title={A Novel Decomposing Model With Evolutionary Algorithms for Feature Selection in Long Non-Coding RNAs},
year={2020},
volume={8},
number={},
pages={181683-181697},}
-
Repository contents:
- Datasets: datasets;
- GA-CFS-ACC: decomposing model with a genetic algorithm (fitness = CFS and accuracy (ACC)) - Python;
- GA-CFS: decomposing model with a genetic algorithm (fitness = CFS; filter approach, main method) - Python;
- GA-wrapper: decomposing model with a genetic algorithm (wrapper approach) - Python;
- PSO-wrapper: decomposing model with particle swarm optimization (wrapper approach) - Python;
- README: documentation;
- Requirements: list of packages to be installed with pip;
- split_train_test: splits a dataset into training and test sets - Python.

Dependencies:
- Python (>=3.7.4)
- NumPy
- Pandas
- Scikit-learn
- Skfeature-chappers
Installation:
$ git clone https://github.com/Bonidia/FeatureSelection-FSRV.git FeatureSelection-FSRV
$ cd FeatureSelection-FSRV
$ pip3 install -r requirements.txt
Splitting the data:
First, split the dataset into training and test sets. Only the training set is used for feature selection; the test set is used to generate a final report on the performance of the best feature subset.
Access folder: $ cd FeatureSelection-FSRV
To run (example): $ python3.7 split_train_test.py -i input -r test_rate
Where:
-i: input file in CSV format, e.g., dataset.csv
-r: test rate, i.e., the fraction of samples assigned to the test set, e.g., 0.2 or 0.3
This command generates a training file and a test file.
Note: The input samples must be in CSV format.
Dataset: The CSV file must have the format feat1, feat2, ..., featk, label; that is, the label/class must be the last column.
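A minimal sketch of the expected input layout, using pandas and a hypothetical four-sample toy dataset (the feature values and the lncRNA/mRNA labels are invented for illustration). All feature columns come first and the class label comes last, so features and labels can be separated purely by position.

```python
import io

import pandas as pd

# Hypothetical toy data in the expected layout:
# feature columns first, class label in the LAST column.
csv_text = """feat1,feat2,feat3,label
0.12,0.50,3,lncRNA
0.33,0.10,1,mRNA
0.25,0.42,2,lncRNA
0.08,0.91,0,mRNA
"""

df = pd.read_csv(io.StringIO(csv_text))

# Features are every column except the last; the label is the last column.
X = df.iloc[:, :-1]
y = df.iloc[:, -1]

print(X.shape)          # (4, 3)
print(list(X.columns))  # ['feat1', 'feat2', 'feat3']
print(y.name)           # label
```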
Running
python3.7 split_train_test.py -i lncRNA.csv -r 0.2
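For readers who want to reproduce this step programmatically, the sketch below performs a comparable split with scikit-learn's `train_test_split`, treating `test_rate` as the test fraction. Stratifying by the label and the fixed seed are assumptions, `split_train_test.py` may split differently, and the helper name `split_dataset` and the toy data are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split


def split_dataset(df: pd.DataFrame, test_rate: float, seed: int = 0):
    """Split a dataframe whose last column is the label into train/test parts.

    Assumption: a stratified random split with a fixed seed; the actual
    split_train_test.py script may behave differently.
    """
    labels = df.iloc[:, -1]
    train_df, test_df = train_test_split(
        df, test_size=test_rate, stratify=labels, random_state=seed
    )
    return train_df, test_df


# Hypothetical toy dataset: 50 samples, 3 features, 2 balanced classes.
rng = np.random.default_rng(42)
toy = pd.DataFrame(rng.random((50, 3)), columns=["feat1", "feat2", "feat3"])
toy["label"] = ["lncRNA", "mRNA"] * 25

train_df, test_df = split_dataset(toy, test_rate=0.2)
print(len(train_df), len(test_df))  # 40 10
```

To save both parts in the same layout, `train_df.to_csv("training.csv", index=False)` and `test_df.to_csv("testing.csv", index=False)` would suffice; the file names written by the actual script may differ.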
GA-CFS (genetic algorithm with CFS fitness, filter approach):
Access folder: $ cd FeatureSelection-FSRV
To run (example): $ python3.7 GA-CFS.py -train training.csv -test testing.csv -classifier classifier
Where:
-train: training set in CSV format, e.g., train.csv
-test: test set in CSV format, e.g., test.csv
-classifier: classifier index, where 0 = RandomForestClassifier, 1 = DecisionTreeClassifier, 2 = SVM, 3 = KNN, 4 = GaussianNB, 5 = GradientBoosting, 6 = Bagging, 7 = AdaBoost, 8 = MLP
This command generates a CSV file containing the selected features.
Note 1: The input samples must be in CSV format.
Note 2: In this algorithm, the classifier is used only to generate the final report; the feature selection itself is driven by the CFS filter.
Note 3: Only the training set is used for feature selection.
Note 4: The test set is used to generate a final report on the performance of the best feature subset.
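As an illustration of Notes 2 to 4, the sketch below fits the chosen classifier on the selected training columns and evaluates it once on the test set. The index-to-estimator mapping follows the `-classifier` codes listed above, but mapping "SVM" to `SVC`, "KNN" to `KNeighborsClassifier` and "MLP" to `MLPClassifier`, and using default hyperparameters, are assumptions. The function `final_report`, the selected indices and the synthetic data are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (AdaBoostClassifier, BaggingClassifier,
                              GradientBoostingClassifier, RandomForestClassifier)
from sklearn.metrics import accuracy_score, classification_report
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Index -> estimator class, following the -classifier codes.
# Assumption: default hyperparameters; the scripts may configure them differently.
CLASSIFIERS = {
    0: RandomForestClassifier,
    1: DecisionTreeClassifier,
    2: SVC,
    3: KNeighborsClassifier,
    4: GaussianNB,
    5: GradientBoostingClassifier,
    6: BaggingClassifier,
    7: AdaBoostClassifier,
    8: MLPClassifier,
}


def final_report(X_train, y_train, X_test, y_test, selected, classifier=0):
    """Fit on the selected training columns only, then score once on the test set."""
    model = CLASSIFIERS[classifier]()
    model.fit(X_train[:, selected], y_train)
    y_pred = model.predict(X_test[:, selected])
    return accuracy_score(y_test, y_pred), classification_report(y_test, y_pred)


# Hypothetical data: 160 training and 40 test samples as NumPy arrays.
X, y = make_classification(n_samples=200, n_features=10, n_informative=3,
                           random_state=0)
X_train, X_test, y_train, y_test = X[:160], X[160:], y[:160], y[160:]
selected = [0, 1, 2]  # hypothetical indices returned by a feature selector

acc, report = final_report(X_train, y_train, X_test, y_test, selected, classifier=0)
print(f"accuracy = {acc:.3f}")
print(report)
```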
Running
python3.7 GA-CFS.py -train training.csv -test testing.csv -classifier 0
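GA-CFS scores each candidate subset with Correlation-based Feature Selection (CFS). The standard CFS merit of a subset S of k features is Merit_S = k * mean(r_cf) / sqrt(k + k * (k - 1) * mean(r_ff)), where mean(r_cf) is the average feature-class correlation and mean(r_ff) is the average feature-feature inter-correlation. Subsets whose features are relevant to the class yet not redundant with each other therefore score highest. The sketch below computes this merit for a boolean feature mask. Using absolute Pearson correlation is a simplifying assumption (CFS implementations such as the one in scikit-feature typically use symmetrical uncertainty instead), and the data are synthetic.

```python
import numpy as np


def cfs_merit(X: np.ndarray, y: np.ndarray, mask: np.ndarray) -> float:
    """CFS merit of the feature subset selected by a boolean mask.

    Assumption: absolute Pearson correlation is the correlation measure.
    Constant columns would yield NaN correlations and are not handled here.
    """
    idx = np.flatnonzero(mask)
    k = idx.size
    if k == 0:
        return 0.0
    # Mean feature-class correlation.
    r_cf = np.mean([abs(np.corrcoef(X[:, i], y)[0, 1]) for i in idx])
    # Mean feature-feature inter-correlation over all distinct pairs.
    if k > 1:
        corr = np.abs(np.corrcoef(X[:, idx], rowvar=False))
        r_ff = corr[np.triu_indices(k, 1)].mean()
    else:
        r_ff = 0.0
    return float(k * r_cf / np.sqrt(k + k * (k - 1) * r_ff))


# Synthetic example: one relevant feature, a near-copy of it, and pure noise.
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=200).astype(float)
relevant = y + 0.3 * rng.standard_normal(200)
redundant = relevant + 0.05 * rng.standard_normal(200)
noise = rng.standard_normal(200)
X = np.column_stack([relevant, redundant, noise])

m_relevant = cfs_merit(X, y, np.array([True, False, False]))
m_with_noise = cfs_merit(X, y, np.array([True, False, True]))
print(m_relevant, m_with_noise)  # adding the noise feature lowers the merit
```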
GA-CFS-ACC (genetic algorithm with CFS and accuracy fitness):
Access folder: $ cd FeatureSelection-FSRV
To run (example): $ python3.7 GA-CFS-ACC.py -train training.csv -test testing.csv -classifier classifier
Where:
-train: training set in CSV format, e.g., train.csv
-test: test set in CSV format, e.g., test.csv
-classifier: classifier index, where 0 = RandomForestClassifier, 1 = DecisionTreeClassifier, 2 = SVM, 3 = KNN, 4 = GaussianNB, 5 = GradientBoosting, 6 = Bagging, 7 = AdaBoost, 8 = MLP
This command generates a CSV file containing the selected features.
Note 1: The input samples must be in CSV format.
Note 2: Only the training set is used for feature selection.
Note 3: The test set is used to generate a final report on the performance of the best feature subset.
Running
python3.7 GA-CFS-ACC.py -train training.csv -test testing.csv -classifier 3
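GA-CFS, GA-CFS-ACC and GA-wrapper all search over feature subsets with a genetic algorithm. The sketch below shows a generic binary GA in which each individual is a boolean mask over the feature columns and the fitness function is a parameter: it could be a CFS merit score, a cross-validated accuracy, or a combination of the two (the exact combination used by GA-CFS-ACC is not described in this README). Binary tournament selection, uniform crossover, bit-flip mutation, elitism, all parameter values, and the toy target-matching fitness are assumptions, not necessarily the operators used in the repository or the paper.

```python
import numpy as np


def binary_ga(fitness, n_features, pop_size=30, generations=50,
              crossover_rate=0.9, mutation_rate=None, seed=0):
    """Maximize fitness(mask) over boolean feature masks with a simple GA.

    Assumptions: binary tournament selection, uniform crossover,
    bit-flip mutation and elitism of the single best individual.
    """
    rng = np.random.default_rng(seed)
    if mutation_rate is None:
        mutation_rate = 1.0 / n_features
    pop = rng.random((pop_size, n_features)) < 0.5
    scores = np.array([fitness(ind) for ind in pop])

    for _ in range(generations):
        best = pop[np.argmax(scores)].copy()  # elitism: keep the best mask
        children = []
        while len(children) < pop_size - 1:
            # Binary tournament: keep the fitter of two random individuals.
            a, b = rng.integers(pop_size, size=2)
            p1 = pop[a] if scores[a] >= scores[b] else pop[b]
            a, b = rng.integers(pop_size, size=2)
            p2 = pop[a] if scores[a] >= scores[b] else pop[b]
            child = p1.copy()
            if rng.random() < crossover_rate:
                take = rng.random(n_features) < 0.5  # uniform crossover
                child[take] = p2[take]
            flip = rng.random(n_features) < mutation_rate  # bit-flip mutation
            child[flip] = ~child[flip]
            children.append(child)
        pop = np.vstack([best] + children)
        scores = np.array([fitness(ind) for ind in pop])

    i = np.argmax(scores)
    return pop[i], scores[i]


# Toy fitness: number of bits matching a hidden target mask.
target = np.array([1, 0, 1, 1, 0, 0, 1, 0], dtype=bool)


def toy_fitness(mask):
    return float(np.sum(mask == target))


best_mask, best_score = binary_ga(toy_fitness, n_features=8)
print(best_mask.astype(int), best_score)
```

Elitism guarantees that the best score never decreases between generations, which is a common choice when fitness evaluations (for example, classifier training) are expensive.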
GA-wrapper (genetic algorithm, wrapper approach):
Access folder: $ cd FeatureSelection-FSRV
To run (example): $ python3.7 GA-wrapper.py -train training.csv -test testing.csv -classifier classifier
Where:
-train: training set in CSV format, e.g., train.csv
-test: test set in CSV format, e.g., test.csv
-classifier: classifier index, where 0 = RandomForestClassifier, 1 = DecisionTreeClassifier, 2 = SVM, 3 = KNN, 4 = GaussianNB, 5 = GradientBoosting, 6 = Bagging, 7 = AdaBoost, 8 = MLP
This command generates a CSV file containing the selected features.
Note 1: The input samples must be in CSV format.
Note 2: Only the training set is used for feature selection.
Note 3: The test set is used to generate a final report on the performance of the best feature subset.
Running
python3.7 GA-wrapper.py -train training.csv -test testing.csv -classifier 2
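In a wrapper approach, a candidate feature subset is scored by training a classifier on it. A minimal sketch, assuming the fitness is the mean 5-fold cross-validated accuracy on the training set (the repository's exact evaluation protocol is not specified in this README); the function name `wrapper_fitness` is hypothetical, and the demo uses synthetic data from `make_classification`, where `shuffle=False` places the informative features in the first columns.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC


def wrapper_fitness(mask, X, y, estimator, cv=5):
    """Mean cross-validated accuracy of `estimator` on the masked columns.

    Only training data should be passed here; the test set stays aside
    for the final report. An empty mask gets the worst possible score.
    """
    if not np.any(mask):
        return 0.0
    scores = cross_val_score(estimator, X[:, mask], y, cv=cv, scoring="accuracy")
    return float(scores.mean())


# Synthetic training data: columns 0-2 are informative, 3-7 are noise.
X, y = make_classification(n_samples=150, n_features=8, n_informative=3,
                           n_redundant=0, shuffle=False, random_state=0)
informative = np.array([True, True, True, False, False, False, False, False])
noise_only = ~informative

acc_informative = wrapper_fitness(informative, X, y, SVC())
acc_noise = wrapper_fitness(noise_only, X, y, SVC())
print(acc_informative, acc_noise)  # informative subset scores higher
```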
PSO-wrapper (particle swarm optimization, wrapper approach):
Access folder: $ cd FeatureSelection-FSRV
To run (example): $ python3.7 PSO-wrapper.py -train training.csv -test testing.csv -classifier classifier
Where:
-train: training set in CSV format, e.g., train.csv
-test: test set in CSV format, e.g., test.csv
-classifier: classifier index, where 0 = RandomForestClassifier, 1 = DecisionTreeClassifier, 2 = SVM, 3 = KNN, 4 = GaussianNB, 5 = GradientBoosting, 6 = Bagging, 7 = AdaBoost, 8 = MLP
This command generates a CSV file containing the selected features.
Note 1: The input samples must be in CSV format.
Note 2: Only the training set is used for feature selection.
Note 3: The test set is used to generate a final report on the performance of the best feature subset.
Running
python3.7 PSO-wrapper.py -train training.csv -test testing.csv -classifier 2
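PSO-wrapper uses particle swarm optimization instead of a genetic algorithm. As an illustration, the sketch below implements the common sigmoid-based binary PSO: each particle is a boolean feature mask, velocities are real-valued, and each bit is resampled as 1 with probability sigmoid(velocity). The inertia weight, acceleration coefficients, velocity clamp, and the toy target-matching fitness are assumptions; in the wrapper setting the fitness would instead be a classifier's accuracy on the training set, and the repository's PSO variant may differ.

```python
import numpy as np


def binary_pso(fitness, n_features, n_particles=20, iterations=50,
               w=0.7, c1=1.5, c2=1.5, v_max=4.0, seed=0):
    """Maximize fitness(mask) with a sigmoid-based binary PSO.

    Parameter values are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    pos = rng.random((n_particles, n_features)) < 0.5
    vel = rng.uniform(-1.0, 1.0, (n_particles, n_features))
    pbest = pos.copy()
    pbest_score = np.array([fitness(p) for p in pos], dtype=float)
    g = np.argmax(pbest_score)
    gbest, gbest_score = pbest[g].copy(), pbest_score[g]

    for _ in range(iterations):
        r1 = rng.random((n_particles, n_features))
        r2 = rng.random((n_particles, n_features))
        x = pos.astype(float)  # bits as 0/1 numbers for the velocity update
        vel = (w * vel
               + c1 * r1 * (pbest.astype(float) - x)
               + c2 * r2 * (gbest.astype(float) - x))
        vel = np.clip(vel, -v_max, v_max)
        # Sigmoid transfer function: probability of each bit being 1.
        prob_one = 1.0 / (1.0 + np.exp(-vel))
        pos = rng.random((n_particles, n_features)) < prob_one
        scores = np.array([fitness(p) for p in pos], dtype=float)
        # Update personal bests, then the global best.
        improved = scores > pbest_score
        pbest[improved] = pos[improved]
        pbest_score[improved] = scores[improved]
        g = np.argmax(pbest_score)
        if pbest_score[g] > gbest_score:
            gbest, gbest_score = pbest[g].copy(), pbest_score[g]

    return gbest, gbest_score


# Toy fitness: number of bits matching a hidden target mask.
target = np.array([1, 1, 0, 0, 1, 0, 1, 0, 0, 1], dtype=bool)
best_mask, best_score = binary_pso(lambda m: float(np.sum(m == target)),
                                   n_features=10)
print(best_mask.astype(int), best_score)
```

Clamping the velocity keeps sigmoid(velocity) away from exactly 0 or 1, so every bit retains a small chance of flipping and the swarm keeps exploring.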