Several machine learning approaches applied to the Breast Cancer dataset to predict whether a tumor is benign or malignant.
The dataset consists of the following fields:
- Sample code number: id number
- Clump Thickness: 1 - 10
- Uniformity of Cell Size: 1 - 10
- Uniformity of Cell Shape: 1 - 10
- Marginal Adhesion: 1 - 10
- Single Epithelial Cell Size: 1 - 10
- Bare Nuclei: 1 - 10
- Bland Chromatin: 1 - 10
- Normal Nucleoli: 1 - 10
- Mitoses: 1 - 10
- Class: (2 for benign, 4 for malignant)
There are 699 instances, each containing information on 9 features of the tumor. The last field, Class, specifies whether the tumor is benign or malignant.
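As a rough illustration of the layout described above, the following sketch parses rows into nine-feature vectors and a binary label. It assumes a headerless, comma-separated file with the fields in the listed order. The snake_case column names, the `load_rows` helper, the `?` placeholder for a missing value, and the three sample rows are all hypothetical and are not taken from the project.

```python
import csv
import io

# Column names follow the field list above; the snake_case spellings are assumptions.
COLUMNS = [
    "sample_code_number", "clump_thickness", "uniformity_cell_size",
    "uniformity_cell_shape", "marginal_adhesion", "single_epithelial_cell_size",
    "bare_nuclei", "bland_chromatin", "normal_nucleoli", "mitoses", "class",
]


def load_rows(handle):
    """Parse headerless comma-separated rows into (features, labels).

    Rows with the wrong number of fields or a non-numeric value are skipped.
    Labels are mapped to 0 for benign (class 2) and 1 for malignant (class 4).
    """
    features, labels = [], []
    for row in csv.reader(handle):
        if len(row) != len(COLUMNS):
            continue
        try:
            values = [int(v) for v in row]
        except ValueError:
            # Assumed handling: drop rows that contain a non-numeric placeholder.
            continue
        features.append(values[1:-1])  # drop the id and the class
        labels.append(1 if values[-1] == 4 else 0)
    return features, labels


# Hypothetical rows for illustration only; they are not copied from the dataset.
sample = io.StringIO(
    "1000001,5,1,1,1,2,1,3,1,1,2\n"
    "1000002,8,10,10,8,7,10,9,7,1,4\n"
    "1000003,4,2,1,1,2,?,3,1,1,2\n"
)
X, y = load_rows(sample)
print(len(X), len(X[0]), y)
```

To read a local copy of the data instead, pass an open file handle to `load_rows` in place of the `io.StringIO` sample.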
The approaches implemented are:
-
  a. Instance Selection
  b. Classification using a fuzzy rough nearest neighbour classifier
-
  a. K-means + C4.5 decision tree classifier
-
  a. Feature Selection using Decision Trees
  b. Reduction of features using PCA
  c. NN Classifier
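For orientation, here is a minimal sketch of a plain k-nearest-neighbour classifier, written under the assumption that "NN" in the last step stands for nearest neighbour (it could also mean neural network). It is not the fuzzy rough variant from the first approach and not the project's own code. The `knn_predict` helper, the choice of `k=3`, Euclidean distance, and the sample vectors are all hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Return the majority label among the k training points closest to query."""
    # Rank training indices by Euclidean distance to the query point.
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


# Hypothetical feature vectors on the 1-10 scale; not rows from the dataset.
train_X = [
    [1, 1, 1, 1, 2, 1, 2, 1, 1],
    [2, 1, 2, 1, 2, 1, 3, 1, 1],
    [3, 1, 1, 1, 2, 2, 3, 1, 1],
    [8, 10, 10, 8, 7, 10, 9, 7, 1],
    [10, 7, 7, 6, 4, 10, 4, 1, 2],
    [9, 8, 8, 5, 6, 9, 7, 6, 3],
]
train_y = [2, 2, 2, 4, 4, 4]  # 2 = benign, 4 = malignant

low = knn_predict(train_X, train_y, [2, 1, 1, 1, 2, 1, 2, 1, 1])
high = knn_predict(train_X, train_y, [9, 9, 8, 7, 6, 10, 8, 6, 2])
print(low, high)
```

In a pipeline like the third approach, the vectors passed to such a classifier would be the reduced features produced by the earlier selection and PCA steps rather than the raw nine fields.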