A comprehensive implementation of a Fuzzy Inference System (FIS) for breast cancer classification using the Wisconsin Breast Cancer Dataset (WDBC).
This repository contains a complete pipeline for breast cancer classification using fuzzy logic and machine learning. The system analyzes the Wisconsin Breast Cancer Dataset to classify tumors as benign or malignant using various fuzzy membership functions and rule-based inference systems.
The implementation includes:
- Data preprocessing and feature selection
- Fuzzy Inference System with multiple membership function configurations
- Comparison with traditional machine learning models
- Comprehensive performance evaluation
-
Dataset Analysis: Comprehensive analysis of the Wisconsin Breast Cancer Dataset
- Feature selection using chi-squared and Random Forest Classifier
- Correlation analysis and heatmap visualization
- Distribution analysis with KDE plots
- Linearity analysis using convex hull
- K-means clustering
-
Fuzzy Inference System:
- Multiple membership function configurations (Gaussian, Z-shaped, S-shaped, Triangular, Trapezoidal)
- Rule-based inference system for classification
- Five different defuzzification methods (centroid, bisector, mean of maximum, smallest of maximum, largest of maximum)
- Threshold-based crisp output conversion
-
Machine Learning Comparison:
- Logistic Regression
- Decision Tree Classification
- Random Forest Classification
- Neural Network
-
Performance Evaluation:
- Accuracy, Sensitivity, Specificity metrics
- Confusion matrix visualization
- Comprehensive logging and result analysis
- Python 3.6+
- Required packages:
- pandas
- numpy
- scikit-fuzzy
- scikit-learn
- tensorflow
- matplotlib
- seaborn
- Clone this repository
- Install required packages:
pip install pandas numpy scikit-fuzzy scikit-learn tensorflow matplotlib seaborn
- Download the Wisconsin Breast Cancer Dataset and place it in the
data
directory- The dataset should be named
wdbc.data
- Create the data directory if it doesn't exist:
mkdir -p data
- The dataset should be named
Run the dataset analysis script to preprocess the data and perform feature selection:
python dataset_analysis.py
This will:
- Load and preprocess the Wisconsin Breast Cancer Dataset
- Perform feature selection
- Generate visualizations in the
plots
directory - Save the processed dataset to
data/wdbc_selected_cols.csv
Run the FIS implementation to classify breast cancer samples:
python fis.py
This will:
- Load the preprocessed dataset
- Configure and run multiple FIS tests with different membership functions
- Compare FIS performance with traditional machine learning models
- Save results to
data/predict_log.csv
anddata/predict_log_sum.csv
- Generate confusion matrix visualizations in the
plots/cm
directory
This project is licensed under the MIT License - see the LICENSE file for details.