A basic ML project on classifying data from the MNIST database using various methods.
These are basic examples of classification using the MNIST dataset. The four programs use a Naive Bayes classifier, a k-Nearest Neighbors classifier, Linear Discriminant Analysis, and Principal Component Analysis. The libraries used are scikit-learn and NumPy. The first three programs are written so that other programs can import them, but running any of them as the main program performs the expected classification.
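One common way to get this dual behaviour in Python is a `__main__` guard: the classification logic lives in a function that other programs can import, and the guard runs it only when the file is executed directly. The sketch below is illustrative and is not the project's actual code; `train_and_count_errors`, `main`, the choice of `GaussianNB`, and the synthetic data are all assumptions.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB


def train_and_count_errors(X_train, y_train, X_test, y_test):
    """Fit a Gaussian Naive Bayes model and count misclassified samples.

    Hypothetical helper: importable by other programs without side effects.
    """
    model = GaussianNB().fit(X_train, y_train)
    train_errors = int(np.sum(model.predict(X_train) != y_train))
    test_errors = int(np.sum(model.predict(X_test) != y_test))
    return model, train_errors, test_errors


def main():
    # Synthetic stand-in data (NOT MNIST): two Gaussian blobs in 4 dimensions.
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(0, 1, (50, 4)), rng.normal(5, 1, (50, 4))])
    y = np.repeat([0, 1], 50)
    # Even rows for training, odd rows for testing.
    _, train_errors, test_errors = train_and_count_errors(
        X[::2], y[::2], X[1::2], y[1::2]
    )
    print(f"Training errors: {train_errors}, testing errors: {test_errors}")


# Runs only when the file is executed directly, not when it is imported.
if __name__ == "__main__":
    main()
```

Another program could then `import` the file and call `train_and_count_errors` with its own arrays without triggering the demo run.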
The parse_data program reads the MNIST files and parses them into a NumPy array that scikit-learn can use.
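For orientation, the extracted MNIST files use the IDX binary layout: a big-endian header (magic number 2051 for image files, 2049 for label files, followed by the item count and, for images, the row and column counts), then one unsigned byte per pixel or label. The following is a minimal sketch of parsing that layout, not the contents of parse_data; the function names `parse_idx_images` and `parse_idx_labels` are hypothetical.

```python
import struct

import numpy as np


def parse_idx_images(raw: bytes) -> np.ndarray:
    """Parse IDX image bytes into an (n_images, rows * cols) uint8 array."""
    # Header: four big-endian unsigned 32-bit integers.
    magic, count, rows, cols = struct.unpack(">IIII", raw[:16])
    if magic != 2051:
        raise ValueError(f"not an IDX image file (magic={magic})")
    pixels = np.frombuffer(raw, dtype=np.uint8, offset=16)
    # One flattened row per image, which is the layout scikit-learn expects.
    # reshape raises ValueError if the byte count does not match the header.
    return pixels.reshape(count, rows * cols)


def parse_idx_labels(raw: bytes) -> np.ndarray:
    """Parse IDX label bytes into an (n_labels,) uint8 array."""
    magic, count = struct.unpack(">II", raw[:8])
    if magic != 2049:
        raise ValueError(f"not an IDX label file (magic={magic})")
    labels = np.frombuffer(raw, dtype=np.uint8, offset=8)
    if labels.size != count:
        raise ValueError("label count does not match the header")
    return labels


# Tiny synthetic example (NOT real MNIST data): two 2x2 "images".
demo = struct.pack(">IIII", 2051, 2, 2, 2) + bytes(range(8))
print(parse_idx_images(demo).shape)
```

The bytes would come from reading an extracted MNIST file in binary mode, for example `open(path, "rb").read()`. Arrays created by `np.frombuffer` from `bytes` are read-only; call `.copy()` if they need to be modified.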
The libraries used were NumPy, matplotlib, and scikit-learn. The 64-bit version of Python 3.8 is recommended to prevent memory issues. Once the required libraries are installed, keep the extracted MNIST files in the same directory as the programs. Running a program then performs the classification and prints the error rates for training and testing.
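The reported error rate is the number of misclassified images divided by the total number of images, expressed as a percentage. A minimal sketch of a helper that produces lines in the format used by the example runs is given here; `error_report` is a hypothetical name and not necessarily what the programs use.

```python
import numpy as np


def error_report(name, y_true, y_pred):
    """Format a misclassification summary for one data split."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    wrong = int(np.sum(y_true != y_pred))  # number of mismatched labels
    total = int(y_true.size)
    rate = 100.0 * wrong / total
    return (
        f"{name}: {wrong} incorrectly classified out of {total} images. "
        f"({rate:.3f}% error rate)"
    )


# One wrong prediction out of four gives a 25.000% error rate.
print(error_report("Testing", [0, 1, 2, 3], [0, 1, 2, 4]))
```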
e.g. Running the Naive Bayes Classifier program
python naive.py
Training: 26106 incorrectly classified out of 60000 images. (43.510% error rate)
Testing: 4442 incorrectly classified out of 10000 images. (44.420% error rate)
e.g. Running the Fisher Linear Discriminant program
python fisher.py
LDA for: [0, 9]
Training: 59 incorrectly classified out of 11872 images. (0.497% error rate)
Testing: 23 incorrectly classified out of 1989 images. (1.156% error rate)

LDA for: [0, 8]
Training: 133 incorrectly classified out of 11774 images. (1.130% error rate)
Testing: 20 incorrectly classified out of 1954 images. (1.024% error rate)

LDA for: [1, 7]
Training: 91 incorrectly classified out of 13007 images. (0.700% error rate)
Testing: 23 incorrectly classified out of 2163 images. (1.063% error rate)
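The image counts in this output are smaller than the full 60000 and 10000, which suggests each run keeps only the images of the two listed digits before fitting the discriminant. A sketch of that idea using scikit-learn's `LinearDiscriminantAnalysis` follows. It is written under assumptions and is not the code in fisher.py: it uses scikit-learn's small bundled digits dataset (8x8 images) as a stand-in for MNIST so that it runs without the data files, and `select_pair` and `lda_error_counts` are hypothetical names.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import train_test_split


def select_pair(X, y, pair):
    """Keep only the samples whose label is one of the two digits in `pair`."""
    mask = np.isin(y, pair)
    return X[mask], y[mask]


def lda_error_counts(X_train, y_train, X_test, y_test, pair):
    """Fit LDA on one digit pair; return (train_wrong, n_train, test_wrong, n_test)."""
    Xtr, ytr = select_pair(X_train, y_train, pair)
    Xte, yte = select_pair(X_test, y_test, pair)
    model = LinearDiscriminantAnalysis().fit(Xtr, ytr)
    train_wrong = int(np.sum(model.predict(Xtr) != ytr))
    test_wrong = int(np.sum(model.predict(Xte) != yte))
    return train_wrong, len(ytr), test_wrong, len(yte)


# Stand-in data (NOT MNIST): 1797 8x8 digit images bundled with scikit-learn.
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

for pair in ([0, 9], [0, 8], [1, 7]):
    tr_w, tr_n, te_w, te_n = lda_error_counts(X_train, y_train, X_test, y_test, pair)
    print(f"LDA for: {pair}")
    print(f"Training: {tr_w} incorrectly classified out of {tr_n} images.")
    print(f"Testing: {te_w} incorrectly classified out of {te_n} images.")
```

Because the stand-in dataset is much smaller and lower in resolution than MNIST, the counts it prints are not comparable to the figures above.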
e.g. Running the PCA program
python pca.py
n: 5
Naive Bayes Classifier:
Training: 21276 incorrectly classified out of 60000 images. (35.460% error rate)
Testing: 3420 incorrectly classified out of 10000 images. (34.200% error rate)
Nearest Neighbors Classifier:
5 Training: 11181 incorrectly classified out of 60000 images. (18.635% error rate)
5 Testing: 2526 incorrectly classified out of 10000 images. (25.260% error rate)

n: 10
Naive Bayes Classifier:
Training: 13777 incorrectly classified out of 60000 images. (22.962% error rate)
Testing: 2218 incorrectly classified out of 10000 images. (22.180% error rate)
Nearest Neighbors Classifier:
5 Training: 2729 incorrectly classified out of 60000 images. (4.548% error rate)
5 Testing: 724 incorrectly classified out of 10000 images. (7.240% error rate)

n: 20
Naive Bayes Classifier:
Training: 9539 incorrectly classified out of 60000 images. (15.898% error rate)
Testing: 1468 incorrectly classified out of 10000 images. (14.680% error rate)
Nearest Neighbors Classifier:
5 Training: 1137 incorrectly classified out of 60000 images. (1.895% error rate)
5 Testing: 306 incorrectly classified out of 10000 images. (3.060% error rate)

n: 50
Naive Bayes Classifier:
Training: 7750 incorrectly classified out of 60000 images. (12.917% error rate)
Testing: 1225 incorrectly classified out of 10000 images. (12.250% error rate)
Nearest Neighbors Classifier:
5 Training: 841 incorrectly classified out of 60000 images. (1.402% error rate)
5 Testing: 249 incorrectly classified out of 10000 images. (2.490% error rate)

n: 100
Naive Bayes Classifier:
Training: 7825 incorrectly classified out of 60000 images. (13.042% error rate)
Testing: 1199 incorrectly classified out of 10000 images. (11.990% error rate)
Nearest Neighbors Classifier:
5 Training: 954 incorrectly classified out of 60000 images. (1.590% error rate)
5 Testing: 276 incorrectly classified out of 10000 images. (2.760% error rate)
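A sweep like this can be outlined with scikit-learn: fit PCA on the training images, project both sets onto the first n components, then train each classifier on the projection. The following is a sketch under assumptions, not the code in pca.py. It uses scikit-learn's small bundled digits dataset (8x8 images, 64 features) in place of MNIST so that it runs without the data files, `pca_sweep` is a hypothetical name, `GaussianNB` is an assumed choice of Naive Bayes variant, and the "5" printed on the nearest-neighbors lines is assumed to be the number of neighbors k.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier


def pca_sweep(X_train, y_train, X_test, y_test, component_counts, k=5):
    """Return {(n, classifier_name): test error fraction} for each n."""
    results = {}
    for n in component_counts:
        # Fit the projection on training data only, then apply it to both sets.
        pca = PCA(n_components=n, random_state=0).fit(X_train)
        Z_train = pca.transform(X_train)
        Z_test = pca.transform(X_test)
        classifiers = (
            ("naive_bayes", GaussianNB()),
            ("knn", KNeighborsClassifier(n_neighbors=k)),
        )
        for name, model in classifiers:
            model.fit(Z_train, y_train)
            results[(n, name)] = float(np.mean(model.predict(Z_test) != y_test))
    return results


# Stand-in data (NOT MNIST). With only 64 features, n = 100 is not possible here.
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

for (n, name), err in pca_sweep(X_train, y_train, X_test, y_test, [5, 10, 20, 50]).items():
    print(f"n: {n} {name}: {100 * err:.3f}% test error rate")
```

Fitting PCA on the training set alone keeps test images from influencing the projection, so the test error stays an honest estimate.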