
A simple machine learning project to predict handwritten digits using KNN


Digit Recognizer

________  .__       .__  __    __________                                  .__                     
\______ \ |__| ____ |__|/  |_  \______   \ ____   ____  ____   ____   ____ |__|_______ ___________ 
 |    |  \|  |/ ___\|  \   __\  |       _// __ \_/ ___\/  _ \ / ___\ /    \|  \___   // __ \_  __ \
 |    `   \  / /_/  >  ||  |    |    |   \  ___/\  \__(  <_> ) /_/  >   |  \  |/    /\  ___/|  | \/
/_______  /__\___  /|__||__|    |____|_  /\___  >\___  >____/\___  /|___|  /__/_____ \\___  >__|   
        \/  /_____/                    \/     \/     \/     /_____/      \/         \/    \/       

ABOUT

A simple machine learning model for recognizing handwritten digits.

MNIST Dataset

  • Run the following commands to download the datasets:

wget https://myawsbucket003.s3.ap-south-1.amazonaws.com/AI+ML/Digit+Recognition/datasets/mnist_test.csv

wget https://myawsbucket003.s3.ap-south-1.amazonaws.com/AI+ML/Digit+Recognition/datasets/mnist_train.csv

MNIST consists of 60,000 training images (plus 10,000 test images) of handwritten digits from zero to nine. Each image comes with a label giving the digit it shows.

Each image contains a single grayscale digit drawn by hand, represented as a 784-dimensional vector (28 × 28 pixels) of floating-point numbers, where each value is a pixel’s brightness.

The datasets are also provided in the notebook.

Note:

After downloading, update the paths train_data_file and test_data_file in the __init__ method.
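As a sketch of how such a file can be read (assuming each CSV row is a label followed by 784 pixel values, as in the common MNIST CSV exports; load_mnist_csv is an illustrative helper, not part of this repo):

```python
import io
import numpy as np

def load_mnist_csv(source):
    """Load MNIST-style CSV rows of the form: label, p0, p1, ..., p783."""
    data = np.loadtxt(source, delimiter=",")
    labels = data[:, 0].astype(int)
    images = data[:, 1:] / 255.0   # scale pixel brightness to [0, 1]
    return images, labels

# Tiny synthetic example: two all-black "images" of 784 pixels each.
csv = io.StringIO("\n".join(f"{d}," + ",".join(["0"] * 784) for d in (3, 7)))
X, y = load_mnist_csv(csv)
img = X[0].reshape(28, 28)   # a row can be viewed as a 28 x 28 image
```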

KNN

K Nearest Neighbors is a classification algorithm: it assigns a new data point (the test input) to a category by looking at its distance from every data point in the training set, then giving it the class held by the majority of the k closest training points.
Pretty simple, right? :)
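A minimal sketch of the whole idea, using Euclidean distance and plain majority voting (the function name is illustrative):

```python
import numpy as np
from collections import Counter

def knn_predict(train_X, train_y, x, k=3):
    """Classify x by majority vote among its k nearest training points."""
    dists = np.linalg.norm(train_X - x, axis=1)   # distance to every training point
    nearest = np.argsort(dists)[:k]               # indices of the k closest points
    votes = Counter(train_y[nearest])
    return votes.most_common(1)[0][0]

# Two small clusters: class 0 near the origin, class 1 near (1, 1).
train_X = np.array([[0.0, 0.0], [0.1, 0.1], [1.0, 1.0], [0.9, 1.1]])
train_y = np.array([0, 0, 1, 1])
pred = knn_predict(train_X, train_y, np.array([0.05, 0.0]), k=3)   # predicts 0
```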

Finding Distance

The L_n norm gives the distance between two points:

d(x, y) = (Σᵢ |xᵢ − yᵢ|ⁿ)^(1/n)

If n is 1, the L_n norm is the Manhattan distance; if n is 2, it is the Euclidean distance.

Using the L_n norm, we can compute the distance between the test input and each training input.
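A small sketch of the L_n (Minkowski) distance as described above:

```python
import numpy as np

def ln_norm_distance(a, b, n=2):
    """Minkowski (L_n) distance: (sum |a_i - b_i|^n)^(1/n)."""
    return np.sum(np.abs(a - b) ** n) ** (1.0 / n)

a = np.array([0.0, 0.0])
b = np.array([3.0, 4.0])
manhattan = ln_norm_distance(a, b, n=1)   # |3| + |4| = 7.0
euclidean = ln_norm_distance(a, b, n=2)   # sqrt(9 + 16) = 5.0
```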

Get K Nearest Neighbours

After computing the distances between the test input and the training inputs, we take the k nearest neighbours to decide which class the test input belongs to.
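With NumPy, this amounts to sorting the distances and taking the first k indices (a sketch; k_nearest is an illustrative name):

```python
import numpy as np

def k_nearest(distances, k):
    """Return the indices of the k smallest distances."""
    return np.argsort(distances)[:k]

d = np.array([4.2, 0.5, 3.1, 0.9, 2.0])
idx = k_nearest(d, 3)   # indices 1, 3, 4 (distances 0.5, 0.9, 2.0)
```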

Voting

Distance Voting

Among the k nearest neighbours there can be a voting tie.
One way to break it is distance-weighted KNN: each candidate label is scored by the sum of the inverse distances of the neighbours that voted for it,

score(c) = Σᵢ 1/dᵢ, over all k nearest neighbours i whose label is c

and the label with the highest score wins.
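A sketch of distance-weighted voting under this scheme (the eps guard against a zero distance is an added assumption):

```python
def distance_weighted_vote(labels, distances, eps=1e-12):
    """Each neighbour votes with weight 1/distance; the label with the
    largest summed weight wins (eps guards against division by zero)."""
    scores = {}
    for label, d in zip(labels, distances):
        scores[label] = scores.get(label, 0.0) + 1.0 / (d + eps)
    return max(scores, key=scores.get)

# A 2-2 tie in plain counts, broken by distance: the two 7s are closer.
labels = [7, 7, 1, 1]
dists = [0.5, 1.0, 2.0, 4.0]
winner = distance_weighted_vote(labels, dists)   # 7 scores 3.0, 1 scores 0.75
```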

Majority Voting

Among the k nearest neighbours there can be a voting tie.
Another approach is majority-based KNN: out of the predicted labels, the most frequently repeated label is chosen.
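A sketch of majority voting with collections.Counter:

```python
from collections import Counter

def majority_vote(labels):
    """Return the most frequent label among the k neighbours."""
    return Counter(labels).most_common(1)[0][0]

winner = majority_vote([3, 3, 8, 3, 8])   # 3 appears most often
```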

Hyperparameter Tuning

Choosing the values of n and k that give the best accuracy.
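One simple way to do this is a grid search over candidate (k, n) pairs, scored by accuracy on held-out validation data (a sketch; the helper names are illustrative, not from this repo):

```python
import numpy as np
from collections import Counter

def knn_predict(train_X, train_y, x, k, n):
    """Classify x with the L_n distance and majority voting over k neighbours."""
    dists = np.sum(np.abs(train_X - x) ** n, axis=1) ** (1.0 / n)
    nearest = np.argsort(dists)[:k]
    return Counter(train_y[nearest]).most_common(1)[0][0]

def tune(train_X, train_y, val_X, val_y, ks=(1, 3, 5), ns=(1, 2)):
    """Grid-search k and n; return the pair with the best validation accuracy."""
    best = (None, None, -1.0)
    for k in ks:
        for n in ns:
            preds = [knn_predict(train_X, train_y, x, k, n) for x in val_X]
            acc = float(np.mean(np.array(preds) == val_y))
            if acc > best[2]:
                best = (k, n, acc)
    return best
```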
