SGD_kernels_multiclass

c62bc28 · Apr 23, 2017

Name	Last commit message	Last commit date
Readme.md	Update Readme.md	Apr 23, 2017
confusionMatrix_kernels.PNG	Add files via upload	Apr 22, 2017
generate_poly_features.m	Add files via upload	Apr 22, 2017
linearKernel.m	Add files via upload	Apr 22, 2017
main.m	sgd without kernel	Apr 23, 2017
mnist.mat	sgd without kernel	Apr 23, 2017
polynomialKernel.m	Add files via upload	Apr 22, 2017
test_mhinge_kernel_sgd.m	Add files via upload	Apr 22, 2017
test_multiclass_hinge_sgd.m	Add files via upload	Apr 22, 2017
train_mhinge_krnel_sgd.m	Add files via upload	Apr 22, 2017
train_multiclass_hinge_sgd.m	Add files via upload	Apr 22, 2017
without_Kernels_confusionMatrix_Delta1.PNG	sgd without kernel	Apr 23, 2017
without_Kernels_confusionMatrix_Delta2.PNG	sgd without kernel	Apr 23, 2017

Readme.md

The kernel used is a polynomial kernel of the form k(x, x') = (<x, x'>)^p, where p is the degree of the polynomial.
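As an illustration of the kernel formula, here is a minimal sketch in plain Python (the repository itself is MATLAB; see polynomialKernel.m for the actual code). The function names `polynomial_kernel` and `gram_matrix` and the toy data are assumptions made for this example only.

```python
def polynomial_kernel(x, z, p):
    """Homogeneous polynomial kernel k(x, z) = (<x, z>)^p."""
    dot = sum(a * b for a, b in zip(x, z))  # inner product <x, z>
    return dot ** p


def gram_matrix(X, p):
    """Kernel value for every pair of rows in X (hypothetical helper)."""
    return [[polynomial_kernel(xi, xj, p) for xj in X] for xi in X]


# Toy data, not taken from mnist.mat
X = [[1.0, 2.0], [3.0, 0.5]]
K = gram_matrix(X, p=2)
# <x1, x1> = 5, so K[0][0] = 25; <x1, x2> = 4, so K[0][1] = 16
```

The resulting matrix is symmetric because the inner product is symmetric in its arguments.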
The suggested learning rate is eta = 1/sqrt(i), where i is the index of the current sample while iterating through the training data.
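The schedule can be sketched in a few lines of Python. The function name `learning_rate` is hypothetical, and the sketch assumes the sample index i starts at 1 (it would divide by zero otherwise).

```python
import math


def learning_rate(i):
    """Step size eta = 1/sqrt(i) for the i-th sample (assumed 1-indexed)."""
    if i < 1:
        raise ValueError("sample index i must start at 1")
    return 1.0 / math.sqrt(i)


# Step sizes at the 1st, 4th and 100th sample
etas = [learning_rate(i) for i in (1, 4, 100)]
```

The step size shrinks slowly: it is 1 on the first sample, 0.5 on the fourth, and 0.1 on the hundredth.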
The matrix Delta specifies the penalty paid for a wrong prediction. Two cases are considered:
- Case 1: you pay 0 for each correct prediction and 1 for each wrong one.
- Case 2: you pay 0 for each correct prediction, 1 for each wrong prediction between classes whose digits are one apart (e.g. you predicted "2" and the correct label is "3"), and 2 in all other cases.
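The two penalty matrices, and one common way such a matrix can enter a multiclass hinge SGD update, can be sketched in Python as follows. This is an illustration under stated assumptions, not the repository's implementation (see train_multiclass_hinge_sgd.m for that): the names `make_delta` and `hinge_sgd_step` are hypothetical, digits 0 and 9 are assumed not to count as "one apart", and the update shown is the standard loss-augmented subgradient step for a linear model with one weight vector per class.

```python
def make_delta(num_classes, case):
    """Penalty matrix: delta[y][k] is the cost of predicting k when the label is y."""
    delta = [[0] * num_classes for _ in range(num_classes)]
    for y in range(num_classes):
        for k in range(num_classes):
            if y == k:
                continue  # correct prediction costs 0
            if case == 1:
                delta[y][k] = 1  # Case 1: every mistake costs 1
            else:
                # Case 2: neighbouring digits cost 1, all other mistakes cost 2
                delta[y][k] = 1 if abs(y - k) == 1 else 2
    return delta


def hinge_sgd_step(W, x, y, delta, eta):
    """One assumed subgradient step on the multiclass hinge loss (updates W in place)."""
    # Loss-augmented scores: penalty plus linear score of each class
    scores = [delta[y][k] + sum(w * v for w, v in zip(W[k], x))
              for k in range(len(W))]
    y_hat = max(range(len(W)), key=lambda k: scores[k])
    if y_hat != y:
        for j, v in enumerate(x):
            W[y][j] += eta * v      # pull the true class towards x
            W[y_hat][j] -= eta * v  # push the most violating class away
    return y_hat


delta1 = make_delta(10, case=1)
delta2 = make_delta(10, case=2)

# Toy two-feature example with all-zero initial weights
W = [[0.0, 0.0] for _ in range(10)]
violator = hinge_sgd_step(W, x=[1.0, 2.0], y=3, delta=delta2, eta=1.0)
```

With all-zero weights the scores equal the penalties alone, so the step picks the first class with the largest penalty relative to label 3 and moves the two affected weight vectors in opposite directions.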

RESULTS

Without kernels:
0/1 loss for Delta1 = 8.52%
0/1 loss for Delta2 = 8.31%
Without kernels, the larger penalties in Delta2 gave a slightly lower 0/1 loss than Delta1. With kernels, the choice of Delta made no difference.
The confusion matrices are attached as without_Kernels_confusionMatrix_Delta1.PNG and without_Kernels_confusionMatrix_Delta2.PNG.
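For reference, the 0/1 loss and a confusion matrix of the kind reported here can be computed as in the following Python sketch. The function names and the toy labels are assumptions for illustration. Rows are assumed to index the true label and columns the predicted label, which may differ from the layout of the attached images.

```python
def zero_one_loss(y_true, y_pred):
    """Fraction of predictions that differ from the true label."""
    wrong = sum(1 for t, p in zip(y_true, y_pred) if t != p)
    return wrong / len(y_true)


def confusion_matrix(y_true, y_pred, num_classes):
    """cm[t][p] counts samples with true label t that were predicted as p."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


# Toy labels, not the MNIST results reported in this README
y_true = [0, 1, 2, 2, 1]
y_pred = [0, 2, 2, 2, 1]
loss = zero_one_loss(y_true, y_pred)      # one mistake out of five
cm = confusion_matrix(y_true, y_pred, 3)
```

Multiplying the loss by 100 gives a percentage comparable to the figures quoted in this section.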

With kernels, the 0/1 loss is the same for both Deltas: 3.75%. The choice of Delta made no difference.
With 100 samples the loss was 37%, with 1000 samples it was 14%, and with 10000 samples it was 6.33%. The loss drops quickly at first, and adding more samples matters little beyond a certain point.
The confusion matrix, which is the same for both Deltas, is attached as confusionMatrix_kernels.PNG.