SGD_kernels_multiclass
Implement Stochastic Gradient Descent over the multiclass hinge loss with kernels.

  • Used a polynomial kernel of the form k(x, x') = (<x, x'>)^p, where the parameter p is the degree of the polynomial.
  • Suggested learning rate is eta = 1/sqrt(i), where i is the index of the current sample while iterating through the training data.
  • Matrix Delta specifies the penalty paid for a wrong prediction (a sketch of the full training loop, including both Delta variants, follows this list):
    • Case 1: you pay 0 for each correct prediction and 1 for each wrong one.
    • Case 2: you pay 0 for each correct prediction, 1 for each wrong prediction between classes whose digits are one apart (e.g. you predicted "2" and the correct label is "3"), and 2 in all other cases.
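As a rough illustration (not the repository's actual code), the sketch below implements this setup in NumPy, assuming 10 digit classes and arrays X (training features) and y (integer labels); poly_kernel, make_deltas, train_sgd_kernel, and predict are names chosen here for illustration:

```python
import numpy as np

def poly_kernel(X1, X2, p=3):
    # Polynomial kernel k(x, x') = (<x, x'>)^p.
    return (X1 @ X2.T) ** p

def make_deltas(n_classes=10):
    # Case 1: 0 for a correct prediction, 1 for any wrong one.
    Delta1 = 1.0 - np.eye(n_classes)
    # Case 2: 1 if the predicted and true digits are one apart, else 2.
    d = np.arange(n_classes)
    Delta2 = np.where(np.abs(d[:, None] - d[None, :]) == 1, 1.0, 2.0)
    np.fill_diagonal(Delta2, 0.0)
    return Delta1, Delta2

def train_sgd_kernel(X, y, Delta, p=3, n_classes=10):
    n = X.shape[0]
    K = poly_kernel(X, X, p)           # precomputed Gram matrix
    alpha = np.zeros((n_classes, n))   # dual coefficients, one row per class
    for i in range(n):
        eta = 1.0 / np.sqrt(i + 1)     # suggested rate eta = 1/sqrt(i)
        scores = alpha @ K[:, i]       # score f_c(x_i) for every class c
        # most-violating class under the margin-rescaled hinge loss
        y_hat = np.argmax(Delta[y[i]] + scores - scores[y[i]])
        if y_hat != y[i]:              # nonzero subgradient: take a step
            alpha[y[i], i] += eta
            alpha[y_hat, i] -= eta
    return alpha

def predict(alpha, X_train, X_test, p=3):
    # Class scores on new points use the same kernel expansion.
    return np.argmax(alpha @ poly_kernel(X_train, X_test, p), axis=0)
```

In the dual representation each training sample contributes only through its kernel column, so the polynomial kernel never has to be expanded into explicit features; swapping Delta1 for Delta2 only changes which class the subgradient update targets.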

RESULTS

Without Kernels

  • 0/1 loss for Delta1 = 8.52%
  • 0/1 loss for Delta2 = 8.31%
  • Hence, without kernels, the heavier penalties in Delta2 reduced the loss slightly (8.31% vs. 8.52%); with kernels, as shown below, the choice of Delta made no difference.
  • The confusion matrices are attached as without_Kernels_confusionMatrix_Delta1.png and without_Kernels_confusionMatrix_Delta2.png.

With Kernels

  • The 0/1 loss is the same for both Deltas: 3.75%. The choice of Delta did not matter here.
  • The loss on 100 training samples was 37%; on 1,000 samples, 14%; and on 10,000 samples, 6.33%. The loss fell quickly at first and changed little beyond a certain number of samples.
  • The attached confusion_matrix.png is identical for both Deltas. (A sketch of how these metrics can be computed follows below.)
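For the metrics quoted above, a minimal sketch, assuming NumPy arrays of true and predicted integer labels; zero_one_loss and confusion_matrix are illustrative names, not functions from this repository:

```python
import numpy as np

def zero_one_loss(y_true, y_pred):
    # Fraction of misclassified samples, reported above as a percentage.
    return np.mean(y_true != y_pred)

def confusion_matrix(y_true, y_pred, n_classes=10):
    # C[t, p] counts samples with true label t predicted as p.
    C = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        C[t, p] += 1
    return C
```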