Is smoothing really needed for prob calc in bayes_classifier? #12
Comments
Rephrased my question; also found some discussion here: http://stats.stackexchange.com/a/108990
Hi, it is correct that if you are evaluating a single unknown feature, the system will always pick the same class for it. In general that is the majority class, but as you point out, smoothing might change that. Laplacian smoothing is a very poor smoothing technique, but smoothing in general hinges on how much probability mass to allocate to unseen events. Not using it at test time seems to miss the point, but with a very poor smoothing algorithm you might come out ahead, yes. I hope to add Good-Turing smoothing at some point (and don't cry too much about it: www.csie.ntu.edu.tw/~b92b02053/print/good-turing-smoothing-without.pdf). By the way, you can disable smoothing by setting epsilon to zero when using the classifier (in test mode). If this answers your question, consider closing this issue, as it does not affect the implementation of the algorithms in the code.
@DrDub thanks for your reply. I set smoothing to 0.01 and kept the different training sets balanced; overall it works well. Still, looking forward to the new Good-Turing smoothing. 👍
@DrDub +1 for the PDF. It looks very interesting; I'll read it as soon as I have some free time.
Thanks for creating NaturalNode!
I am using your Bayes classifier in my project, and while looking into the implementation I found that it adds smoothing when calculating the probabilities.
This smoothing of unknown words in the test set will skew the probability towards whichever class has the fewest features. For instance:
say smoothing === 1, class A has 2 features and class B has 3; (0 + 1) / 2 is bigger than (0 + 1) / 3, so A again wins.
I understand it may be good to have smoothing on the training set, but is it really necessary for the test set? Why not just discard the tokens which are not in classFeatures[label]?