
Implement NaiveBayes.jl #66

Closed
ablaom opened this issue Feb 5, 2019 · 7 comments
Labels
enhancement New feature or request good first issue Good for newcomers

Comments


ablaom commented Feb 5, 2019

NaiveBayes.jl

@ablaom ablaom added the enhancement New feature or request label Feb 5, 2019
@tlienart tlienart added the good first issue Good for newcomers label Feb 5, 2019
@ablaom ablaom added good first issue Good for newcomers and removed good first issue Good for newcomers labels Feb 5, 2019
ayush1999 (Contributor) commented:

@ablaom I'm a bit free at the moment; can I take this up? Also, is there anything in particular I should keep in mind while implementing it?


ablaom commented Feb 6, 2019

That would be great.

I'll have a quick look over the package later today and see if there is anything needing special attention. Feel free to start with a work-in-progress PR.


ablaom commented Feb 10, 2019

@ayush1999

Of the three classifiers there, I suggest we begin with GaussianNB. Get familiar with this model by looking at example/iris.jl in the package.

Let's call our corresponding model GaussianNB also. We should make this a probabilistic classifier. For now, just give it the single hyperparameter target_type:

```julia
struct GaussianNB{T} <: Probabilistic{Any} # replace `Any` with fitresult type later
    target_type::Type{T}
end
```

We can use DecisionTreeClassifier (now also probabilistic) as an exemplar for this model.
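For illustration only, here is a self-contained sketch of instantiating the proposed model; the `Probabilistic` abstract type below is a stand-in for the MLJBase type referred to above:

```julia
# Stand-in for the MLJBase abstract type used in the snippet above:
abstract type Probabilistic{R} end

struct GaussianNB{T} <: Probabilistic{Any} # replace `Any` with fitresult type later
    target_type::Type{T}
end

# Instantiate with the single hyperparameter:
model = GaussianNB(Int)   # a GaussianNB{Int}
```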

Some notes:

  • The input data X for the MLJBase fit and predict methods is a Tables.jl table, with patterns-as-rows, but the package's fit expects patterns-as-columns. However, it accepts any abstract matrix, so you can take the transpose: inside fit and predict you can just do XNB = MLJBase.matrix(X)' (in fit) and XNB = MLJBase.matrix(Xnew)' (in predict) and pass the adjoint XNB to the NB fit and predict.

  • Our predict should deliver probabilities (more precisely, elements of the distribution type UnivariateNominal defined in MLJBase). However, the NB predict method returns classes. So our predict will instead call NaiveBayes.predict_logprobs (defined in src/common.jl of the NB package) and extract our distributions from that.

  • NaiveBayes does not preserve categorical levels properly. In particular, when you instantiate the NB model, you can only give it the list of levels that actually occur in the training data; otherwise it crashes. Our fit will need to record all levels of the categorical vector y (including ones that possibly do not occur as elements of y, i.e., occur only in the pool) and bundle these with the returned fitresult (to make them available to predict). Classes which do not occur in the training y will need to be assigned probability zero in all predicted distributions. See adding_new_models.md and our src/interfaces/DecisionTree.jl for more on this.
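The three notes above can be sketched as follows. This is a sketch only, not the final implementation: the `NaiveBayes.GaussianNB` constructor arguments, the `predict_logprobs` return layout (classes-by-patterns), and the `UnivariateNominal(levels, probs)` constructor are all assumptions to be checked against the two packages.

```julia
# Sketch under the assumptions stated above.

function MLJBase.fit(model::GaussianNB, verbosity::Int, X, y)
    XNB = MLJBase.matrix(X)'       # adjoint: patterns-as-columns for NB
    yplain = collect(y)            # plain vector of labels for NB
    seen_levels = unique(yplain)   # NB crashes on levels absent from y
    all_levels = levels(y)         # the full pool, possibly larger
    nbmodel = NaiveBayes.GaussianNB(seen_levels, size(XNB, 1))  # assumed constructor
    NaiveBayes.fit(nbmodel, XNB, yplain)
    fitresult = (nbmodel, seen_levels, all_levels)  # bundle levels for predict
    cache, report = nothing, nothing
    return fitresult, cache, report
end

function MLJBase.predict(model::GaussianNB, fitresult, Xnew)
    nbmodel, seen_levels, all_levels = fitresult
    XNB = MLJBase.matrix(Xnew)'
    # per-pattern log-probabilities over the *seen* classes only:
    logprobs = NaiveBayes.predict_logprobs(nbmodel, XNB)  # assumed signature
    probs = exp.(logprobs)
    return map(eachcol(probs)) do p
        p = p ./ sum(p)                     # normalize over seen classes
        d = Dict(zip(seen_levels, p))
        # classes unseen in training get probability zero:
        UnivariateNominal(all_levels, [get(d, c, 0.0) for c in all_levels])
    end
end
```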


tlienart commented Feb 12, 2019

Should this issue be moved to the new MLJModels repo? (I guess same question for #35, #44, #32 )


ayush1999 commented Feb 12, 2019

@ablaom Seeing that there is a new MLJModels repo now, correct me if I'm wrong:

  1. The NaiveBayes model will go into the MLJModels repo (MLJModels/src).
  2. The Metadata.toml in MLJRegistry needs to be updated for this model.


ablaom commented Feb 12, 2019

  1. Yes, let's put NaiveBayes in MLJModels/src . You can use MLJModels/src/DecisionTree.jl as a kind of template. One weird thing to note is that the package is imported into your implementation module as import ..NaiveBayes instead of import NaiveBayes. This is because of the way lazy loading works.

  2. Metadata.toml update happens after the model is in place and tested. This is semi-automated, using the traits you define as part of your model implementation (see the traits section of adding_new_models.md). I will take care of this.
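The lazy-loading point in item 1 can be sketched as follows; the wrapper module name and file layout here are assumptions modelled on MLJModels/src/DecisionTree.jl:

```julia
# MLJModels/src/NaiveBayes.jl -- evaluated *inside* the MLJModels module
# by the lazy loader, so it is not a standalone file.
module NaiveBayes_

# Two dots: the NaiveBayes package has been imported into the *parent*
# module (MLJModels) by the lazy-loading machinery, so we reach it
# relatively rather than via `import NaiveBayes`, which would resolve
# against the active environment instead:
import ..NaiveBayes

import MLJBase

# model struct, fit, predict, and traits go here ...

end # module
```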

@tlienart I'd rather keep all the issues in one place for now, if that's ok.


ablaom commented Mar 4, 2019

The Gaussian and Multinomial models are now implemented.

Hybrid remains unimplemented, as there is an open issue. It is perhaps worth considering this quote from a maintainer before investing more time in NaiveBayes.jl just now:

"Yes, this package was created in prehistoric times and has never been seriously reviewed since then, so it's pretty possible it contains even trivial bugs." source

@ablaom ablaom closed this as completed Mar 4, 2019