-
-
Notifications
You must be signed in to change notification settings - Fork 554
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Onelearn implementation: AMF & Mondrian Tree Classifiers #1131
Conversation
Hey hey :) I don't have much time right now to check the code, but maybe @smastelini is available? I'll answer some of your questions though:
Trees are usually multi-class classifiers. We expact multi-class classifiers to support any label of type
What do you mean you didn't manage to compile? Yeah some doctests would be nice, at the minimum. |
- Adding a "mondrian" folder in the "tree" folder for better file structure - Using "random.choices" instead of the "sample_discrete" functions in "utils.py", and removing "sample_discrete" from the "utils.py"
Hi @AlexandreChaussard and @MaxHalford 😃 I can take care of this review. It's going to be a good opportunity to learn more about AMF. I'm currently enjoying the holidays with my family, but I will start reviewing next week! If needed, besides leaving comments, I can create a PR to @AlexandreChaussard's fork with some suggestions. |
Yes, let's take the time to review this properly, there's no rush. This has the potential to be one of our best algorithms, so it's worth spending time on it. |
- Making `learn_one` and `predict_proba_one` accepting all kinds of supported labels for `y` as input - `predict_proba_one` outputs a dictionary of scores with matching labels
Hey ! 👋 I've been fixing all the previously stated points, besides the example docstring that I keep for when the model will be completely reviewed and ready to go 😄 Happy new years team ❤️ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @AlexandreChaussard, I started reviewing the code and am impressed by the effort you put into implementing AMF.
I barely scratched the surface yet and left some initial comments, mostly on style. One thing popped up though, I see you're keeping track of class numbers and expects a predefined number of classes.
I suggest keeping defining a set of observed classes, as the Hoeffding Tree Classifier does. This way, new classes are allowed to appear and you can access the properties that are necessary to make predictions and perform other tasks.
If this is possible, the classes become any hashable object, as it is the case in the Hoeffding Trees. What do you think?
I'll keep reviewing your code
Co-authored-by: Saulo Martiello Mastelini <mastelini@usp.br>
Co-authored-by: Saulo Martiello Mastelini <mastelini@usp.br>
Co-authored-by: Saulo Martiello Mastelini <mastelini@usp.br>
Co-authored-by: Saulo Martiello Mastelini <mastelini@usp.br>
Thanks ❤️
Code-wise, I don't think this would be any problem to implement, there's already a dictionary collecting the observed classes that is called If I'm making an error here, please let me know I'm no expert of AMF 😄
The classes can be any string, int or boolean at the moment as @MaxHalford suggested (typing.ClfTarget). I'm not sure what you mean by hashable object instead (like something to put in an HashMap? It's kinda what does the dictionnary I'm busy with exams at the moment, but I'll be working on all these on Thursday for sure! |
Yeap! You got it :) All these things (ints, etc.) can go in a set. Do worry about changing things now. Take your time with the exams. It's actually better this way, so that I can take some time to delve deep in the code until the end of the week. Cheers! |
- Leaving `__all__` in alphabetical order for the classifiers - Removing type parameters in the description of `log_2_sum` of math utils - Replacing java-like getters and setters by python-like properties and setter
- Replacing Overflow from infinity to maximum possible float (so it makes computations still possible)
Hey 👋 I've fixed/answered all the previous reviews, and I've added support for the random state ! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here is one more wave of comments. Once they are fixed, I'll create a PR for your fork with code-style suggestions. Keep up the excellent work!
Co-authored-by: Saulo Martiello Mastelini <mastelini@usp.br>
- Fixing import order in __init__ file of ensemble - Using LaTeX formulation in AMFClassifier description - Making all nodes related methods private (it shouldn't be used outside) - Docstring syntax update and fixes - Importing river.base instead of typing module for better readability - Adding a short description to the MondrianTreeClassifier - Renaming MondrianTreeLeaf into MondrianLeaf - Reordering functions in MondrianTreeClassifier for better readability
- Adding suggestions from Mastelini on keys usage - Removing useless initialization of scores in the MondrianTreeClassifier
…to suggestions2 Critical issue to fix in AMFClassifier before merging into main
Style suggestions merge from Mastelini
fix remaining tests and remove duplicated method call
- Adding examples for AMF & Mondrian Tree Classifiers - Reordering __init__ in alphabetical order - Cleaning the comments - Adding string representation for nodes
Co-authored-by: Saulo Martiello Mastelini <mastelini@usp.br>
Hey (again) @MaxHalford ! Any chance you could have a look as well so maybe we can go for the merge :3 ? Cheers 🥳 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you so much, @AlexandreChaussard, for all the work on this PR!
You have addressed all the main problems identified in the code and paved the way for the upcoming AMF regressor.
It remains to decide whether or not we can get rid of the n_classes
parameter and track incoming classes automatically. We will wait for the answer from AMF's original authors and proceed to make changes if needed afterward.
Description of the PR
Hi ! 👋
This is a first version of Onelearn's library (classifiers only) implementation in River.
It contains:
The original repository with proof of working implementation can be found here (see script.py).
Notes on the utils
Currently I placed two functions in the utils section:
sample_discreteIt might seem overkill to place them as utils right now looking at where they're used in the code, but I'll need them for the regressors too when the times come. Maybe there's a better place for them keeping the regressors in mind though.