Onelearn implementation: AMF & Mondrian Tree Classifiers #1131

AlexandreChaussard · 2022-12-30T16:18:03Z

Description of the PR

Hi! 👋

This is a first version of the Onelearn library's implementation in River (classifiers only).
It contains:

Mondrian Tree (Base and Classifier)
Aggregated Mondrian Forest (Base and Classifier)

The original repository, with a proof that the implementation works, can be found here (see script.py).

Notes on the utils

Currently I placed two functions in the utils section:

~~sample_discrete~~
log_sum_2_exp

It might seem overkill to make them utils right now, given where they're used in the code, but I'll need them for the regressors too when the time comes. Maybe there's a better place for them, keeping the regressors in mind.
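For readers unfamiliar with the helper: `log_sum_2_exp` presumably computes log(exp(a) + exp(b)) in a numerically stable way, which keeps weight aggregation in log space from overflowing. The sketch below is a minimal standalone version under that assumption, not the code from this PR.

```python
import math


def log_sum_2_exp(a: float, b: float) -> float:
    """Return log(exp(a) + exp(b)) without overflowing for large inputs."""
    # Factor out the larger exponent: log(e^a + e^b) = m + log(1 + e^{-|a - b|}),
    # so exp() is only ever applied to a non-positive number.
    m = max(a, b)
    return m + math.log1p(math.exp(-abs(a - b)))
```

A naive `math.log(math.exp(1000) + math.exp(1000))` raises `OverflowError`, whereas this version returns roughly `1000 + log(2)`.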

MaxHalford · 2022-12-31T00:05:32Z

Hey hey :)

I don't have much time right now to check the code, but maybe @smastelini is available? I'll answer some of your questions though:

Management of labels: labels must be positive integers at the moment, since that just makes my life easier. Please tell me how you would prefer this to be implemented in River's framework (I've seen labels presented as dictionaries, but I'm not sure what these dictionaries contain: int? string?). Changing how labels are managed should be a simple fix on my side; I just need to know your preferences.

Trees are usually multi-class classifiers. We expect multi-class classifiers to support any label of type bool, int or str (see base/typing.py)

Examples of usage: MondrianTreeClassifier and AMFClassifier would need usage examples for users. I actually implemented one on my repo here already, but I didn't manage to compile River, so I couldn't get the scores 😢
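As a rough illustration of what supporting bool, int or str labels can look like, here is a hypothetical helper (`LabelIndex` is not a River class) that maps labels to consecutive integer indices for internal computation and turns a per-index score vector back into a `{label: score}` dict, the shape River's `predict_proba_one` returns.

```python
from typing import Dict, Hashable, List


class LabelIndex:
    """Hypothetical helper mapping arbitrary labels (bool, int, str) to 0..K-1."""

    def __init__(self) -> None:
        self._index: Dict[Hashable, int] = {}
        self._labels: List[Hashable] = []

    def encode(self, label: Hashable) -> int:
        # Assign the next free index the first time a label is seen.
        if label not in self._index:
            self._index[label] = len(self._labels)
            self._labels.append(label)
        return self._index[label]

    def decode_scores(self, scores: List[float]) -> Dict[Hashable, float]:
        # Turn a per-index score vector back into a {label: score} dict.
        return {label: scores[i] for i, label in enumerate(self._labels)}
```

One Python subtlety worth keeping in mind: `True == 1` and both hash equally, so those two labels would end up sharing a single key.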

What do you mean you didn't manage to compile? Yeah some doctests would be nice, at the minimum.

river/utils/random.py

river/tree/mondrian_tree.py

river/ensemble/aggregated_mondrian_forest.py

- Adding a "mondrian" folder in the "tree" folder for better file structure - Using "random.choices" instead of the "sample_discrete" functions in "utils.py", and removing "sample_discrete" from the "utils.py"
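For context on the `random.choices` switch: in a Mondrian process, the split dimension is drawn with probability proportional to each feature's range within the node's bounding box. A minimal sketch of that draw with the standard library, assuming a dict of per-feature ranges (the function name and inputs are illustrative, not taken from the PR):

```python
import random
from typing import Dict


def sample_split_feature(ranges: Dict[str, float], rng: random.Random) -> str:
    """Pick a feature with probability proportional to its range.

    `ranges` maps feature name -> (upper - lower) of the node's box.
    """
    features = list(ranges)  # dict order (insertion order) fixes the draw order
    weights = [ranges[f] for f in features]
    # random.choices samples with replacement according to `weights`; k=1 -> one draw
    return rng.choices(features, weights=weights, k=1)[0]
```

Because the result depends on the order of `features`, iterating features in a stable order matters if runs are to be reproducible under a fixed seed.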

smastelini · 2022-12-31T12:54:51Z

Hi @AlexandreChaussard and @MaxHalford 😃

I can take care of this review. It's going to be a good opportunity to learn more about AMF.

I'm currently enjoying the holidays with my family, but I will start reviewing next week!

If needed, besides leaving comments, I can create a PR to @AlexandreChaussard's fork with some suggestions.


- Removing the "__repr__" method of AMF - Removing the @Setter and @Getter - Removing the "loss" parameter of the classifiers since only the "log-loss" is being used in the end
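With only the log-loss left, a per-node weight update can be written as an exponential-weights step kept in log space, where the loss is -log p(y). The sketch below assumes that form; the `step` argument and the function names are illustrative and are not claimed to match the PR's code.

```python
import math


def log_loss(proba_of_true_label: float) -> float:
    # Log-loss of a single prediction: -log p(y_true).
    return -math.log(proba_of_true_label)


def update_log_weight(log_weight: float, proba_of_true_label: float, step: float = 1.0) -> float:
    """Exponential weights in log space: w <- w * exp(-step * loss)."""
    return log_weight - step * log_loss(proba_of_true_label)
```

The logarithm requires p(y) > 0; a smoothed estimate, such as one using a Dirichlet prior, keeps it strictly positive.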

MaxHalford · 2022-12-31T13:25:03Z

Yes, let's take the time to review this properly, there's no rush. This has the potential to be one of our best algorithms, so it's worth spending time on it.

- Making `learn_one` and `predict_proba_one` accepting all kinds of supported labels for `y` as input - `predict_proba_one` outputs a dictionary of scores with matching labels

AlexandreChaussard · 2022-12-31T15:28:46Z

Hey! 👋

I've fixed all the previously raised points, except the example docstring, which I'm keeping for when the model has been fully reviewed and is ready to go 😄

Happy New Year, team ❤️

smastelini

Hi @AlexandreChaussard, I started reviewing the code and am impressed by the effort you put into implementing AMF.

I've barely scratched the surface so far and left some initial comments, mostly on style. One thing popped up, though: I see you're keeping track of class numbers and expect a predefined number of classes.

I suggest keeping a set of observed classes instead, as the Hoeffding Tree Classifier does. This way, new classes are allowed to appear, and you can still access the properties needed to make predictions and perform other tasks.

If this is possible, the classes can be any hashable object, as is the case in the Hoeffding Trees. What do you think?
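To make the suggestion concrete, here is a hypothetical sketch (not River's Hoeffding Tree code) of tracking observed classes in a set, so the number of classes is derived from the data rather than fixed up front. Any hashable label can be stored; unhashable ones such as lists are rejected by Python itself.

```python
from typing import Hashable, Set


class ObservedClasses:
    """Hypothetical helper: grow the set of classes as labels arrive."""

    def __init__(self) -> None:
        self.classes: Set[Hashable] = set()

    def observe(self, y: Hashable) -> None:
        # Any hashable label (bool, int, str, tuple, ...) can go in a set.
        self.classes.add(y)

    @property
    def n_classes(self) -> int:
        # Derived from the data instead of being a constructor parameter.
        return len(self.classes)
```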

I'll keep reviewing your code

river/tree/__init__.py

river/tree/mondrian/__init__.py

river/tree/mondrian/mondrian_tree.py

river/tree/mondrian/mondrian_tree_nodes.py

Co-authored-by: Saulo Martiello Mastelini <mastelini@usp.br>

AlexandreChaussard · 2023-01-02T21:25:14Z

Hi @AlexandreChaussard, I started reviewing the code and am impressed by the effort you put into implementing AMF.

Thanks ❤️

I suggest keeping a set of observed classes instead, as the Hoeffding Tree Classifier does. This way, new classes are allowed to appear, and you can still access the properties needed to make predictions and perform other tasks.

Code-wise, I don't think this would be a problem to implement: the classifiers already have a dictionary, _classes, that collects the observed classes. At the moment it is only used to translate class objects into positive integers for easy computation later on.
However, it seems that the number of classes must be known beforehand to compute the Jeffreys prior (see the description of AMF). If the number of classes could change over time, the current implementation would be incorrect: at first we would have 0 classes, so the Dirichlet term would drop out of the denominator, and the distribution computed with the Jeffreys prior could contain probabilities > 1.

If I'm making an error here, please let me know; I'm no expert on AMF 😄
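The concern can be checked numerically. Assuming the usual smoothed estimate p(c) = (n_c + α) / (n + α·K), where α = 1/2 gives the Jeffreys prior and K is the number of classes, setting K = 0 removes the α·K term from the denominator and lets a probability exceed 1. The function below is an illustrative sketch, not AMF's actual code.

```python
from typing import Dict, Hashable


def dirichlet_proba(
    counts: Dict[Hashable, int], n_classes: int, dirichlet: float = 0.5
) -> Dict[Hashable, float]:
    """Smoothed class probabilities (n_c + a) / (n + a * K).

    dirichlet=0.5 corresponds to the Jeffreys prior.
    """
    n = sum(counts.values())
    denom = n + dirichlet * n_classes  # the a * K term vanishes when K == 0
    return {c: (n_c + dirichlet) / denom for c, n_c in counts.items()}
```

With counts `{"a": 3}`, using K = 2 gives p(a) = 3.5 / 4 = 0.875, while K = 0 gives 3.5 / 3 ≈ 1.17, which is not a valid probability.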

If this is possible, the classes can be any hashable object, as is the case in the Hoeffding Trees. What do you think?

The classes can be any string, int or boolean at the moment, as @MaxHalford suggested (typing.ClfTarget). I'm not sure what you mean by a hashable object instead (something you can put in a HashMap? That's more or less what the _classes dictionary does, right?)

I'm busy with exams at the moment, but I'll be working on all these on Thursday for sure!
Cheers 🎈

smastelini · 2023-01-02T21:40:05Z

Yeap! You got it :)

All these things (ints, etc.) can go in a set. Don't worry about changing things now. Take your time with the exams. It's actually better this way, so that I can take some time to delve deep into the code until the end of the week. Cheers!

- Leaving `__all__` in alphabetical order for the classifiers - Removing type parameters in the description of `log_2_sum` of math utils - Replacing java-like getters and setters by python-like properties and setter
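A minimal illustration of the getter/setter change, using a hypothetical node class (the `time` attribute is illustrative): a Python property keeps attribute-style access while still allowing validation in the setter.

```python
class MondrianNodeSketch:
    """Hypothetical node showing properties in place of get_/set_ methods."""

    def __init__(self, time: float = 0.0) -> None:
        self._time = time

    @property
    def time(self) -> float:
        return self._time

    @time.setter
    def time(self, value: float) -> None:
        # Validation lives in the setter; callers still write `node.time = ...`.
        if value < 0:
            raise ValueError("time must be non-negative")
        self._time = value
```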

- Replacing Overflow from infinity to maximum possible float (so it makes computations still possible)
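One way the overflow change could look, assuming `math.exp` is the source of the overflow (the commit does not say): catch `OverflowError` and return `sys.float_info.max` so later comparisons still work. Multiplying that value by a number greater than 1 still yields `inf`, so this only postpones the problem for some computations.

```python
import math
import sys


def safe_exp(x: float) -> float:
    """exp(x), clamped to the largest finite float instead of overflowing."""
    try:
        return math.exp(x)
    except OverflowError:
        # math.exp raises OverflowError for x above roughly 709.78.
        return sys.float_info.max
```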

AlexandreChaussard · 2023-01-05T21:00:49Z

Hey 👋

I've fixed/answered all the previous review comments, and I've added support for the random state!
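For the random state, a common pattern (sketched here with a hypothetical class, not the PR's code) is to give each estimator its own `random.Random` built from a `seed` argument, so results are reproducible without touching Python's global random state.

```python
import random
from typing import List, Optional


class SeededComponent:
    """Hypothetical estimator holding its own RNG built from `seed`."""

    def __init__(self, seed: Optional[int] = None) -> None:
        self.seed = seed
        # A private Random instance leaves the global `random` module untouched.
        self._rng = random.Random(seed)

    def draws(self, k: int) -> List[float]:
        return [self._rng.random() for _ in range(k)]
```

Two instances created with the same seed produce the same sequence of draws.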

smastelini

Here is one more wave of comments. Once they are fixed, I'll create a PR for your fork with code-style suggestions. Keep up the excellent work!

river/ensemble/aggregated_mondrian_forest.py

river/tree/mondrian/mondrian_tree_classifier.py

Co-authored-by: Saulo Martiello Mastelini <mastelini@usp.br>

- Fixing import order in __init__ file of ensemble - Using LaTeX formulation in AMFClassifier description - Making all nodes related methods private (it shouldn't be used outside) - Docstring syntax update and fixes - Importing river.base instead of typing module for better readability - Adding a short description to the MondrianTreeClassifier - Renaming MondrianTreeLeaf into MondrianLeaf - Reordering functions in MondrianTreeClassifier for better readability

river/ensemble/aggregated_mondrian_forest.py

river/tree/mondrian/mondrian_tree_classifier.py

river/tree/mondrian/mondrian_tree_nodes.py

- Adding suggestions from Mastelini on keys usage - Removing useless initialization of scores in the MondrianTreeClassifier

…to suggestions2 Critical issue to fix in AMFClassifier before merging into main

Style suggestions merge from Mastelini

- Fixing scoring bug (no propagation of counts) - Removing unused parameters in docs - Replacing type union of Python 3.10 in 3.9 annotations - Adding little description for MondrianBranch

fix remaining tests and remove duplicated method call

- Adding examples for AMF & Mondrian Tree Classifiers - Reordering __init__ in alphabetical order - Cleaning the comments - Adding string representation for nodes

river/tree/mondrian/mondrian_tree_classifier.py

Co-authored-by: Saulo Martiello Mastelini <mastelini@usp.br>

AlexandreChaussard · 2023-01-16T18:58:21Z

Hey (again) @MaxHalford !
@smastelini and I have worked hard on the classifiers, and we think it's ready to go!

Any chance you could have a look as well so maybe we can go for the merge :3 ?

Cheers 🥳

smastelini

Thank you so much, @AlexandreChaussard, for all the work on this PR!

You have addressed all the main problems identified in the code and paved the way for the upcoming AMF regressor.

It remains to decide whether or not we can get rid of the n_classes parameter and track incoming classes automatically. We will wait for the answer from AMF's original authors and proceed to make changes if needed afterward.

AMF Classifier & Mondrian Tree Classifier implementation

79566eb

AlexandreChaussard requested review from MaxHalford and smastelini as code owners December 30, 2022 16:18

MaxHalford requested changes Dec 31, 2022


[Pull request Update]

80371f6

- Adding a "mondrian" folder in the "tree" folder for better file structure - Using "random.choices" instead of the "sample_discrete" functions in "utils.py", and removing "sample_discrete" from the "utils.py"

[Pull Request]

9b91b9d

- Removing the "__repr__" method of AMF - Removing the @Setter and @Getter - Removing the "loss" parameter of the classifiers since only the "log-loss" is being used in the end

AlexandreChaussard added 2 commits December 31, 2022 15:05

Updating docstring

01af4a2

[Pull request]

c5fe718

- Making `learn_one` and `predict_proba_one` accepting all kinds of supported labels for `y` as input - `predict_proba_one` outputs a dictionary of scores with matching labels

smastelini requested changes Jan 2, 2023


AlexandreChaussard and others added 4 commits January 2, 2023 20:32

[Fix] Readability

545ffaa

Co-authored-by: Saulo Martiello Mastelini <mastelini@usp.br>

[Fix] Language

6a93dea

Co-authored-by: Saulo Martiello Mastelini <mastelini@usp.br>

[Fix] Language

c0466c5

Co-authored-by: Saulo Martiello Mastelini <mastelini@usp.br>

[Fix] math package implementation usage

ecdfd2c

Co-authored-by: Saulo Martiello Mastelini <mastelini@usp.br>

AlexandreChaussard and others added 3 commits January 5, 2023 18:37

[Pull request]

ad27aae

- Leaving `__all__` in alphabetical order for the classifiers - Removing type parameters in the description of `log_2_sum` of math utils - Replacing java-like getters and setters by python-like properties and setter

Merge branch 'main' into main

5bccff8

- Adding support for random state (seed)

5cff1e3

- Replacing Overflow from infinity to maximum possible float (so it makes computations still possible)

[Ignoring testing environment]

ff2e8f8

smastelini requested changes Jan 6, 2023


AlexandreChaussard and others added 5 commits January 6, 2023 22:44

Fixing style & typos

f64894c

Co-authored-by: Saulo Martiello Mastelini <mastelini@usp.br>

Merge branch 'main' of https://github.com/AlexandreChaussard/river

0956cb8

Pre-commit clean up

614da35

Pre-commit clean up

f98b578

AlexandreChaussard added 2 commits January 13, 2023 01:35

Pre-commit hookups fixes

502e685

Merge branch 'main' of https://github.com/AlexandreChaussard/river

06a554a

smastelini reviewed Jan 13, 2023

river/ensemble/aggregated_mondrian_forest.py (resolved)

smastelini requested changes Jan 13, 2023


AlexandreChaussard and others added 21 commits January 14, 2023 16:04

[Pull request]

f1a54e8

- Adding suggestions from Mastelini on keys usage - Removing useless initialization of scores in the MondrianTreeClassifier

Merge branch 'main' of https://github.com/AlexandreChaussard/river in…

b39cecb

…to suggestions2 Critical issue to fix in AMFClassifier before merging into main

bug fix

ce26918

Merge pull request #3 from AlexandreChaussard/suggestions2

66b152f

Style suggestions merge from Mastelini

fix conflicts

9a97df8

refactored, but has bugs

75725b7

remove mypy skip

d484554

cleanup

4b73ebd

better, but not fixed

1c4e6a0

minor fix

278b949

[Fixes]

500ca01

- Fixing scoring bug (no propagation of counts) - Removing unused parameters in docs - Replacing type union of Python 3.10 in 3.9 annotations - Adding little description for MondrianBranch

Pre-commit hookups fixes

b8d72ae

Refactor class hierarchy

22ec1b9

fix some tests

5bad2e1

Fixing some PyTests

dcf8dda

Reworking intensities

1f83869

fix remaining tests and remove duplicated method call

55366d8

Fixing feature shuffle issue (ordering in features)

02a1396

fix remaining tests and remove duplicated method call

Merge branch 'online-ml:main' into main

e7f506c

[Pull request]

bc33142

- Adding examples for AMF & Mondrian Tree Classifiers - Reordering __init__ in alphabetical order - Cleaning the comments - Adding string representation for nodes

Hiding MondrianTree from users visibility

83f96c9

smastelini reviewed Jan 16, 2023

river/tree/mondrian/mondrian_tree_classifier.py (outdated, resolved)

Fixing import on Mondrian Tree example

b7457fc

Co-authored-by: Saulo Martiello Mastelini <mastelini@usp.br>

smastelini approved these changes Jan 16, 2023


MaxHalford merged commit 41410a8 into online-ml:main Jan 17, 2023
