Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow cuML MNMG estimators to be serialized #5571

Merged

Conversation

viclafargue
Copy link
Contributor

@viclafargue viclafargue commented Sep 1, 2023

This PR :

  • Modifies the base MNMG class to allow the distributed model to be serialized
    - Edit the NB and TF-IDF estimators to prevent their model from being serialized as Dask futures
  • Additionally, edit TF-IDF estimator to allow its use in an ML pipeline
  • Adds a test for MNMG estimator serialization

@viclafargue viclafargue requested a review from a team as a code owner September 1, 2023 09:10
@copy-pr-bot
Copy link

copy-pr-bot bot commented Sep 1, 2023

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@github-actions github-actions bot added the Cython / Python Cython or Python issue label Sep 1, 2023
@@ -167,6 +166,8 @@ def fit(self, X):

wait_and_raise_from_futures([models])

models = models.result()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this line missing from these two specific estimators or should it be added in other models as well (e.g. Logistic regression)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR only fixes NB and TF-IDF. But, in the long run we should fix all of the estimators.

@betatim
Copy link
Member

betatim commented Sep 1, 2023

Thanks for the PR! I'll see if this solves my problem.

Is it worth adding a non-regression test? That way the bug won't come back in the future :D

Can't really review it, as I know too little about cuml + dask :(

@viclafargue
Copy link
Contributor Author

Thanks for the PR! I'll see if this solves my problem.

Is it worth adding a non-regression test? That way the bug won't come back in the future :D

Can't really review it, as I know too little about cuml + dask :(

Thanks :). Sure, we should probably add some tests. I will do that later.

@betatim
Copy link
Member

betatim commented Sep 1, 2023

Thanks for the PR! I'll see if this solves my problem.

Unfortunately this doesn't solve my problem with LogisticRegression

@betatim
Copy link
Member

betatim commented Sep 1, 2023

For context the problem I'm seeing and thought this might solve in the output.txt in https://gist.github.com/betatim/712dc3bd19237e928c200e2ab954c0e3

@dantegd
Copy link
Member

dantegd commented Sep 1, 2023

@viclafargue can you add a more descriptive title and description to the PR?

@viclafargue viclafargue changed the title Fix Dask estimators serialization Allow cuML MNMG estimators to be serialized Sep 4, 2023
@dantegd
Copy link
Member

dantegd commented Sep 13, 2023

/ok to test

@dantegd
Copy link
Member

dantegd commented Sep 19, 2023

@viclafargue there seems to be a CI failure in dask tests, could you take a look at it?

@dantegd dantegd added the 4 - Waiting on Author Waiting for author to respond to review label Sep 19, 2023
@viclafargue viclafargue force-pushed the fix-dask-models-serialization branch from a60daba to 8877d8e Compare September 19, 2023 10:41
@csadorf csadorf added improvement Improvement / enhancement to an existing function non-breaking Non-breaking change labels Sep 19, 2023
@csadorf
Copy link
Contributor

csadorf commented Sep 19, 2023

/ok to test

python/cuml/dask/common/base.py Outdated Show resolved Hide resolved
python/cuml/dask/common/base.py Outdated Show resolved Hide resolved
python/cuml/dask/common/base.py Outdated Show resolved Hide resolved
python/cuml/dask/common/base.py Outdated Show resolved Hide resolved
@csadorf
Copy link
Contributor

csadorf commented Sep 20, 2023

/ok to test

@dantegd
Copy link
Member

dantegd commented Sep 25, 2023

/ok to test

@dantegd dantegd added 5 - Ready to Merge Testing and reviews complete, ready to merge and removed 4 - Waiting on Author Waiting for author to respond to review labels Sep 25, 2023
@dantegd
Copy link
Member

dantegd commented Oct 4, 2023

/ok to test

@dantegd
Copy link
Member

dantegd commented Oct 6, 2023

/merge

@rapids-bot rapids-bot bot merged commit e5d6bc2 into rapidsai:branch-23.10 Oct 6, 2023
50 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
5 - Ready to Merge Testing and reviews complete, ready to merge Cython / Python Cython or Python issue improvement Improvement / enhancement to an existing function non-breaking Non-breaking change
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants