Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[REVIEW] Estimator Pickling Demo & Adding to Docs #3154

Merged
merged 9 commits into from
Nov 30, 2020

Conversation

cjnolet
Copy link
Member

@cjnolet cjnolet commented Nov 18, 2020

We have had a few inquiries recently about model serialization and I think it would be useful to have a very simple notebook demonstrating very basic operation.

@review-notebook-app
Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@GPUtester
Copy link
Contributor

Please update the changelog in order to start CI tests.

View the gpuCI docs here.

@cjnolet cjnolet changed the title [WIP] Adding simple dask estimator demo notebook [REVIEW] Estimator Pickling Demo & Adding to Docs Nov 18, 2020
@cjnolet cjnolet added 3 - Ready for Review Ready for review by team doc Documentation Notebook Issue or PR related to cuML notebook examples labels Nov 18, 2020
Copy link
Contributor

@wphicks wphicks left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good, @cjnolet! Good, straightforward walkthrough. My only general feedback is that it might be helpful to have a little bit more context for some of what you're demonstrating in the middle there.

@codecov-io
Copy link

codecov-io commented Nov 18, 2020

Codecov Report

Merging #3154 (b822dfc) into branch-0.17 (d8b4765) will increase coverage by 0.18%.
The diff coverage is n/a.

Impacted file tree graph

@@               Coverage Diff               @@
##           branch-0.17    #3154      +/-   ##
===============================================
+ Coverage        70.50%   70.68%   +0.18%     
===============================================
  Files              196      197       +1     
  Lines            15442    15564     +122     
===============================================
+ Hits             10887    11002     +115     
- Misses            4555     4562       +7     
Impacted Files Coverage Δ
python/cuml/manifold/umap.pyx 92.40% <0.00%> (-0.32%) ⬇️
python/cuml/__init__.py 100.00% <0.00%> (ø)
python/cuml/metrics/__init__.py 100.00% <0.00%> (ø)
python/cuml/common/array_sparse.py 94.44% <0.00%> (ø)
python/cuml/dask/common/__init__.py 100.00% <0.00%> (ø)
python/cuml/dask/solvers/__init__.py 80.00% <0.00%> (ø)
python/cuml/dask/ensemble/__init__.py 83.33% <0.00%> (ø)
python/cuml/dask/manifold/__init__.py 80.00% <0.00%> (ø)
python/cuml/dask/neighbors/__init__.py 85.71% <0.00%> (ø)
python/cuml/metrics/cluster/__init__.py 100.00% <0.00%> (ø)
... and 3 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update d8b4765...b822dfc. Read the comment docs.

Copy link
Contributor

@JohnZed JohnZed left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the pickling examples look great, though we might want to make it even shorter just so we're purely emphasizing persistence. I'd take out the first visualization and maybe slim one or two other comments as noted. The single-gpu example here is very barebones, and I think that works really well. MG is obviously more complex, but minimizing complexity is really helpful there too.

docs/source/pickling_cuml_models.ipynb Outdated Show resolved Hide resolved
docs/source/pickling_cuml_models.ipynb Outdated Show resolved Hide resolved
Copy link
Contributor

@wphicks wphicks left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great @cjnolet! Just one missing word, but besides that it looks good.

I noticed that you dropped the actual example of ParallelPostFit, which I thought was helpful before. @JohnZed is that what you meant in your comment about "minimizing complexity" with MG? If not, I'd advocate for bringing it back, since I thought it was pretty straightforward and useful. It's not absolutely necessary for this demo, though, so let's not hold up this PR if there's disagreement there.

@cjnolet
Copy link
Member Author

cjnolet commented Nov 19, 2020

@wphicks, as far as content is concerned, I fully agree that it would be useful to bring back the ParallelPostFit cell.

It's not absolutely necessary for this demo, though, so let's not hold up this PR if there's disagreement there.

I could go either way with it. While it's not directly related to pickling models, I do think it provides a users a useful example that they may otherwise not know about. The other challenge is that in order for CI to build and run the notebook with the current configuration, it looks like either cuml would need to add dask-ml as a dependency or the RAPIDS doc-building umbrella recipe will need to depend on it. I'm okay with either merging before or after we figure out which direction to go with the dependency. I don't think there's a huge rush to get it in, but there have been 2 users in the past couple of weeks that have been confused about pickling of distributed models.

Copy link
Contributor

@wphicks wphicks left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good! In that case, I'd advocate for getting it merged as is and doing a follow-on once we have time to figure out the dependency issues, etc. I did see the user questions you mentioned, so it would be good to make sure we put something up for that eventually, but this PR is useful and clear in its current state. Let's merge it!

@cjnolet cjnolet removed the 3 - Ready for Review Ready for review by team label Nov 30, 2020
@cjnolet cjnolet added the 5 - Ready to Merge Testing and reviews complete, ready to merge label Nov 30, 2020
@cjnolet
Copy link
Member Author

cjnolet commented Nov 30, 2020

Created #3199, which we should be able to get in early 0.18. Merging this for now to get pickling docs in for 0.17.

@cjnolet cjnolet merged commit 63c8a44 into rapidsai:branch-0.17 Nov 30, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
5 - Ready to Merge Testing and reviews complete, ready to merge doc Documentation Notebook Issue or PR related to cuML notebook examples
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants