[REVIEW] Estimator Pickling Demo & Adding to Docs #3154

Merged
merged 9 commits into from
Nov 30, 2020
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -36,6 +36,7 @@
- PR #3135: Add QuasiNewton tests
- PR #3040: Improved Array Conversion with CumlArrayDescriptor and Decorators
- PR #3134: Improving the Deprecation Message Formatting in Documentation
- PR #3154: Adding estimator pickling demo notebooks (and docs)
- PR #3151: MNMG Logistic Regression via dask-glm
- PR #3113: Add tags and preferred memory order tags to estimators
- PR #3137: Reorganize Pytest Config and Add Quick Run Option
1 change: 1 addition & 0 deletions docs/source/index.rst
@@ -26,6 +26,7 @@ Support for Windows is possible in the near future.
cuml_intro.rst
cuml_blogs.rst
estimator_intro.ipynb
pickling_cuml_models.ipynb


Indices and tables
209 changes: 209 additions & 0 deletions docs/source/pickling_cuml_models.ipynb
@@ -0,0 +1,209 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Pickling cuML Models for Persistence\n",
"\n",
"This notebook demonstrates simple pickling of both single-GPU and multi-GPU cuML models for persistence."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import warnings\n",
"warnings.filterwarnings(\"ignore\", category=FutureWarning)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Single-GPU Model Pickling\n",
"\n",
"All single-GPU estimators are pickleable. The following example creates a synthetic dataset, trains a model on it, and pickles the resulting model for storage. Trained single-GPU models can also be used to distribute inference across a Dask cluster, as described in the `Distributed Model Pickling` section below."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from cuml.datasets import make_blobs\n",
"\n",
"X, y = make_blobs(n_samples=50,\n",
"                  n_features=10,\n",
"                  centers=5,\n",
"                  cluster_std=0.4,\n",
"                  random_state=0)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from cuml.cluster import KMeans\n",
"\n",
"model = KMeans(n_clusters=5)\n",
"\n",
"model.fit(X)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pickle\n",
"\n",
"# Use a context manager so the file handle is closed after writing\n",
"with open(\"kmeans_model.pkl\", \"wb\") as f:\n",
"    pickle.dump(model, f)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"with open(\"kmeans_model.pkl\", \"rb\") as f:\n",
"    model = pickle.load(f)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"model.cluster_centers_"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Distributed Model Pickling"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The distributed estimator wrappers in the `cuml.dask` module are not intended to be pickled directly. Instead, the Dask cuML estimators provide a `get_combined_model()` method, which returns the trained single-GPU model for pickling. The combined model can be used for inference on a single GPU, and the `ParallelPostFit` wrapper from the [Dask-ML](https://ml.dask.org/meta-estimators.html) library can be used to perform distributed inference on a Dask cluster."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from dask.distributed import Client\n",
"from dask_cuda import LocalCUDACluster\n",
"\n",
"cluster = LocalCUDACluster()\n",
"client = Client(cluster)\n",
"client"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from cuml.dask.datasets import make_blobs\n",
"\n",
"n_workers = len(client.scheduler_info()[\"workers\"])\n",
"\n",
"X, y = make_blobs(n_samples=5000,\n",
"                  n_features=30,\n",
"                  centers=5,\n",
"                  cluster_std=0.4,\n",
"                  random_state=0,\n",
"                  n_parts=n_workers*5)\n",
"\n",
"X = X.persist()\n",
"y = y.persist()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from cuml.dask.cluster import KMeans\n",
"\n",
"dist_model = KMeans(n_clusters=5)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"dist_model.fit(X)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
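"import pickle\n",
"\n",
"# Extract the trained single-GPU model; the Dask wrapper itself is not pickled\n",
"single_gpu_model = dist_model.get_combined_model()\n",
"with open(\"kmeans_model.pkl\", \"wb\") as f:\n",
"    pickle.dump(single_gpu_model, f)"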
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"with open(\"kmeans_model.pkl\", \"rb\") as f:\n",
"    single_gpu_model = pickle.load(f)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"single_gpu_model.cluster_centers_"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.6"
}
},
"nbformat": 4,
"nbformat_minor": 4
}
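
The save and load cells in the notebook above could be wrapped in two small helpers so that file handles are always closed. This is only a sketch: `save_model` and `load_model` are hypothetical names, not part of cuML, and the dictionary below is a stand-in for a fitted estimator so the snippet runs without a GPU. A fitted `cuml.cluster.KMeans`, or the result of `get_combined_model()`, would presumably be passed the same way.

```python
import pickle
import tempfile
from pathlib import Path


def save_model(model, path):
    """Pickle `model` to `path`, closing the file handle when done."""
    with open(path, "wb") as f:
        pickle.dump(model, f, protocol=pickle.HIGHEST_PROTOCOL)


def load_model(path):
    """Load a pickled model from `path`. Only unpickle files you trust."""
    with open(path, "rb") as f:
        return pickle.load(f)


# Stand-in for a fitted estimator (hypothetical placeholder, not a cuML object).
fitted = {"n_clusters": 5, "cluster_centers_": [[0.0, 1.0], [2.0, 3.0]]}

# Round-trip through a temporary directory so nothing is left on disk.
with tempfile.TemporaryDirectory() as tmp:
    model_path = Path(tmp) / "kmeans_model.pkl"
    save_model(fitted, model_path)
    restored = load_model(model_path)

print(restored == fitted)
```

Keeping the file handling inside `with` blocks avoids the dangling handles left by `pickle.dump(model, open(...))`, and using a distinct file name per model would keep the single-GPU and combined models from overwriting each other.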