Commit

Document ability to export cuML RF to predict on other machines (#3890)
See #3853 (comment)

Authors:
  - Philip Hyunsu Cho (https://github.com/hcho3)

Approvers:
  - Dante Gama Dessavre (https://github.com/dantegd)

URL: #3890
hcho3 authored May 28, 2021
1 parent 92484fb commit 5af84b3
Showing 3 changed files with 75 additions and 1 deletion.
66 changes: 65 additions & 1 deletion docs/source/pickling_cuml_models.ipynb
@@ -183,6 +183,70 @@
"source": [
"single_gpu_model.cluster_centers_"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Exporting cuML Random Forest models for inferencing on machines without GPUs"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Starting with cuML version 21.06, you can export cuML Random Forest models and run predictions with them on machines without an NVIDIA GPU. The [Treelite](https://github.com/dmlc/treelite) package defines an efficient exchange format that lets you move cuML Random Forest models to other machines. We will refer to files in this exchange format as \"checkpoints.\"\n",
"\n",
"Here are the steps to export the model:\n",
"\n",
"1. Call `convert_to_treelite_model()` on the cuML Random Forest model, then call `to_treelite_checkpoint()` on the result to write the checkpoint file."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from cuml.ensemble import RandomForestClassifier as cumlRandomForestClassifier\n",
"from sklearn.datasets import load_iris\n",
"import numpy as np\n",
"\n",
"X, y = load_iris(return_X_y=True)\n",
"X, y = X.astype(np.float32), y.astype(np.int32)\n",
"clf = cumlRandomForestClassifier(max_depth=3, random_state=0, n_estimators=10)\n",
"clf.fit(X, y)\n",
"\n",
"checkpoint_path = './checkpoint.tl'\n",
"# Export cuML RF model as Treelite checkpoint\n",
"clf.convert_to_treelite_model().to_treelite_checkpoint(checkpoint_path)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"2. Copy the generated checkpoint file `checkpoint.tl` to another machine on which you'd like to run predictions.\n",
"\n",
"3. On the target machine, install Treelite by running `pip install treelite` or `conda install -c conda-forge treelite`. The machine does not need an NVIDIA GPU and does not need cuML installed.\n",
"\n",
"4. You can now load the model from the checkpoint by running the following on the target machine:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import treelite\n",
"\n",
"# The checkpoint file has been copied over; X is a float32 NumPy array of input features\n",
"checkpoint_path = './checkpoint.tl'\n",
"tl_model = treelite.Model.deserialize(checkpoint_path)\n",
"out_prob = treelite.gtil.predict(tl_model, X, pred_margin=True)\n",
"print(out_prob)"
]
}
],
"metadata": {
@@ -201,7 +265,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.6"
"version": "3.8.8"
}
},
"nbformat": 4,
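A note on using the output of the last notebook cell: assuming `treelite.gtil.predict` returns one row per sample with one score per class (an assumption about the output shape, not something this diff states), class labels can be recovered by taking the index of the largest score in each row. The sketch below uses plain Python, made-up scores, and a hypothetical helper named `scores_to_labels`; it is not part of the commit.

```python
from typing import List, Sequence


def scores_to_labels(scores: Sequence[Sequence[float]]) -> List[int]:
    """Return the index of the highest score in each row.

    Hypothetical helper for illustration; ties resolve to the lowest index.
    """
    labels = []
    for row in scores:
        if len(row) == 0:
            raise ValueError("each row must contain at least one class score")
        # max() returns the first index that attains the largest score
        best = max(range(len(row)), key=lambda i: row[i])
        labels.append(best)
    return labels


# Made-up per-class scores for three samples and three classes
# (illustrative only, not real output from the model in this commit).
example_scores = [
    [0.9, 0.1, 0.0],
    [0.2, 0.5, 0.3],
    [0.0, 0.4, 0.6],
]
print(scores_to_labels(example_scores))
```

If NumPy is available on the target machine and the output has that two-dimensional shape, `np.argmax(out_prob, axis=1)` gives the same result.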
5 changes: 5 additions & 0 deletions python/cuml/ensemble/randomforestclassifier.pyx
@@ -134,6 +134,11 @@ class RandomForestClassifier(BaseRandomForestModel,
histogram-based algorithm to determine splits, rather than an exact
count. You can tune the size of the histograms with the n_bins parameter.
.. note:: You can export cuML Random Forest models and run predictions
with them on machines without an NVIDIA GPU. See
https://docs.rapids.ai/api/cuml/nightly/pickling_cuml_models.html
for more details.
**Known Limitations**: This is an early release of the cuML
Random Forest code. It contains a few known limitations:
5 changes: 5 additions & 0 deletions python/cuml/ensemble/randomforestregressor.pyx
@@ -117,6 +117,11 @@ class RandomForestRegressor(BaseRandomForestModel,
histogram-based algorithm to determine splits, rather than an exact
count. You can tune the size of the histograms with the n_bins parameter.
.. note:: You can export cuML Random Forest models and run predictions
with them on machines without an NVIDIA GPU. See
https://docs.rapids.ai/api/cuml/nightly/pickling_cuml_models.html
for more details.
**Known Limitations**: This is an early release of the cuML
Random Forest code. It contains a few known limitations: