rocketmlhq/rapids-notebooks

 
 

RAPIDS Notebooks and Utilities

XGBoost Notebook

| Folder | Notebook Title | Description |
|---|---|---|
| XGBoost | XGBoost Demo | This notebook shows the acceleration one can gain by using GPUs with XGBoost in RAPIDS. |
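
The kind of speedup measurement the XGBoost demo describes can be sketched with a small timing helper. This is a minimal, CPU-only illustration and not code from the notebook: `best_time`, `speedup`, `slow_sum`, and `fast_sum` are hypothetical names, and the two toy functions merely stand in for a CPU and a GPU training call.

```python
import time


def best_time(fn, *args, repeats=3):
    """Return the best wall-clock time in seconds over `repeats` calls."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        best = min(best, time.perf_counter() - start)
    return best


def speedup(baseline_fn, candidate_fn, *args, repeats=3):
    """Ratio of baseline time to candidate time; above 1 means the candidate is faster."""
    baseline = best_time(baseline_fn, *args, repeats=repeats)
    candidate = best_time(candidate_fn, *args, repeats=repeats)
    # Clamp to avoid dividing by zero if the timer reports 0 for a very fast call.
    return baseline / max(candidate, 1e-9)


# Hypothetical stand-ins for a slow (CPU) and a fast (GPU) implementation.
def slow_sum(n):
    total = 0
    for i in range(n):
        total += i
    return total


def fast_sum(n):
    return n * (n - 1) // 2


ratio = speedup(slow_sum, fast_sum, 200_000)
print(f"speedup: {ratio:.1f}x")
```

When timing real GPU code, keep in mind that GPU work may be launched asynchronously, so a benchmark should make sure the results are materialized before the clock is stopped.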

CuML Notebooks

The cuML notebooks show how to use the machine learning algorithms implemented in cuML and highlight the advantages of cuML over scikit-learn. They compare the run time and the model performance of each algorithm. Below is a list of these algorithms:

| Folder | Notebook Title | Description |
|---|---|---|
| cuML | Coordinate Descent | This notebook includes code examples of lasso and elastic net models. The two models are presented together so they can be compared with each other as well as with their sklearn equivalents. |
| cuML | DBSCAN Demo | This notebook showcases the density-based spatial clustering of applications with noise (DBSCAN) algorithm using the fit and predict functions. |
| cuML | HoltWinters Demo | This notebook includes a code example for the Holt-Winters algorithm and showcases the fit and forecast functions. |
| cuML | Forest Inference | This notebook shows how to use the Forest Inference Library to load saved models and run predictions with them. It also shows how to train and predict with XGBoost and LightGBM models. |
| cuML | K-Means Demo | This notebook includes a code example for the k-means algorithm and showcases the fit and predict functions. |
| cuML | K-Means MNMG Demo | This notebook includes a code example for the multi-node multi-GPU k-means algorithm and showcases the fit and predict functions. |
| cuML | Linear Regression Demo | This notebook includes a code example for the linear regression algorithm and showcases the fit and predict functions. |
| cuML | Nearest Neighbors_demo | This notebook showcases the k-nearest neighbors (kNN) algorithm using the fit and kneighbors functions. |
| cuML | PCA Demo | This notebook showcases the principal component analysis (PCA) algorithm, where the model can transform the data (using fit_transform) and map the transformed data back to the original feature space (using inverse_transform). |
| cuML | Random Forest Multi-node / Multi-GPU | Demonstrates how to fit Random Forest models using multiple GPUs via Dask. |
| cuML | Ridge Regression Demo | This notebook includes code examples of ridge regression and showcases the fit and predict functions. |
| cuML | SGD_Demo | The stochastic gradient descent algorithm is demonstrated in the notebook using the fit and predict functions. |
| cuML | TSVD_Demo | This notebook showcases the truncated singular value decomposition (TSVD) algorithm, which, like PCA, transforms the data and maps the transformed data back to the original feature space using the fit_transform and inverse_transform functions, respectively. |
| cuML | UMAP_Demo | The uniform manifold approximation and projection (UMAP) algorithm is compared with the original author's equivalent non-GPU Python implementation using the fit and transform functions. |
| cuML | UMAP_Demo_Graphed | Demonstration of the supervised approach of cuML's uniform manifold approximation and projection algorithm on the mortgage dataset, with results compared against the original author's equivalent non-GPU Python implementation. |
| cuML | UMAP_Demo_Supervised | Demonstration of UMAP supervised training. Uses a set of labels to perform supervised dimensionality reduction. UMAP can also be trained on datasets with incomplete labels by using a label of "-1" for unlabeled samples. |
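
The "-1" convention for unlabeled samples mentioned for UMAP_Demo_Supervised can be illustrated by how such a partially labeled target might be prepared. This is a plain-Python sketch rather than code from the notebook; `mask_labels` and `keep_fraction` are hypothetical names, and the commented-out UMAP call is an assumption that should be checked against the cuML documentation.

```python
import random


def mask_labels(labels, keep_fraction, seed=0):
    """Keep roughly `keep_fraction` of the labels and mark the rest as -1 (unlabeled).

    Assumes the real class labels are non-negative integers, so -1 cannot
    collide with an actual class.
    """
    rng = random.Random(seed)
    return [y if rng.random() < keep_fraction else -1 for y in labels]


labels = [0, 1, 2, 1, 0, 2, 1, 0]
partial = mask_labels(labels, keep_fraction=0.5)
print(partial)

# Assumed usage with cuML (not run here, requires a GPU):
#   embedding = UMAP(n_neighbors=15).fit_transform(X, y=partial)
```

Fixing the seed keeps the masked set reproducible, which makes it easier to compare embeddings trained with different fractions of known labels.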

CuDF Notebooks

| Folder | Notebook Title | Description |
|---|---|---|
| cuDF | notebooks_Apply_Operations_in_cuDF | This notebook showcases two special methods where cuDF goes beyond the Pandas library: the apply_rows and apply_chunk functions. Both use the Numba library to accelerate data transformations in parallel on the GPU. |
| cuDF | notebooks_numba_cuDF_integration | This notebook showcases how to use Numba CUDA to accelerate cuDF data transformations and how to speed them up step by step using CUDA programming techniques. |
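
The row-wise kernel style used with apply_rows can be sketched in plain Python. The function below follows the shape such kernels commonly take (input columns, then output columns, then extra keyword arguments), but it runs here on ordinary lists. `scale_and_add` and `factor` are hypothetical names, and the commented-out cuDF call is an assumption to verify against the cuDF documentation for your version.

```python
def scale_and_add(x, y, out, factor):
    """Row-wise kernel: out[i] = x[i] * factor + y[i] for every row i."""
    for i, (a, b) in enumerate(zip(x, y)):
        out[i] = a * factor + b


# Plain lists stand in for DataFrame columns so the sketch runs without a GPU.
x = [1.0, 2.0, 3.0]
y = [10.0, 20.0, 30.0]
out = [0.0] * len(x)

scale_and_add(x, y, out, factor=2.0)
print(out)  # [12.0, 24.0, 36.0]

# Assumed cuDF usage (not run here, requires a GPU):
#   df = df.apply_rows(scale_and_add,
#                      incols=["x", "y"],
#                      outcols={"out": "float64"},
#                      kwargs={"factor": 2.0})
```

Writing the kernel as a loop over rows that fills a preallocated output is what lets Numba compile it and split the rows across GPU threads.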

CuGraph Notebooks

| Folder | Notebook Title | Description |
|---|---|---|
| cuGraph | Louvain | Demonstration of using cuGraph to identify clusters in a test graph using the Louvain algorithm. |
| cuGraph | Vertex-Similarity | Demonstration of using cuGraph to compute vertex similarity using both the Jaccard Similarity and the Overlap Coefficient. |
| cuGraph | Weighted-Jaccard | Demonstration of using cuGraph to compute the Weighted Jaccard Similarity metric on our training dataset. |
| cuGraph | Renumber | Demonstration of using the renumbering feature to assign new vertex IDs to the test graph. This is useful when the vertex IDs in the data set are non-contiguous or are not integers. |
| cuGraph | BFS | Demonstration of using cuGraph to run a Breadth First Search from a given vertex to all other vertices in our training graph. |
| cuGraph | SSSP | Demonstration of using cuGraph to compute the shortest path from a given vertex to all other vertices in our training graph. |
| cuGraph | Spectral-Clustering | Demonstration of using cuGraph to identify clusters in a test graph using Spectral Clustering with both the (A) Balanced Cut and (B) Modularity Maximization quality metrics. |
| cuGraph | Pagerank | Demonstration of using both NetworkX and cuGraph to compute the PageRank of each vertex in our test dataset. |
| cuGraph | Triangle Counting | Demonstration of using both NetworkX and cuGraph to compute the number of triangles in our test dataset. |
| cuGraph | Connected Components | Demonstration of using cuGraph to compute weakly and strongly connected components in a test graph. |
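
The two metrics named in the Vertex-Similarity notebook can be written out in a few lines of plain Python. This sketch uses the standard neighbor-set definitions and does not use the cuGraph API; the helper names and the small edge list are made up for illustration.

```python
def build_adjacency(edges):
    """Build an undirected adjacency map {vertex: set of neighbors}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def jaccard(adj, u, v):
    """Jaccard similarity: |N(u) & N(v)| / |N(u) | N(v)|."""
    a, b = adj[u], adj[v]
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def overlap(adj, u, v):
    """Overlap coefficient: |N(u) & N(v)| / min(|N(u)|, |N(v)|)."""
    a, b = adj[u], adj[v]
    smaller = min(len(a), len(b))
    return len(a & b) / smaller if smaller else 0.0


# Hypothetical test graph.
edges = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3), (3, 4)]
adj = build_adjacency(edges)

# Vertices 0 and 3 share neighbors {1, 2}; vertex 3 also has neighbor 4.
print(jaccard(adj, 0, 3))  # 2 shared / 3 in the union
print(overlap(adj, 0, 3))  # 2 shared / min(2, 3) = 1.0
```

The overlap coefficient reaches 1.0 whenever one neighborhood is contained in the other, while Jaccard also penalizes the extra neighbors, which is why the two notebooks' metrics can rank the same vertex pairs differently.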

Tutorial with an End to End workflow

| Folder | Notebook Title | Description |
|---|---|---|
| Tutorials | DBSCAN_demo_full | Demonstration of how to use DBSCAN, a popular clustering algorithm, and how to use the GPU accelerated implementation of this algorithm in RAPIDS. |
| Tutorials | HoltWinters_demo_full | Demonstration of how to use Holt-Winters, a time-series forecasting algorithm, on a dataset to make GPU accelerated out-of-sample predictions. |

Utils Scripts

| Folder | Script Title | Description |
|---|---|---|
| Utils | start-jupyter.sh | starts a JupyterLab environment for interacting with, and running, notebooks |
| Utils | stop-jupyter.sh | identifies all process IDs associated with Jupyter and kills them |
| Utils | dask-cluster.py | launches a configured Dask cluster (a set of nodes) for use within a notebook |
| Utils | dask-setup.sh | a low-level script for constructing a set of Dask workers on a single node |
| Utils | split-data-mortgage.sh | splits mortgage data files into smaller parts, and saves them for use with the mortgage notebook |

Documentation (WIP)

| Folder | Document Title | Description |
|---|---|---|
| Docs | ngc-readme | |
| Docs | dockerhub-readme | |

Additional Information

  • The cuml folder also includes a small subset of the Mortgage Dataset used in the notebooks and the full image set from the Fashion MNIST dataset.

  • utils: contains a set of useful scripts for interacting with RAPIDS

  • For additional community-driven notebooks, which will include our blogs, tutorials, workflows, and more intricate examples, please see the Notebooks Extended Repo.
