A repository of Spark examples that use the RAPIDS Accelerator, including ETL, ML/DL, and more.
It includes documentation and example applications that demonstrate the RAPIDS.ai GPU-accelerated XGBoost-Spark project. Spark 3.0.0 and later are supported.
Try one of the "Getting Started Guides" below. Please note that they target the Mortgage dataset as written, but with a few changes to EXAMPLE_CLASS and dataPath, they can be easily adapted to the Taxi or Agaricus datasets.
Small datasets for each example are available in the datasets folder. They are provided for convenience only. To test performance, please prepare a larger dataset by following Preparing Datasets via Notebook. We also provide a larger dataset, the Mortgage Dataset (1 GB uncompressed), which is used in the guides below.
- Prepare packages and dataset
- Getting started on on-premises clusters
- Getting started on cloud service providers
  - Amazon AWS
  - Databricks
- Getting started for Jupyter Notebook applications
These examples use default parameters for demo purposes. For a full list of parameters, please see "Supported Parameters" for Scala or Python.
See the Contributing guide.
Please see the RAPIDS website for contact information.
This content is licensed under the Apache License 2.0.