Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Edits on tutorial for XGBoost job on Kubernetes #5487

Merged
merged 1 commit into from
Apr 5, 2020
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
40 changes: 19 additions & 21 deletions doc/tutorials/kubernetes.rst
Original file line number Diff line number Diff line change
@@ -1,36 +1,34 @@
###################################
Distributed XGBoost with Kubernetes
Distributed XGBoost on Kubernetes
###################################

Kubeflow community provides `XGBoost Operator <https://github.com/kubeflow/xgboost-operator>`_ to support distributed XGBoost training and batch prediction in a Kubernetes cluster. It provides an easy and efficient XGBoost model training and batch prediction in distributed fashion.
Distributed XGBoost training and batch prediction on `Kubernetes <https://kubernetes.io/>`_ are supported via `Kubeflow XGBoost Operator <https://github.com/kubeflow/xgboost-operator>`_.

**********
How to use
**********
In order to run a XGBoost job in a Kubernetes cluster, carry out the following steps:
************
Instructions
************
In order to run a XGBoost job in a Kubernetes cluster, perform the following steps:

1. Install XGBoost Operator in Kubernetes.
1. Install XGBoost Operator on the Kubernetes cluster.

a. XGBoost Operator is designed to manage XGBoost jobs, including job scheduling, monitoring, pods and services recovery etc. Follow the `installation guide <https://github.com/kubeflow/xgboost-operator#installing-xgboost-operator>`_ to install XGBoost Operator.
a. XGBoost Operator is designed to manage the scheduling and monitoring of XGBoost jobs. Follow `this installation guide <https://github.com/kubeflow/xgboost-operator#installing-xgboost-operator>`_ to install XGBoost Operator.

2. Write application code to interface with the XGBoost operator.
2. Write application code that will be executed by the XGBoost Operator.

a. You'll need to furnish a few scripts to inteface with the XGBoost operator. Refer to the `Iris classification example <https://github.com/kubeflow/xgboost-operator/tree/master/config/samples/xgboost-dist>`_.
b. Data reader/writer: you need to have your data source reader and writer based on the requirement. For example, if your data is stored in a Hive Table, you have to write your own code to read/write Hive table based on the ID of worker.
c. Model persistence: in this example, model is stored in the OSS storage. If you want to store your model into Amazon S3, Google NFS or other storage, you'll need to specify the model reader and writer based on the requirement of storage system.
a. To use XGBoost Operator, you'll have to write a couple of Python scripts that implement the distributed training logic for XGBoost. Please refer to the `Iris classification example <https://github.com/kubeflow/xgboost-operator/tree/master/config/samples/xgboost-dist>`_.
b. Data reader/writer: you need to implement the data reader and writer based on the specific requirements of your chosen data source. For example, if your dataset is stored in a Hive table, you have to write the code to read from or write to the Hive table based on the index of the worker.
c. Model persistence: in the `Iris classification example <https://github.com/kubeflow/xgboost-operator/tree/master/config/samples/xgboost-dist>`_, the model is stored in `Alibaba OSS <https://www.alibabacloud.com/product/oss>`_. If you want to store your model in other storages such as Amazon S3 or Google NFS, you'll need to implement the model persistence logic based on the requirements of the chosen storage system.

3. Configure the XGBoost job using a YAML file.

a. YAML file is used to configure the computation resource and environment for your XGBoost job to run, e.g. the number of workers and masters. The template `YAML template <https://github.com/kubeflow/xgboost-operator/blob/master/config/samples/xgboost-dist/xgboostjob_v1alpha1_iris_train.yaml>`_ is provided for reference.
a. YAML file is used to configure the computational resources and environment for your XGBoost job to run, e.g. the number of workers/masters and the number of CPU/GPUs. Please refer to this `YAML template <https://github.com/kubeflow/xgboost-operator/blob/master/config/samples/xgboost-dist/xgboostjob_v1alpha1_iris_train.yaml>`_ for an example.

4. Submit XGBoost job to Kubernetes cluster.
4. Submit XGBoost job to a Kubernetes cluster.

a. `Kubectl command <https://github.com/kubeflow/xgboost-operator#creating-a-xgboost-trainingprediction-job>`_ is used to submit a XGBoost job, and then you can monitor the job status.
a. Use `kubectl <https://kubernetes.io/docs/reference/kubectl/overview/>`_ to submit a distributed XGBoost job as illustrated `here <https://github.com/kubeflow/xgboost-operator#creating-a-xgboost-trainingprediction-job>`_.

****************
Work in progress
****************
*******
Support
*******

- XGBoost Model serving
- Distributed data reader/writer from/to HDFS, HBase, Hive etc.
- Model persistence on Amazon S3, Google NFS etc.
Please submit an issue on `XGBoost Operator repo <https://github.com/kubeflow/xgboost-operator>`_ for any feature requests or problems.