[jvm-packages] update doc based on the latest changes (#10847)
wbo4958 authored Oct 11, 2024
1 parent f7cb8e0 commit 59e6c92
Showing 4 changed files with 164 additions and 258 deletions.
36 changes: 5 additions & 31 deletions doc/install.rst
@@ -159,7 +159,7 @@ R
JVM
---

* XGBoost4j/XGBoost4j-Spark
* XGBoost4j-Spark

.. code-block:: xml
:caption: Maven
@@ -172,11 +172,6 @@ JVM
<dependencies>
...
<dependency>
<groupId>ml.dmlc</groupId>
<artifactId>xgboost4j_${scala.binary.version}</artifactId>
<version>latest_version_num</version>
</dependency>
<dependency>
<groupId>ml.dmlc</groupId>
<artifactId>xgboost4j-spark_${scala.binary.version}</artifactId>
@@ -188,11 +183,10 @@ JVM
:caption: sbt
libraryDependencies ++= Seq(
"ml.dmlc" %% "xgboost4j" % "latest_version_num",
"ml.dmlc" %% "xgboost4j-spark" % "latest_version_num"
)
* XGBoost4j-GPU/XGBoost4j-Spark-GPU
* XGBoost4j-Spark-GPU

.. code-block:: xml
:caption: Maven
@@ -205,11 +199,6 @@ JVM
<dependencies>
...
<dependency>
<groupId>ml.dmlc</groupId>
<artifactId>xgboost4j-gpu_${scala.binary.version}</artifactId>
<version>latest_version_num</version>
</dependency>
<dependency>
<groupId>ml.dmlc</groupId>
<artifactId>xgboost4j-spark-gpu_${scala.binary.version}</artifactId>
@@ -221,15 +210,14 @@ JVM
:caption: sbt
libraryDependencies ++= Seq(
"ml.dmlc" %% "xgboost4j-gpu" % "latest_version_num",
"ml.dmlc" %% "xgboost4j-spark-gpu" % "latest_version_num"
)
This will pull the latest stable version from Maven Central.

For the latest release version number, please check `release page <https://github.com/dmlc/xgboost/releases>`_.

To enable the GPU algorithm (``device='cuda'``), use artifacts ``xgboost4j-gpu_2.12`` and ``xgboost4j-spark-gpu_2.12`` instead (note the ``gpu`` suffix).
To enable the GPU algorithm (``device='cuda'``), use artifacts ``xgboost4j-spark-gpu_2.12`` instead (note the ``gpu`` suffix).
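For illustration only, a minimal sketch of selecting the GPU algorithm once the ``-gpu`` artifact is on the classpath; the parameter-map style is assumed from the JVM package tutorials rather than taken from this page:

.. code-block:: scala

   import ml.dmlc.xgboost4j.scala.spark.XGBoostClassifier

   // Sketch: the GPU algorithm is chosen via the "device" parameter.
   val params = Map("device" -> "cuda", "objective" -> "binary:logistic")
   val classifier = new XGBoostClassifier(params)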


.. note:: Windows not supported in the JVM package
@@ -292,7 +280,7 @@ JVM
resolvers += "XGBoost4J Snapshot Repo" at "https://s3-us-west-2.amazonaws.com/xgboost-maven-repo/snapshot/"
Then add XGBoost4J as a dependency:
Then add XGBoost4J-Spark as a dependency:

.. code-block:: xml
:caption: maven
@@ -304,12 +292,6 @@ Then add XGBoost4J as a dependency:
</properties>
<dependencies>
...
<dependency>
<groupId>ml.dmlc</groupId>
<artifactId>xgboost4j_${scala.binary.version}</artifactId>
<version>latest_version_num-SNAPSHOT</version>
</dependency>
<dependency>
<groupId>ml.dmlc</groupId>
<artifactId>xgboost4j-spark_${scala.binary.version}</artifactId>
@@ -321,11 +303,10 @@ Then add XGBoost4J as a dependency:
:caption: sbt
libraryDependencies ++= Seq(
"ml.dmlc" %% "xgboost4j" % "latest_version_num-SNAPSHOT",
"ml.dmlc" %% "xgboost4j-spark" % "latest_version_num-SNAPSHOT"
)
* XGBoost4j-GPU/XGBoost4j-Spark-GPU
* XGBoost4j-Spark-GPU

.. code-block:: xml
:caption: maven
@@ -337,12 +318,6 @@ Then add XGBoost4J as a dependency:
</properties>
<dependencies>
...
<dependency>
<groupId>ml.dmlc</groupId>
<artifactId>xgboost4j-gpu_${scala.binary.version}</artifactId>
<version>latest_version_num-SNAPSHOT</version>
</dependency>
<dependency>
<groupId>ml.dmlc</groupId>
<artifactId>xgboost4j-spark-gpu_${scala.binary.version}</artifactId>
@@ -354,7 +329,6 @@ Then add XGBoost4J as a dependency:
:caption: sbt
libraryDependencies ++= Seq(
"ml.dmlc" %% "xgboost4j-gpu" % "latest_version_num-SNAPSHOT",
"ml.dmlc" %% "xgboost4j-spark-gpu" % "latest_version_num-SNAPSHOT"
)
32 changes: 14 additions & 18 deletions doc/jvm/xgboost4j_spark_gpu_tutorial.rst
@@ -1,6 +1,6 @@
#############################################
XGBoost4J-Spark-GPU Tutorial (version 1.6.1+)
#############################################
############################
XGBoost4J-Spark-GPU Tutorial
############################

**XGBoost4J-Spark-GPU** is an open source library aiming to accelerate distributed XGBoost training on an Apache Spark cluster from
end to end with GPUs by leveraging the `RAPIDS Accelerator for Apache Spark <https://nvidia.github.io/spark-rapids/>`_ product.
@@ -71,7 +71,7 @@ To make the Iris dataset recognizable to XGBoost, we need to encode the String-t
label, i.e. "class", to the Double-typed label.

One way to convert the String-typed label to Double is to use Spark's built-in feature transformer
`StringIndexer <https://spark.apache.org/docs/2.3.1/api/scala/index.html#org.apache.spark.ml.feature.StringIndexer>`_.
`StringIndexer <https://spark.apache.org/docs/latest/api/scala/org/apache/spark/ml/feature/StringIndexer.html>`_.
But this feature is not accelerated in RAPIDS Accelerator, which means it will fall back
to CPU. Instead, we use an alternative way to achieve the same goal with the following code:
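The referenced code is collapsed in this diff view. As a rough sketch of the window-function idea (not necessarily the exact tutorial code; ``rawInput`` and the column names are assumed from the Iris schema):

.. code-block:: scala

   import org.apache.spark.sql.expressions.Window
   import org.apache.spark.sql.functions.dense_rank

   // Sketch: rank the distinct class names and use (rank - 1)
   // as the Double-typed label expected by XGBoost.
   val spec = Window.orderBy("class")
   val labeled = rawInput
     .withColumn("label", (dense_rank().over(spec) - 1).cast("double"))
     .drop("class")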

@@ -107,10 +107,10 @@ With window operations, we have mapped the string column of labels to label indi
Training
========

The GPU version of XGBoost-Spark supports both regression and classification
XGBoost4J-Spark-GPU supports regression, classification, and ranking
models. Although we use the Iris dataset in this tutorial to show how we use
``XGBoost/XGBoost4J-Spark-GPU`` to resolve a multi-classes classification problem, the
usage in Regression is very similar to classification.
``XGBoost4J-Spark-GPU`` to resolve a multi-class classification problem, the
usage in regression and ranking is very similar to classification.

To train a XGBoost model for classification, we need to define a XGBoostClassifier first:
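The definition itself is collapsed in this diff view; a minimal sketch of what it typically looks like, assuming the Iris feature column names and purely illustrative parameter values (check the rendered tutorial for the authoritative version):

.. code-block:: scala

   import ml.dmlc.xgboost4j.scala.spark.XGBoostClassifier

   val xgbParams = Map(
     "objective" -> "multi:softprob",
     "num_class" -> 3,
     "num_round" -> 100,
     "num_workers" -> 1,
     "device" -> "cuda" // train on the GPU
   )

   // The GPU pipeline reads numeric feature columns directly, so the feature
   // column names are passed as an array rather than an assembled vector.
   val featureNames = Array("sepal length", "sepal width", "petal length", "petal width")
   val xgbClassifier = new XGBoostClassifier(xgbParams)
     .setFeaturesCol(featureNames)
     .setLabelCol("label")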

@@ -168,12 +168,13 @@ model can then be used in other tasks like prediction.
Prediction
==========

When we get a model, either a XGBoostClassificationModel or a XGBoostRegressionModel, it takes a DataFrame as an input,
When we get a model, whether a XGBoostClassificationModel, a XGBoostRegressionModel, or a XGBoostRankerModel, it takes a DataFrame as an input,
reads the column containing feature vectors, predicts for each feature vector, and outputs a new DataFrame
with the following columns by default:

* XGBoostClassificationModel will output margins (``rawPredictionCol``), probabilities (``probabilityCol``) and the eventual prediction labels (``predictionCol``) for each possible label.
* XGBoostRegressionModel will output a prediction label (``predictionCol``).
* XGBoostRankerModel will output a prediction label (``predictionCol``).

.. code-block:: scala
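
   // Hedged sketch: the original code here is collapsed in this diff view.
   // `train` and `test` are the DataFrames assumed from the earlier split,
   // and `xgbClassifier` is the estimator defined in the Training section.
   val xgbClassificationModel = xgbClassifier.fit(train)
   val results = xgbClassificationModel.transform(test)
   results.show()
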
@@ -226,25 +227,20 @@ would be ``"spark.task.resource.gpu.amount=1/spark.executor.cores"``. However, i
using a XGBoost version earlier than 2.1.0 or a Spark standalone cluster version below 3.4.0,
you still need to set ``"spark.task.resource.gpu.amount"`` equal to ``"spark.executor.resource.gpu.amount"``.
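As a concrete illustration of that rule (the numbers are hypothetical, and a Scala ``SparkConf`` is used only for readability; the same keys can equally be passed as ``--conf`` options to ``spark-submit``):

.. code-block:: scala

   import org.apache.spark.SparkConf

   // Illustration: 12 executor cores sharing 1 GPU -> each task requests 1/12
   // of the GPU, matching the 1/spark.executor.cores guidance above
   // (requires XGBoost 2.1.0+ and Spark standalone 3.4.0+).
   val conf = new SparkConf()
     .set("spark.executor.cores", "12")
     .set("spark.executor.resource.gpu.amount", "1")
     .set("spark.task.resource.gpu.amount", (1.0 / 12).toString)

   // On older XGBoost (< 2.1.0) or Spark standalone (< 3.4.0), keep the two amounts equal:
   // .set("spark.task.resource.gpu.amount", "1")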

.. note::

As of now, the stage-level scheduling feature in XGBoost is limited to the Spark standalone cluster mode.
However, we have plans to expand its compatibility to YARN and Kubernetes once Spark 3.5.1 is officially released.

Assuming that the application main class is "Iris" and the application jar is "iris-1.0.0.jar",`
Assuming that the application main class is "Iris" and the application jar is "iris-1.0.0.jar",
provided below is an example demonstrating how to submit the XGBoost application to an Apache
Spark Standalone cluster.

.. code-block:: bash
rapids_version=23.10.0
xgboost_version=2.0.1
rapids_version=24.08.0
xgboost_version=$LATEST_VERSION
main_class=Iris
app_jar=iris-1.0.0.jar
spark-submit \
--master $master \
--packages com.nvidia:rapids-4-spark_2.12:${rapids_version},ml.dmlc:xgboost4j-gpu_2.12:${xgboost_version},ml.dmlc:xgboost4j-spark-gpu_2.12:${xgboost_version} \
--packages com.nvidia:rapids-4-spark_2.12:${rapids_version},ml.dmlc:xgboost4j-spark-gpu_2.12:${xgboost_version} \
--conf spark.executor.cores=12 \
--conf spark.task.cpus=1 \
--conf spark.executor.resource.gpu.amount=1 \
Expand All @@ -255,7 +251,7 @@ Spark Standalone cluster.
--class ${main_class} \
${app_jar}
* First, we need to specify the ``RAPIDS Accelerator, xgboost4j-gpu, xgboost4j-spark-gpu`` packages by ``--packages``
* First, we need to specify the ``RAPIDS Accelerator`` and ``xgboost4j-spark-gpu`` packages via ``--packages``
* Second, ``RAPIDS Accelerator`` is a Spark plugin, so we need to configure it by specifying ``spark.plugins=com.nvidia.spark.SQLPlugin``

For details about other ``RAPIDS Accelerator`` configurations, please refer to the `configuration <https://nvidia.github.io/spark-rapids/docs/configs.html>`_.