[DOC] add some ignore patterns and fix some dead links [skip ci] (#363)
* fix dead link in tool notebook

Signed-off-by: liyuan <yuali@nvidia.com>

* check all links and update the stable doc links to the legacy page

Signed-off-by: liyuan <yuali@nvidia.com>

* fix dead link in UDF README

Signed-off-by: liyuan <yuali@nvidia.com>

* update the markdown link checker version

Signed-off-by: liyuan <yuali@nvidia.com>

* update the markdown link checker version

Signed-off-by: liyuan <yuali@nvidia.com>

* update the markdown link checker version

Signed-off-by: liyuan <yuali@nvidia.com>

* comment out the anchor links to work around the issue

Signed-off-by: liyuan <yuali@nvidia.com>

* comment out the anchor links to work around the issue

Signed-off-by: liyuan <yuali@nvidia.com>

---------

Signed-off-by: liyuan <yuali@nvidia.com>
nvliyuan authored Feb 23, 2024
1 parent 0fb15f7 commit 1fe6387
Showing 8 changed files with 29 additions and 15 deletions.
1 change: 0 additions & 1 deletion .github/workflows/markdown-links-check.yml
@@ -30,6 +30,5 @@ jobs:
         with:
           max-depth: -1
           use-verbose-mode: 'yes'
-          check-modified-files-only: 'yes'
           config-file: '.github/workflows/markdown-links-check/markdown-links-check-config.json'
           base-branch: 'main'
14 changes: 14 additions & 0 deletions .github/workflows/markdown-links-check/markdown-links-check-config.json
@@ -1,4 +1,18 @@
 {
+  "ignorePatterns": [
+    {
+      "pattern": "/docs"
+    },
+    {
+      "pattern": "/datasets"
+    },
+    {
+      "pattern": "/dockerfile"
+    },
+    {
+      "pattern": "/examples"
+    }
+  ],
   "timeout": "15s",
   "retryOn429": true,
   "retryCount":30,
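The new `ignorePatterns` entries above are regular expressions that markdown-link-check tests against every link it finds; a matching link is skipped rather than reported as dead, which is how repository-root links such as `/docs/...` or `/examples/...` stop failing the CI check. That reading follows the tool's documented behaviour rather than anything shown in this diff; the sketch below is a hypothetical plain-Java illustration of the effect, not the checker's actual code:

```java
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical illustration of how ignorePatterns behaves: a link is skipped
// when any configured pattern matches it, so root-relative links are no longer
// flagged as dead, while ordinary external links are still checked.
public class IgnorePatternSketch {
  static final List<Pattern> IGNORE = List.of(
      Pattern.compile("/docs"),
      Pattern.compile("/datasets"),
      Pattern.compile("/dockerfile"),
      Pattern.compile("/examples"));

  static boolean shouldCheck(String link) {
    return IGNORE.stream().noneMatch(p -> p.matcher(link).find());
  }

  public static void main(String[] args) {
    // Root-relative repo link: matches "/docs", so it is skipped.
    System.out.println(shouldCheck("/docs/get-started/xgboost-examples/on-prem-cluster/standalone-scala.md")); // false
    // External link with no matching pattern: still checked.
    System.out.println(shouldCheck("https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/")); // true
  }
}
```

If the tool matches anywhere in the URL (as the sketch does), a short pattern like `/docs` can also silence unrelated links that happen to contain that substring, so such patterns are best kept as specific as the repository layout allows.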
4 changes: 2 additions & 2 deletions README.md
@@ -37,7 +37,7 @@ can be built for running on GPU with RAPIDS Accelerator in this repo:
 | 3 | XGBoost | Taxi (Scala) | End-to-end ETL + XGBoost example to predict taxi trip fare amount with [NYC taxi trips data set](https://www1.nyc.gov/site/tlc/about/tlc-trip-record-data.page)
 | 4 | ML/DL | PCA End-to-End | Spark MLlib based PCA example to train and transform with a synthetic dataset
 | 5 | UDF | cuSpatial - Point in Polygon | Spark cuSpatial example for Point in Polygon function using NYC Taxi pickup location dataset
-| 6 | UDF | URL Decode | Decodes URL-encoded strings using the [Java APIs of RAPIDS cudf](https://docs.rapids.ai/api/cudf-java/stable/)
-| 7 | UDF | URL Encode | URL-encodes strings using the [Java APIs of RAPIDS cudf](https://docs.rapids.ai/api/cudf-java/stable/)
+| 6 | UDF | URL Decode | Decodes URL-encoded strings using the [Java APIs of RAPIDS cudf](https://docs.rapids.ai/api/cudf-java/legacy/)
+| 7 | UDF | URL Encode | URL-encodes strings using the [Java APIs of RAPIDS cudf](https://docs.rapids.ai/api/cudf-java/legacy/)
 | 8 | UDF | [CosineSimilarity](./examples/UDF-Examples/RAPIDS-accelerated-UDFs/src/main/java/com/nvidia/spark/rapids/udf/java/CosineSimilarity.java) | Computes the cosine similarity between two float vectors using [native code](./examples/UDF-Examples/RAPIDS-accelerated-UDFs/src/main/cpp/src)
 | 9 | UDF | [StringWordCount](./examples/UDF-Examples/RAPIDS-accelerated-UDFs/src/main/java/com/nvidia/spark/rapids/udf/hive/StringWordCount.java) | Implements a Hive simple UDF using [native code](./examples/UDF-Examples/RAPIDS-accelerated-UDFs/src/main/cpp/src) to count words in strings
2 changes: 1 addition & 1 deletion examples/ML+DL-Examples/Spark-DL/criteo_train/README.md
@@ -7,7 +7,7 @@ _Please note: The following demo is dedicated for DGX-2 machine(with V100 GPUs).
 ## Dataset

 The dataset used here is from Criteo clicklog dataset.
-It's preprocessed by [DLRM](https://github.com/NVIDIA/DeepLearningExamples/tree/master/TensorFlow2/Recommendation/DLRM/preproc)
+It's preprocessed by [DLRM](https://github.com/NVIDIA/DeepLearningExamples/tree/master/TensorFlow2/Recommendation/DLRM_and_DCNv2/preproc)
 ETL job on Spark. We also provide a small size sample data in sample_data folder.
 All 40 columns(1 label + 39 features) are already numeric.

@@ -9,7 +9,7 @@
"\n",
"This notebook contains the same content as \"criteo_keras.py\" but in a notebook(interactive) form.\n",
"\n",
"The dataset used here is from Criteo clicklog dataset. It's preprocessed by DLRM(https://github.com/NVIDIA/DeepLearningExamples/tree/master/TensorFlow2/Recommendation/DLRM/preproc) ETL job on Spark.\n",
"The dataset used here is from Criteo clicklog dataset. It's preprocessed by DLRM(https://github.com/NVIDIA/DeepLearningExamples/tree/master/TensorFlow2/Recommendation/DLRM_and_DCNv2/preproc) ETL job on Spark.\n",
"\n",
"We provide a small size sample data in `sample_data` folder.\n",
"\n",
14 changes: 7 additions & 7 deletions examples/UDF-Examples/RAPIDS-accelerated-UDFs/README.md
@@ -18,7 +18,7 @@ which provides a single method we need to override called
 evaluateColumnar returns a cudf ColumnVector, because the GPU get its speed by performing operations
 on many rows at a time. In the `evaluateColumnar` function, there is a cudf implementation of URL
 decode that we're leveraging, so we don't need to write any native C++ code. This is all done
-through the [Java APIs of RAPIDS cudf](https://docs.rapids.ai/api/cudf-java/stable). The benefit to
+through the [Java APIs of RAPIDS cudf](https://docs.rapids.ai/api/cudf-java/legacy). The benefit to
 implement via the Java API is ease of development, but the memory model is not friendly for doing
 GPU operations because the JVM makes the assumption that everything we're trying to do is in heap
 memory. We need to free the GPU resources in a timely manner with try-finally blocks. Note that we
@@ -27,10 +27,10 @@ involving the RAPIDS accelerated UDF falls back to the CPU.

 - [URLDecode](src/main/scala/com/nvidia/spark/rapids/udf/scala/URLDecode.scala)
   decodes URL-encoded strings using the
-  [Java APIs of RAPIDS cudf](https://docs.rapids.ai/api/cudf-java/stable)
+  [Java APIs of RAPIDS cudf](https://docs.rapids.ai/api/cudf-java/legacy)
 - [URLEncode](src/main/scala/com/nvidia/spark/rapids/udf/scala/URLEncode.scala)
   URL-encodes strings using the
-  [Java APIs of RAPIDS cudf](https://docs.rapids.ai/api/cudf-java/stable)
+  [Java APIs of RAPIDS cudf](https://docs.rapids.ai/api/cudf-java/legacy)

 ## Spark Java UDF Examples

@@ -53,10 +53,10 @@ significant effort.

 - [URLDecode](src/main/java/com/nvidia/spark/rapids/udf/java/URLDecode.java)
   decodes URL-encoded strings using the
-  [Java APIs of RAPIDS cudf](https://docs.rapids.ai/api/cudf-java/stable)
+  [Java APIs of RAPIDS cudf](https://docs.rapids.ai/api/cudf-java/legacy)
 - [URLEncode](src/main/java/com/nvidia/spark/rapids/udf/java/URLEncode.java)
   URL-encodes strings using the
-  [Java APIs of RAPIDS cudf](https://docs.rapids.ai/api/cudf-java/stable)
+  [Java APIs of RAPIDS cudf](https://docs.rapids.ai/api/cudf-java/legacy)
 - [CosineSimilarity](src/main/java/com/nvidia/spark/rapids/udf/java/CosineSimilarity.java)
   computes the [cosine similarity](https://en.wikipedia.org/wiki/Cosine_similarity)
   between two float vectors using [native code](src/main/cpp/src)
@@ -67,11 +67,11 @@ Below are some examples for implementing RAPIDS accelerated Hive UDF via JNI and

 - [URLDecode](src/main/java/com/nvidia/spark/rapids/udf/hive/URLDecode.java)
   implements a Hive simple UDF using the
-  [Java APIs of RAPIDS cudf](https://docs.rapids.ai/api/cudf-java/stable)
+  [Java APIs of RAPIDS cudf](https://docs.rapids.ai/api/cudf-java/legacy)
   to decode URL-encoded strings
 - [URLEncode](src/main/java/com/nvidia/spark/rapids/udf/hive/URLEncode.java)
   implements a Hive generic UDF using the
-  [Java APIs of RAPIDS cudf](https://docs.rapids.ai/api/cudf-java/stable)
+  [Java APIs of RAPIDS cudf](https://docs.rapids.ai/api/cudf-java/legacy)
   to URL-encode strings
 - [StringWordCount](src/main/java/com/nvidia/spark/rapids/udf/hive/StringWordCount.java)
   implements a Hive simple UDF using
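The README excerpt above explains that columnar UDFs built on the cudf Java API must free GPU memory promptly with try-finally blocks, because the JVM only accounts for heap memory. Below is a minimal sketch of that pattern, assuming cudf Java calls (`Scalar.fromString`, `stringReplace`, `urlDecode`) that should be verified against the cudf-java version actually on the classpath; it illustrates the resource-management discipline and is not the repository's URLDecode implementation:

```java
import ai.rapids.cudf.ColumnVector;
import ai.rapids.cudf.Scalar;

// Sketch of the resource-management pattern described in the README excerpt:
// every Scalar/ColumnVector owns device memory, so intermediates are scoped to
// a try-with-resources block and released as soon as they are no longer needed.
public class UrlDecodeSketch {
  static ColumnVector decodeColumnar(ColumnVector urls) {
    // Replace '+' with ' ' before percent-decoding; the replaced column is an
    // intermediate GPU buffer that must not outlive this block.
    try (Scalar plus = Scalar.fromString("+");
         Scalar space = Scalar.fromString(" ");
         ColumnVector replaced = urls.stringReplace(plus, space)) {
      // Only the final column escapes; the caller owns it and must close it too.
      return replaced.urlDecode();
    }
  }
}
```

Try-with-resources is the compact form of the try-finally discipline the README describes: each intermediate Scalar or ColumnVector is closed as soon as the block exits, and only the returned column is left for the caller to manage.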
5 changes: 3 additions & 2 deletions examples/UDF-Examples/Spark-cuSpatial/README.md
@@ -82,7 +82,8 @@ Note: The docker env is just for building the jar, not for running the applicati
 ## Run
 ### GPU Demo on Spark Standalone on-premises cluster
-1. Set up [a standalone cluster](/docs/get-started/xgboost-examples/on-prem-cluster/standalone-scala.md) of Spark. Make sure the conda/lib is included in LD_LIBRARY_PATH, so that spark executors can load libcuspatial.so.
+1. Set up [a standalone cluster](../../../docs/get-started/xgboost-examples/on-prem-cluster/standalone-scala.md) of Spark.
+   Make sure the conda/lib is included in LD_LIBRARY_PATH, so that spark executors can load libcuspatial.so.
 2. Download Spark RAPIDS JAR
    * [Spark RAPIDS JAR v23.02.0](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/23.02.0/rapids-4-spark_2.12-23.02.0.jar) or above
@@ -105,7 +106,7 @@ Note: The docker env is just for building the jar, not for running the applicati
    docker push <your-dockerhub-repo>:<your-tag>
    ```
-2. Follow the [Spark-rapids get-started document](https://nvidia.github.io/spark-rapids/docs/get-started/getting-started-databricks.html#start-a-databricks-cluster) to create a GPU cluster on AWS Databricks.
+2. Follow the [Spark-rapids get-started document](https://docs.nvidia.com/spark-rapids/user-guide/latest/getting-started/databricks.html) to create a GPU cluster on AWS Databricks.
    Below are some different steps since a custom docker image is used with Databricks:
    * Databricks Runtime Version
      Choose a non-ML Databricks Runtime such as `Runtime: 9.1 LTS(Scala 2.12, Spark 3.1.2)` and

Large diffs are not rendered by default.
