[DOC] add some ignore patterns and fix some dead links [skip ci] (#363)
* fix dead link in tool notebook

Signed-off-by: liyuan <yuali@nvidia.com>

* check all links and update the stable doc links to the legacy page

Signed-off-by: liyuan <yuali@nvidia.com>

* fix dead link in UDF README

Signed-off-by: liyuan <yuali@nvidia.com>

* update the markdown link checker version

Signed-off-by: liyuan <yuali@nvidia.com>

* update the markdown link checker version

Signed-off-by: liyuan <yuali@nvidia.com>

* update the markdown link checker version

Signed-off-by: liyuan <yuali@nvidia.com>

* comment out the anchor links to work around the issue

Signed-off-by: liyuan <yuali@nvidia.com>

* comment out the anchor links to work around the issue

Signed-off-by: liyuan <yuali@nvidia.com>

---------

Signed-off-by: liyuan <yuali@nvidia.com>
nvliyuan authored Feb 23, 2024
1 parent 0fb15f7 commit 1fe6387
Showing 8 changed files with 29 additions and 15 deletions.
1 change: 0 additions & 1 deletion .github/workflows/markdown-links-check.yml
@@ -30,6 +30,5 @@ jobs:
         with:
           max-depth: -1
           use-verbose-mode: 'yes'
-          check-modified-files-only: 'yes'
           config-file: '.github/workflows/markdown-links-check/markdown-links-check-config.json'
           base-branch: 'main'
14 changes: 14 additions & 0 deletions .github/workflows/markdown-links-check/markdown-links-check-config.json
@@ -1,4 +1,18 @@
 {
+  "ignorePatterns": [
+    {
+      "pattern": "/docs"
+    },
+    {
+      "pattern": "/datasets"
+    },
+    {
+      "pattern": "/dockerfile"
+    },
+    {
+      "pattern": "/examples"
+    }
+  ],
   "timeout": "15s",
   "retryOn429": true,
   "retryCount":30,
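The new `ignorePatterns` entries above are regular expressions that markdown-link-check tests against every link it finds; a matching link is skipped rather than reported as dead, which is how repository-root links such as `/docs/...` or `/examples/...` stop failing the CI check. That reading follows the tool's documented behaviour rather than anything shown in this diff; the sketch below is a hypothetical plain-Java illustration of the effect, not the checker's actual code:

```java
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical illustration of how ignorePatterns behaves: a link is skipped
// when any configured pattern matches it, so root-relative links are no longer
// flagged as dead, while ordinary external links are still checked.
public class IgnorePatternSketch {
  static final List<Pattern> IGNORE = List.of(
      Pattern.compile("/docs"),
      Pattern.compile("/datasets"),
      Pattern.compile("/dockerfile"),
      Pattern.compile("/examples"));

  static boolean shouldCheck(String link) {
    return IGNORE.stream().noneMatch(p -> p.matcher(link).find());
  }

  public static void main(String[] args) {
    // Root-relative repo link: matches "/docs", so it is skipped.
    System.out.println(shouldCheck("/docs/get-started/xgboost-examples/on-prem-cluster/standalone-scala.md")); // false
    // External link with no matching pattern: still checked.
    System.out.println(shouldCheck("https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/")); // true
  }
}
```

If the tool matches anywhere in the URL (as the sketch does), a short pattern like `/docs` can also silence unrelated links that happen to contain that substring, so such patterns are best kept as specific as the repository layout allows.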
4 changes: 2 additions & 2 deletions README.md
@@ -37,7 +37,7 @@ can be built for running on GPU with RAPIDS Accelerator in this repo:
 | 3 | XGBoost | Taxi (Scala) | End-to-end ETL + XGBoost example to predict taxi trip fare amount with [NYC taxi trips data set](https://www1.nyc.gov/site/tlc/about/tlc-trip-record-data.page)
 | 4 | ML/DL | PCA End-to-End | Spark MLlib based PCA example to train and transform with a synthetic dataset
 | 5 | UDF | cuSpatial - Point in Polygon | Spark cuSpatial example for Point in Polygon function using NYC Taxi pickup location dataset
-| 6 | UDF | URL Decode | Decodes URL-encoded strings using the [Java APIs of RAPIDS cudf](https://docs.rapids.ai/api/cudf-java/stable/)
-| 7 | UDF | URL Encode | URL-encodes strings using the [Java APIs of RAPIDS cudf](https://docs.rapids.ai/api/cudf-java/stable/)
+| 6 | UDF | URL Decode | Decodes URL-encoded strings using the [Java APIs of RAPIDS cudf](https://docs.rapids.ai/api/cudf-java/legacy/)
+| 7 | UDF | URL Encode | URL-encodes strings using the [Java APIs of RAPIDS cudf](https://docs.rapids.ai/api/cudf-java/legacy/)
 | 8 | UDF | [CosineSimilarity](./examples/UDF-Examples/RAPIDS-accelerated-UDFs/src/main/java/com/nvidia/spark/rapids/udf/java/CosineSimilarity.java) | Computes the cosine similarity between two float vectors using [native code](./examples/UDF-Examples/RAPIDS-accelerated-UDFs/src/main/cpp/src)
 | 9 | UDF | [StringWordCount](./examples/UDF-Examples/RAPIDS-accelerated-UDFs/src/main/java/com/nvidia/spark/rapids/udf/hive/StringWordCount.java) | Implements a Hive simple UDF using [native code](./examples/UDF-Examples/RAPIDS-accelerated-UDFs/src/main/cpp/src) to count words in strings
2 changes: 1 addition & 1 deletion examples/ML+DL-Examples/Spark-DL/criteo_train/README.md
@@ -7,7 +7,7 @@ _Please note: The following demo is dedicated for DGX-2 machine(with V100 GPUs).
 ## Dataset

 The dataset used here is from Criteo clicklog dataset.
-It's preprocessed by [DLRM](https://github.com/NVIDIA/DeepLearningExamples/tree/master/TensorFlow2/Recommendation/DLRM/preproc)
+It's preprocessed by [DLRM](https://github.com/NVIDIA/DeepLearningExamples/tree/master/TensorFlow2/Recommendation/DLRM_and_DCNv2/preproc)
 ETL job on Spark. We also provide a small size sample data in sample_data folder.
 All 40 columns(1 label + 39 features) are already numeric.

@@ -9,7 +9,7 @@
"\n",
"This notebook contains the same content as \"criteo_keras.py\" but in a notebook(interactive) form.\n",
"\n",
"The dataset used here is from Criteo clicklog dataset. It's preprocessed by DLRM(https://github.com/NVIDIA/DeepLearningExamples/tree/master/TensorFlow2/Recommendation/DLRM/preproc) ETL job on Spark.\n",
"The dataset used here is from Criteo clicklog dataset. It's preprocessed by DLRM(https://github.com/NVIDIA/DeepLearningExamples/tree/master/TensorFlow2/Recommendation/DLRM_and_DCNv2/preproc) ETL job on Spark.\n",
"\n",
"We provide a small size sample data in `sample_data` folder.\n",
"\n",
14 changes: 7 additions & 7 deletions examples/UDF-Examples/RAPIDS-accelerated-UDFs/README.md
@@ -18,7 +18,7 @@ which provides a single method we need to override called
 evaluateColumnar returns a cudf ColumnVector, because the GPU get its speed by performing operations
 on many rows at a time. In the `evaluateColumnar` function, there is a cudf implementation of URL
 decode that we're leveraging, so we don't need to write any native C++ code. This is all done
-through the [Java APIs of RAPIDS cudf](https://docs.rapids.ai/api/cudf-java/stable). The benefit to
+through the [Java APIs of RAPIDS cudf](https://docs.rapids.ai/api/cudf-java/legacy). The benefit to
 implement via the Java API is ease of development, but the memory model is not friendly for doing
 GPU operations because the JVM makes the assumption that everything we're trying to do is in heap
 memory. We need to free the GPU resources in a timely manner with try-finally blocks. Note that we
@@ -27,10 +27,10 @@ involving the RAPIDS accelerated UDF falls back to the CPU.

 - [URLDecode](src/main/scala/com/nvidia/spark/rapids/udf/scala/URLDecode.scala)
   decodes URL-encoded strings using the
-  [Java APIs of RAPIDS cudf](https://docs.rapids.ai/api/cudf-java/stable)
+  [Java APIs of RAPIDS cudf](https://docs.rapids.ai/api/cudf-java/legacy)
 - [URLEncode](src/main/scala/com/nvidia/spark/rapids/udf/scala/URLEncode.scala)
   URL-encodes strings using the
-  [Java APIs of RAPIDS cudf](https://docs.rapids.ai/api/cudf-java/stable)
+  [Java APIs of RAPIDS cudf](https://docs.rapids.ai/api/cudf-java/legacy)

 ## Spark Java UDF Examples

@@ -53,10 +53,10 @@ significant effort.

 - [URLDecode](src/main/java/com/nvidia/spark/rapids/udf/java/URLDecode.java)
   decodes URL-encoded strings using the
-  [Java APIs of RAPIDS cudf](https://docs.rapids.ai/api/cudf-java/stable)
+  [Java APIs of RAPIDS cudf](https://docs.rapids.ai/api/cudf-java/legacy)
 - [URLEncode](src/main/java/com/nvidia/spark/rapids/udf/java/URLEncode.java)
   URL-encodes strings using the
-  [Java APIs of RAPIDS cudf](https://docs.rapids.ai/api/cudf-java/stable)
+  [Java APIs of RAPIDS cudf](https://docs.rapids.ai/api/cudf-java/legacy)
 - [CosineSimilarity](src/main/java/com/nvidia/spark/rapids/udf/java/CosineSimilarity.java)
   computes the [cosine similarity](https://en.wikipedia.org/wiki/Cosine_similarity)
   between two float vectors using [native code](src/main/cpp/src)
@@ -67,11 +67,11 @@ Below are some examples for implementing RAPIDS accelerated Hive UDF via JNI and

 - [URLDecode](src/main/java/com/nvidia/spark/rapids/udf/hive/URLDecode.java)
   implements a Hive simple UDF using the
-  [Java APIs of RAPIDS cudf](https://docs.rapids.ai/api/cudf-java/stable)
+  [Java APIs of RAPIDS cudf](https://docs.rapids.ai/api/cudf-java/legacy)
   to decode URL-encoded strings
 - [URLEncode](src/main/java/com/nvidia/spark/rapids/udf/hive/URLEncode.java)
   implements a Hive generic UDF using the
-  [Java APIs of RAPIDS cudf](https://docs.rapids.ai/api/cudf-java/stable)
+  [Java APIs of RAPIDS cudf](https://docs.rapids.ai/api/cudf-java/legacy)
   to URL-encode strings
 - [StringWordCount](src/main/java/com/nvidia/spark/rapids/udf/hive/StringWordCount.java)
   implements a Hive simple UDF using
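The README excerpt above explains that columnar UDFs built on the cudf Java API must free GPU memory promptly with try-finally blocks, because the JVM only accounts for heap memory. Below is a minimal sketch of that pattern, assuming cudf Java calls (`Scalar.fromString`, `stringReplace`, `urlDecode`) that should be verified against the cudf-java version actually on the classpath; it illustrates the resource-management discipline and is not the repository's URLDecode implementation:

```java
import ai.rapids.cudf.ColumnVector;
import ai.rapids.cudf.Scalar;

// Sketch of the resource-management pattern described in the README excerpt:
// every Scalar/ColumnVector owns device memory, so intermediates are scoped to
// a try-with-resources block and released as soon as they are no longer needed.
public class UrlDecodeSketch {
  static ColumnVector decodeColumnar(ColumnVector urls) {
    // Replace '+' with ' ' before percent-decoding; the replaced column is an
    // intermediate GPU buffer that must not outlive this block.
    try (Scalar plus = Scalar.fromString("+");
         Scalar space = Scalar.fromString(" ");
         ColumnVector replaced = urls.stringReplace(plus, space)) {
      // Only the final column escapes; the caller owns it and must close it too.
      return replaced.urlDecode();
    }
  }
}
```

Try-with-resources is the compact form of the try-finally discipline the README describes: each intermediate Scalar or ColumnVector is closed as soon as the block exits, and only the returned column is left for the caller to manage.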
5 changes: 3 additions & 2 deletions examples/UDF-Examples/Spark-cuSpatial/README.md
@@ -82,7 +82,8 @@ Note: The docker env is just for building the jar, not for running the applicati
 ## Run
 ### GPU Demo on Spark Standalone on-premises cluster
-1. Set up [a standalone cluster](/docs/get-started/xgboost-examples/on-prem-cluster/standalone-scala.md) of Spark. Make sure the conda/lib is included in LD_LIBRARY_PATH, so that spark executors can load libcuspatial.so.
+1. Set up [a standalone cluster](../../../docs/get-started/xgboost-examples/on-prem-cluster/standalone-scala.md) of Spark.
+   Make sure the conda/lib is included in LD_LIBRARY_PATH, so that spark executors can load libcuspatial.so.
 2. Download Spark RAPIDS JAR
    * [Spark RAPIDS JAR v23.02.0](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/23.02.0/rapids-4-spark_2.12-23.02.0.jar) or above
@@ -105,7 +106,7 @@ Note: The docker env is just for building the jar, not for running the applicati
    docker push <your-dockerhub-repo>:<your-tag>
    ```
-2. Follow the [Spark-rapids get-started document](https://nvidia.github.io/spark-rapids/docs/get-started/getting-started-databricks.html#start-a-databricks-cluster) to create a GPU cluster on AWS Databricks.
+2. Follow the [Spark-rapids get-started document](https://docs.nvidia.com/spark-rapids/user-guide/latest/getting-started/databricks.html) to create a GPU cluster on AWS Databricks.
    Below are some different steps since a custom docker image is used with Databricks:
    * Databricks Runtime Version
      Choose a non-ML Databricks Runtime such as `Runtime: 9.1 LTS(Scala 2.12, Spark 3.1.2)` and

Large diffs are not rendered by default.
