[Doc] Update 22.06 documentation [skip ci] #5641
Conversation
Signed-off-by: Hao Zhu <hazhu@nvidia.com>
@viadea could we add a FAQ entry to say the ASYNC allocator is on by default, but for CUDA 11.4 and older drivers we will fall back to ARENA?
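For context, a minimal sketch of selecting the pool explicitly at startup (assuming the standard plugin setup; `spark.rapids.memory.gpu.pool` accepts `ASYNC` and `ARENA` among its values):

```scala
import org.apache.spark.sql.SparkSession

// Sketch: pinning the RAPIDS GPU memory pool at startup. With no explicit
// setting, 22.06 defaults to ASYNC and is expected to fall back to ARENA
// on CUDA 11.4 and older drivers.
val spark = SparkSession.builder()
  .appName("allocator-example")
  .config("spark.plugins", "com.nvidia.spark.SQLPlugin")
  .config("spark.rapids.memory.gpu.pool", "ASYNC") // or "ARENA" for older drivers
  .getOrCreate()
```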
### Download v22.06.0

* Download the [RAPIDS Accelerator for Apache Spark 22.06.0 jar](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/22.06.0/rapids-4-spark_2.12-22.06.0.jar)
Because of this currently bad link, I'd like to see this checked in as late as possible. Otherwise we end up with every PR in the meantime being flagged for a bad link because it's checked in that way.
Yes, we can wait for some time to merge this PR.
My plan is to merge it before the merge request to main, so that future gh-pages update PRs can take it from there.
docs/download.md
Outdated
This package is built against CUDA 11.5 and has [CUDA forward compatibility](https://docs.nvidia.com/deploy/cuda-compatibility/index.html) enabled. It is tested on V100, T4, A2, A10, A30 and A100 GPUs with CUDA 11.0-11.5. For those using other types of GPUs which do not have CUDA forward compatibility (for example, GeForce), CUDA 11.5 is required. Users will
Should say "CUDA 11.5 or later is required" here, as CUDA backward compatibility will allow us to run on CUDA versions > 11.5.
Changed.
Signed-off-by: Hao Zhu <hazhu@nvidia.com>
Co-authored-by: Jason Lowe <jlowe@nvidia.com>
Added. How about now?
Signed-off-by: Hao Zhu <hazhu@nvidia.com>
Co-authored-by: Jason Lowe <jlowe@nvidia.com>
* Enable regular expression by default
* Enable some float related configurations by default
Enabling CSV reads, regular expressions, and floating point operations by default ought to be higher on the list of new features. spark.sql.mapKeyDedupPolicy=LAST_WIN is probably not that important to highlight. Rather, we can highlight features like improved ANSI support and support for Avro reading of primitive types.
Refactored the release notes.
BTW, "Avro reading of primitive types" was already added in 22.04.
Got it, thanks.
We suggest reordering the columns needed by the queries and then rewriting the files to make those columns adjacent. This could help both Spark on CPU and GPU.
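As a hedged illustration (the paths and column names below are hypothetical), such a rewrite could look like this in Scala:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

val spark = SparkSession.builder().appName("reorder-columns").getOrCreate()

// Hypothetical input path and column names, purely for illustration.
val df = spark.read.parquet("/data/events")

// Put the columns the queries actually need up front so they end up
// adjacent in the rewritten files; keep the remaining columns after them.
val hotColumns = Seq("user_id", "event_time", "event_type")
val ordered = hotColumns ++ df.columns.filterNot(hotColumns.contains)

df.select(ordered.map(col): _*)
  .write.mode("overwrite")
  .parquet("/data/events_reordered")
```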
Should we add a comment here about using spark.rapids.sql.format.parquet.reader.footer.type=NATIVE if there are a large number of columns and the data format is Parquet?
The feature is experimental. Not sure we're ready to widely advertise it yet, but I'd defer to @revans2 on this.
Fair enough, we can add the note about it in the tuning guide after it is no longer experimental.
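For completeness, a sketch of what enabling it would look like, using the config key from the comment above (experimental per this thread, so verify against the current docs before relying on it):

```scala
// Experimental in 22.06: use the native Parquet footer parser, which may
// help when reading tables with a very large number of columns.
spark.conf.set("spark.rapids.sql.format.parquet.reader.footer.type", "NATIVE")
```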
Co-authored-by: Sameer Raheja <sameerz@users.noreply.github.com>
Signed-off-by: Hao Zhu <hazhu@nvidia.com>
Do we need to update other parts of the documentation where we refer to the cudf jar, such as:
Signed-off-by: Hao Zhu <hazhu@nvidia.com>
I think most of the above were already handled by previous PRs in the 22.06 branch.
Regarding "spark-profiling-tool.md", my thought is that our profiling tool still needs to print cuDF jar information based on what version of RAPIDS+cuDF the Spark eventlog was generated with, so I kept the example output with the cuDF jar info there.
docs/FAQ.md
Outdated
@@ -307,11 +307,15 @@ Yes

### Are the R APIs for Spark supported?

-Yes, but we don't actively test them.
+Yes, but we don't actively test them. It is because the RAPIDS Accelerator hooks into Spark not at
Suggestion for this text and the Java API text below.
-Yes, but we don't actively test them. It is because the RAPIDS Accelerator hooks into Spark not at
+Yes, but we don't actively test them, because the RAPIDS Accelerator hooks into Spark not at
Changed both.
Signed-off-by: Hao Zhu <hazhu@nvidia.com>
build
Signed-off-by: Hao Zhu <hazhu@nvidia.com>
Fixed a parameter typo in docs/additional-functionality/rapids-udfs.md. @sameerz, would you mind re-approving?
build
Fixes #5217

* Add download page for 22.06. (Some of the features are not ready yet, such as Spark 3.3 support, so I will add them later once they are merged into the 22.06 branch.)
* Address [FEA] Column reordering for columnar write utility #5460
* Address [DOC] FAQ should clarify why Spark's Java and R APIs are not tested #5217
* Add a K8s doc to mention the base CUDA images and their Dockerfile.
* Modify the examples README to point to the spark-rapids-examples and spark-rapids-benchmark repos.
* Swap two steps in the Alluxio getting-started doc, because you cannot run the `mount` command before starting the Alluxio cluster.
* Some other minor doc updates.