[Doc] Update 22.06 documentation [skip ci] #5641
Conversation
Signed-off-by: Hao Zhu <hazhu@nvidia.com>
@viadea could we add a FAQ entry to say the ASYNC allocator is on by default, but for CUDA 11.4 and older drivers we will fall back to ARENA?
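For context, a minimal sketch of selecting the pool explicitly at startup (assuming the standard plugin setup; `spark.rapids.memory.gpu.pool` accepts `ASYNC` and `ARENA` among its values):

```scala
import org.apache.spark.sql.SparkSession

// Sketch: pinning the RAPIDS GPU memory pool at startup. With no explicit
// setting, 22.06 defaults to ASYNC and is expected to fall back to ARENA
// on CUDA 11.4 and older drivers.
val spark = SparkSession.builder()
  .appName("allocator-example")
  .config("spark.plugins", "com.nvidia.spark.SQLPlugin")
  .config("spark.rapids.memory.gpu.pool", "ASYNC") // or "ARENA" for older drivers
  .getOrCreate()
```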
### Download v22.06.0

* Download the [RAPIDS Accelerator for Apache Spark 22.06.0 jar](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/22.06.0/rapids-4-spark_2.12-22.06.0.jar)
Because of this currently bad link, I'd like to see this checked in as late as possible. Otherwise we end up with every PR in the meantime being flagged for a bad link because it's checked in that way.
Yes, we can wait for some time to merge this PR.
My plan is to merge it before the merge request to main, so that future gh-pages update PRs can take it from there.
docs/download.md
Outdated
This package is built against CUDA 11.5 and has [CUDA forward compatibility](https://docs.nvidia.com/deploy/cuda-compatibility/index.html) enabled. It is tested on V100, T4, A2, A10, A30 and A100 GPUs with CUDA 11.0-11.5. For those using other types of GPUs which do not have CUDA forward compatibility (for example, GeForce), CUDA 11.5 is required. Users will
Should say "CUDA 11.5 or later is required" here, as CUDA backward compatibility will allow us to run on CUDA versions > 11.5.
Changed.
Signed-off-by: Hao Zhu <hazhu@nvidia.com>
Co-authored-by: Jason Lowe <jlowe@nvidia.com>
Added. How about now?
Signed-off-by: Hao Zhu <hazhu@nvidia.com>
Co-authored-by: Jason Lowe <jlowe@nvidia.com>
* Enable regular expression by default
* Enable some float related configurations by default
Enabling CSV reads, regular expressions, and floating point operations by default ought to be higher on the list of new features. spark.sql.mapKeyDedupPolicy=LAST_WIN is probably not that important to highlight. Rather, we can highlight features like improved ANSI support and support for Avro reading of primitive types.
Refactored the release notes.
BTW, "Avro reading of primitive types" was already added in 22.04.
Got it, thanks.
We suggest reordering the columns needed by the queries and then rewriting the files to make those columns adjacent. This could help both Spark on CPU and GPU.
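As a hedged illustration (the paths and column names below are hypothetical), such a rewrite could look like this in Scala:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

val spark = SparkSession.builder().appName("reorder-columns").getOrCreate()

// Hypothetical input path and column names, purely for illustration.
val df = spark.read.parquet("/data/events")

// Put the columns the queries actually need up front so they end up
// adjacent in the rewritten files; keep the remaining columns after them.
val hotColumns = Seq("user_id", "event_time", "event_type")
val ordered = hotColumns ++ df.columns.filterNot(hotColumns.contains)

df.select(ordered.map(col): _*)
  .write.mode("overwrite")
  .parquet("/data/events_reordered")
```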
Should we add a comment here about using spark.rapids.sql.format.parquet.reader.footer.type=NATIVE if there are a large number of columns and the data format is Parquet?
The feature is experimental. Not sure we're ready to widely advertise it yet, but I'd defer to @revans2 on this.
Fair enough, we can add the note about it in the tuning guide after it is no longer experimental.
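For completeness, a sketch of what enabling it would look like, using the config key from the comment above (experimental per this thread, so verify against the current docs before relying on it):

```scala
// Experimental in 22.06: use the native Parquet footer parser, which may
// help when reading tables with a very large number of columns.
spark.conf.set("spark.rapids.sql.format.parquet.reader.footer.type", "NATIVE")
```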
Co-authored-by: Sameer Raheja <sameerz@users.noreply.github.com>
Signed-off-by: Hao Zhu <hazhu@nvidia.com>
Do we need to update other parts of the documentation where we refer to the cudf jar, such as:
Signed-off-by: Hao Zhu <hazhu@nvidia.com>
I think most of the above were already handled by previous PRs in the 22.06 branch.
Regarding "spark-profiling-tool.md", my thought is that our profiling tool still needs to print cuDF jar information based on what version of RAPIDS+cuDF the Spark eventlog was generated with, so I kept the example output with the cuDF jar info there.
docs/FAQ.md
Outdated
@@ -307,11 +307,15 @@ Yes

### Are the R APIs for Spark supported?

-Yes, but we don't actively test them.
+Yes, but we don't actively test them. It is because the RAPIDS Accelerator hooks into Spark not at
Suggestion for this text and the Java API text below.
-Yes, but we don't actively test them. It is because the RAPIDS Accelerator hooks into Spark not at
+Yes, but we don't actively test them, because the RAPIDS Accelerator hooks into Spark not at
Changed both.
Signed-off-by: Hao Zhu <hazhu@nvidia.com>
build
Signed-off-by: Hao Zhu <hazhu@nvidia.com>
Fixed a parameter typo in docs/additional-functionality/rapids-udfs.md. @sameerz, would you mind re-approving?
build
Fixes #5217

* Add download page for 22.06. (Some of the features are not ready yet, such as Spark 3.3 support, so I will add them later once they are merged into the 22.06 branch.)
* Address [FEA] Column reordering for columnar write utility #5460
* Address [DOC] FAQ should clarify why Spark's Java and R APIs are not tested #5217
* Add a K8s doc to mention the base CUDA images and their Dockerfile.
* Modify the examples README to point to the spark-rapids-examples and spark-rapids-benchmark repos.
* Swap two steps in the Alluxio getting-started doc, because you cannot run the `mount` command before starting the Alluxio cluster.
* Some other minor doc updates.