
Promote cudf as dist direct dependency, mark aggregator provided #4043

Merged
merged 11 commits into from
Nov 12, 2021
31 changes: 23 additions & 8 deletions README.md
@@ -1,7 +1,7 @@
# RAPIDS Accelerator For Apache Spark
NOTE: For the latest stable [README.md](https://github.com/nvidia/spark-rapids/blob/main/README.md) ensure you are on the main branch. The RAPIDS Accelerator for Apache Spark provides a set of plugins for Apache Spark that leverage GPUs to accelerate processing via the RAPIDS libraries and UCX. Documentation on the current release can be found [here](https://nvidia.github.io/spark-rapids/).

The RAPIDS Accelerator for Apache Spark provides a set of plugins for
[Apache Spark](https://spark.apache.org) that leverage GPUs to accelerate processing
via the [RAPIDS](https://rapids.ai) libraries and [UCX](https://www.openucx.org/).

@@ -19,7 +19,7 @@ To get started tuning your job and get the most performance out of it please sta

## Configuration

The plugin has a set of Spark configs that control its behavior and are documented
[here](docs/configs.md).

## Issues
@@ -30,13 +30,13 @@ may file one [here](https://github.com/NVIDIA/spark-rapids/issues/new/choose).
## Download

The jar files for the most recent release can be retrieved from the [download](docs/download.md)
page.

## Building From Source

See the [build instructions in the contributing guide](CONTRIBUTING.md#building-from-source).

## Testing

Tests are described [here](tests/README.md).

@@ -45,7 +45,7 @@ The RAPIDS Accelerator For Apache Spark does provide some APIs for doing zero co
transfer into other GPU-enabled applications. It is described
[here](docs/ml-integration.md).

Currently, we are working with XGBoost to try to provide this integration out of the box.

You may need to disable RMM caching when exporting data to an ML library, as that library
will likely want to use all of the GPU's memory, and if it is not aware of RMM it will not have
@@ -60,6 +60,21 @@ The profiling tool generates information which can be used for debugging and pro
Information such as the Spark version, executor information, and properties. It runs on either CPU- or
GPU-generated event logs.

Please refer to [spark qualification tool documentation](docs/spark-qualification-tool.md)
and [spark profiling tool documentation](docs/spark-profiling-tool.md)
for more details on how to use the tools.

## Dependency for External Projects

If you need to develop functionality on top of the RAPIDS Accelerator For Apache Spark (support
is currently limited to GPU-accelerated UDFs), we recommend declaring our distribution artifact
as a `provided` dependency.

```xml
<dependency>
<groupId>com.nvidia</groupId>
<artifactId>rapids-4-spark_2.12</artifactId>
<version>21.12.0-SNAPSHOT</version>
<scope>provided</scope>
</dependency>
```
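For non-Maven builds the same semantics apply. As a rough sketch (not from this repository), the equivalent declaration in a Gradle Kotlin DSL build would use `compileOnly`, Gradle's analogue of Maven's `provided` scope:

```kotlin
// build.gradle.kts — hypothetical equivalent of the Maven declaration above.
// `compileOnly` puts the jar on the compile classpath only, so it is not
// bundled into your artifact; the Spark cluster supplies it at runtime.
dependencies {
    compileOnly("com.nvidia:rapids-4-spark_2.12:21.12.0-SNAPSHOT")
}
```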
25 changes: 25 additions & 0 deletions aggregator/pom.xml
@@ -210,6 +210,31 @@
<groupId>org.apache.rat</groupId>
<artifactId>apache-rat-plugin</artifactId>
</plugin>
<plugin>
<!-- keep for the case dependency-reduced pom is enabled -->
<artifactId>maven-clean-plugin</artifactId>
<version>3.1.0</version>
<executions>
<execution>
<id>clean-reduced-dependency-poms</id>
<phase>clean</phase>
<goals>
<goal>clean</goal>
</goals>
<configuration>
<skip>${skipDrpClean}</skip>
<filesets>
<fileset>
<directory>${project.basedir}</directory>
<includes>
<include>dependency-reduced-pom*.xml</include>
</includes>
</fileset>
</filesets>
</configuration>
</execution>
</executions>
</plugin>
</plugins>
</build>

36 changes: 22 additions & 14 deletions dist/pom.xml
@@ -34,8 +34,23 @@
<artifactId>rapids-4-spark-aggregator_${scala.binary.version}</artifactId>
<version>${project.version}</version>
<classifier>${spark.version.classifier}</classifier>
<!--
scoped provided so that a third-party project depending on this artifact
does not pull it in transitively
https://maven.apache.org/guides/introduction/introduction-to-dependency-mechanism.html#Dependency_Scope
-->
<scope>provided</scope>
</dependency>

<!--
manually promoting provided cudf as a direct dependency
-->
<dependency>
<groupId>ai.rapids</groupId>
<artifactId>cudf</artifactId>
<version>${cudf.version}</version>
<classifier>${cuda.version}</classifier>
<scope>compile</scope>
</dependency>
</dependencies>

<properties>
@@ -223,7 +238,14 @@
<executions>
<execution>
<id>default-jar</id>
<phase>none</phase>
</execution>
<execution>
<id>create-parallel-worlds-jar</id>
<phase>package</phase>
<goals>
<goal>jar</goal>
</goals>
<configuration>
<classesDirectory>${project.build.directory}/parallel-world</classesDirectory>
</configuration>
@@ -336,20 +358,6 @@
</excludes>
</configuration>
</plugin>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-install-plugin</artifactId>
<version>3.0.0-M1</version>
<executions>
<execution>
<id>default-install</id>
<phase>install</phase>
<configuration>
<pomFile>${project.build.directory}/dependency-reduced-pom.xml</pomFile>
</configuration>
</execution>
</executions>
</plugin>
</plugins>
</build>
</project>
4 changes: 0 additions & 4 deletions dist/scripts/binary-dedupe.sh
@@ -220,9 +220,5 @@ time (
echo "$((++STEP))/ deleting all class files listed in $DELETE_DUPLICATES_TXT"
time (< "$DELETE_DUPLICATES_TXT" sort -u | xargs rm) 2>&1

echo "Generating dependency-reduced-pom.xml"
# which just deletes the dependencies list altogether
sed -e '/<dependencies>/,/<\/dependencies>/d' ../pom.xml > dependency-reduced-pom.xml

end_time=$(date +%s)
echo "binary-dedupe completed in $((end_time - start_time)) seconds"
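The sed one-liner removed above generated the dependency-reduced POM simply by deleting the entire `<dependencies>` element via a range address. A minimal, self-contained sketch of that behavior, run against a hypothetical sample POM rather than the project's real one:

```shell
# Create a tiny sample POM (hypothetical, for illustration only).
cat > sample-pom.xml <<'EOF'
<project>
  <artifactId>demo</artifactId>
  <dependencies>
    <dependency>
      <groupId>ai.rapids</groupId>
      <artifactId>cudf</artifactId>
    </dependency>
  </dependencies>
</project>
EOF

# Range deletion: drop every line from the one matching <dependencies>
# through the one matching </dependencies>, inclusive.
sed -e '/<dependencies>/,/<\/dependencies>/d' sample-pom.xml > dependency-reduced-pom.xml
cat dependency-reduced-pom.xml
```

Note the patterns match anywhere on a line, so indented open/close tags are caught too; only the lines outside the `<dependencies>` block survive.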