Promote cudf as dist direct dependency, mark aggregator provided (#4043)
Closes #3935 

- Manually emulate the shade plugin's promotion of the provided cudf dependency to a direct `compile` dependency of the dist artifact (see the verification sketch below)
- Mark the aggregator dependency as `provided`
- Stop overriding the `default-jar` execution: bind it to the `none` phase and introduce a dedicated execution that jars the parallel-world directory
- Add `dependency-reduced-pom*.xml` cleanup to the `clean` phase
- Undo the buggy `default-install` execution override that was not using the dependency-reduced POM
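
A quick way to sanity-check the promotion from the reactor (a sketch; assumes a prior build so the aggregator is resolvable, and uses the stock maven-dependency-plugin):

```sh
# Sketch: cudf should now appear at compile scope in the dist module's
# resolved dependency tree, while the provided aggregator is dropped for
# downstream consumers.
mvn -pl dist dependency:tree -Dincludes=ai.rapids:cudf
```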

Signed-off-by: Gera Shegalov <gera@apache.org>
gerashegalov authored Nov 12, 2021
1 parent 7d3629f commit 65c1389
Showing 4 changed files with 70 additions and 26 deletions.
31 changes: 23 additions & 8 deletions README.md
@@ -1,7 +1,7 @@
# RAPIDS Accelerator For Apache Spark
NOTE: For the latest stable [README.md](https://github.com/nvidia/spark-rapids/blob/main/README.md) ensure you are on the main branch. The RAPIDS Accelerator for Apache Spark provides a set of plugins for Apache Spark that leverage GPUs to accelerate processing via the RAPIDS libraries and UCX. Documentation on the current release can be found [here](https://nvidia.github.io/spark-rapids/).

The RAPIDS Accelerator for Apache Spark provides a set of plugins for
[Apache Spark](https://spark.apache.org) that leverage GPUs to accelerate processing
via the [RAPIDS](https://rapids.ai) libraries and [UCX](https://www.openucx.org/).

@@ -19,7 +19,7 @@ To get started tuning your job and get the most performance out of it please sta

## Configuration

The plugin has a set of Spark configs that control its behavior and are documented
[here](docs/configs.md).

## Issues
@@ -30,13 +30,13 @@ may file one [here](https://github.com/NVIDIA/spark-rapids/issues/new/choose).
## Download

The jar files for the most recent release can be retrieved from the [download](docs/download.md)
page.

## Building From Source

See the [build instructions in the contributing guide](CONTRIBUTING.md#building-from-source).

## Testing

Tests are described [here](tests/README.md).

@@ -45,7 +45,7 @@ The RAPIDS Accelerator For Apache Spark does provide some APIs for doing zero co
transfer into other GPU enabled applications. It is described
[here](docs/ml-integration.md).

Currently, we are working with XGBoost to try to provide this integration out of the box.

You may need to disable RMM caching when exporting data to an ML library as that library
will likely want to use all of the GPU's memory and if it is not aware of RMM it will not have
@@ -60,6 +60,21 @@ The profiling tool generates information which can be used for debugging and pro
Information such as Spark version, executor information, properties and so on. This runs on either CPU or
GPU generated event logs.

Please refer to [spark qualification tool documentation](docs/spark-qualification-tool.md)
and [spark profiling tool documentation](docs/spark-profiling-tool.md)
for more details on how to use the tools.
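
For a rough idea of how the qualification tool is typically launched (a sketch; the jar name, version, and event-log path are placeholders, and the linked docs above are authoritative for the exact invocation):

```sh
# Sketch: run the qualification tool over a directory of Spark event logs.
# The tools jar and event-log path below are placeholders.
java -cp rapids-4-spark-tools_2.12-21.12.0.jar:$SPARK_HOME/jars/* \
  com.nvidia.spark.rapids.tool.qualification.QualificationMain \
  /path/to/eventlog-directory
```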

## Dependency for External Projects

If you need to develop some functionality on top of the RAPIDS Accelerator For Apache Spark (we currently
limit support to GPU-accelerated UDFs), we recommend declaring our distribution artifact
as a `provided` dependency.

```xml
<dependency>
    <groupId>com.nvidia</groupId>
    <artifactId>rapids-4-spark_2.12</artifactId>
    <version>21.12.0-SNAPSHOT</version>
    <scope>provided</scope>
</dependency>
```
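
Note that `provided` only affects build-time resolution; at run time the plugin jar must still be on the Spark classpath, for example (a sketch; the jar path and application jar are placeholders):

```sh
# Sketch: supply the dist jar when submitting, since the provided scope keeps
# it out of the application's packaged dependencies.
spark-submit \
  --jars /path/to/rapids-4-spark_2.12-21.12.0-SNAPSHOT.jar \
  --conf spark.plugins=com.nvidia.spark.SQLPlugin \
  your-application.jar
```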
25 changes: 25 additions & 0 deletions aggregator/pom.xml
@@ -210,6 +210,31 @@
        <groupId>org.apache.rat</groupId>
        <artifactId>apache-rat-plugin</artifactId>
      </plugin>
      <plugin>
        <!-- keep for the case dependency-reduced pom is enabled -->
        <artifactId>maven-clean-plugin</artifactId>
        <version>3.1.0</version>
        <executions>
          <execution>
            <id>clean-reduced-dependency-poms</id>
            <phase>clean</phase>
            <goals>
              <goal>clean</goal>
            </goals>
            <configuration>
              <skip>${skipDrpClean}</skip>
              <filesets>
                <fileset>
                  <directory>${project.basedir}</directory>
                  <includes>
                    <include>dependency-reduced-pom*.xml</include>
                  </includes>
                </fileset>
              </filesets>
            </configuration>
          </execution>
        </executions>
      </plugin>
    </plugins>
  </build>
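
The `${skipDrpClean}` property gates this execution; assuming it is unset or false by default, the stale reduced POMs are removed on every `clean`, and the deletion can presumably be skipped from the command line (a sketch):

```sh
# Sketch: skip deleting dependency-reduced-pom*.xml during clean by setting
# the skipDrpClean property (assumes the property is consumed only by the
# execution above).
mvn -pl aggregator clean -DskipDrpClean=true
```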

36 changes: 22 additions & 14 deletions dist/pom.xml
@@ -34,8 +34,23 @@
      <artifactId>rapids-4-spark-aggregator_${scala.binary.version}</artifactId>
      <version>${project.version}</version>
      <classifier>${spark.version.classifier}</classifier>
      <!--
        provided such that the 3rd party project depending on this will drop it
        https://maven.apache.org/guides/introduction/introduction-to-dependency-mechanism.html#Dependency_Scope
      -->
      <scope>provided</scope>
    </dependency>

    <!--
      manually promoting provided cudf as a direct dependency
    -->
    <dependency>
      <groupId>ai.rapids</groupId>
      <artifactId>cudf</artifactId>
      <version>${cudf.version}</version>
      <classifier>${cuda.version}</classifier>
      <scope>compile</scope>
    </dependency>
  </dependencies>

<properties>
@@ -223,7 +238,14 @@
      <executions>
        <execution>
          <id>default-jar</id>
          <phase>none</phase>
        </execution>
        <execution>
          <id>create-parallel-worlds-jar</id>
          <phase>package</phase>
          <goals>
            <goal>jar</goal>
          </goals>
          <configuration>
            <classesDirectory>${project.build.directory}/parallel-world</classesDirectory>
          </configuration>
@@ -336,20 +358,6 @@
          </excludes>
        </configuration>
      </plugin>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-install-plugin</artifactId>
        <version>3.0.0-M1</version>
        <executions>
          <execution>
            <id>default-install</id>
            <phase>install</phase>
            <configuration>
              <pomFile>${project.build.directory}/dependency-reduced-pom.xml</pomFile>
            </configuration>
          </execution>
        </executions>
      </plugin>
    </plugins>
  </build>
</project>
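
Binding the inherited `default-jar` execution to the `none` phase is the standard Maven idiom for disabling it; the dedicated `create-parallel-worlds-jar` execution then packages `target/parallel-world` instead of the default classes directory. One way to eyeball the result (a sketch; the jar name assumes the usual `artifactId-version` naming):

```sh
# Sketch: build the dist module with its reactor dependencies, then list the
# first entries of the jar, which should come from the parallel-world tree.
mvn -pl dist -am package
jar tf dist/target/rapids-4-spark_2.12-21.12.0-SNAPSHOT.jar | head
```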
4 changes: 0 additions & 4 deletions dist/scripts/binary-dedupe.sh
@@ -220,9 +220,5 @@ time (
echo "$((++STEP))/ deleting all class files listed in $DELETE_DUPLICATES_TXT"
time (< "$DELETE_DUPLICATES_TXT" sort -u | xargs rm) 2>&1

echo "Generating dependency-reduced-pom.xml"
# which just deletes the dependencies list altogether
sed -e '/<dependencies>/,/<\/dependencies>/d' ../pom.xml > dependency-reduced-pom.xml

end_time=$(date +%s)
echo "binary-dedupe completed in $((end_time - start_time)) seconds"

