Merged
97 changes: 89 additions & 8 deletions plugins/spark/v3.5/spark/build.gradle.kts
@@ -1,7 +1,7 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
@@ -85,12 +85,6 @@ tasks.register<ShadowJar>("createPolarisSparkJar") {
  archiveClassifier = "bundle"
  isZip64 = true

  // include the LICENSE and NOTICE files for the shadow Jar
  from(projectDir) {
    include("LICENSE")
    include("NOTICE")
  }

  // pack both the source code and dependencies
  from(sourceSets.main.get().output)
  configurations = listOf(project.configurations.runtimeClasspath.get())
@@ -99,9 +93,96 @@ tasks.register<ShadowJar>("createPolarisSparkJar") {
  // The iceberg-spark-runtime plugin is always packaged along with our polaris-spark plugin,
  // therefore excluded from the optimization.
  minimize { exclude(dependency("org.apache.iceberg:iceberg-spark-runtime-*.*")) }

  // Always run the license file addition after this task completes
  finalizedBy("addLicenseFilesToJar")
}

// Post-processing task to add our project's LICENSE and NOTICE files to the jar and remove any
// other LICENSE or NOTICE files that were shaded in.
tasks.register("addLicenseFilesToJar") {
Contributor

suggestion: Would this work?

  1. Run shadowJar with exclude("LICENSE")
  2. Run a simple jar task depending on shadowJar and add our own LICENSE
  3. Use the output of step 2 as the published artifact.

It doubles the fat jar, but hopefully it's not too much overhead.
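
A minimal sketch of steps 2 and 3, assuming a hypothetical task name `repackagePolarisSparkJar` and classifier `bundle-clean` (neither appears in this PR). It moves the exclusion out of the shadow task and into a plain `Jar` task that re-reads the shadow jar through `zipTree`. Whether that sidesteps the exclude problem in this build is untested, and manifest handling would need checking because the `Jar` task generates its own manifest:

```kotlin
import com.github.jengelman.gradle.plugins.shadow.tasks.ShadowJar

// Hypothetical second task: repackage the shadow jar without shaded license files.
tasks.register<Jar>("repackagePolarisSparkJar") {
  val shadowJar = tasks.named<ShadowJar>("createPolarisSparkJar")
  dependsOn(shadowJar)

  archiveClassifier = "bundle-clean" // assumed classifier, distinct from "bundle"
  isZip64 = true

  // Copy everything from the shadow jar except license and notice files.
  from(zipTree(shadowJar.flatMap { it.archiveFile })) {
    exclude("**/*LICENSE*", "**/*NOTICE*", "META-INF/licenses/**")
  }

  // Add the project's own LICENSE and NOTICE at the jar root.
  from(projectDir) { include("LICENSE", "NOTICE") }
}
```

The published artifact would then be the output of this second task rather than of `createPolarisSparkJar`, which is where the doubled fat jar comes from.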

Member Author

@RussellSpitzer Jun 26, 2025

Nope :( I tried that first

Either I don't understand Gradle or I don't understand Kotlin or some combination of both, which is probably true. Adding exclude("LICENSE") with or without wildcards would never actually exclude anything.

Contributor

Another option: Custom transformer: https://gradleup.com/shadow/configuration/merging/

Contributor

@dimas-b I can follow this up by excluding LICENSE and NOTICE from the other dependencies. In my previous experience, excluding does not always work well in some situations, so I will need to dig into it. There is also a plan to package this as an uber-jar project; I can take care of all of these together later.

Member Author

Quick demonstration -

diff --git a/plugins/spark/v3.5/spark/build.gradle.kts b/plugins/spark/v3.5/spark/build.gradle.kts
index 58ac8c98..6a9defb3 100644
--- a/plugins/spark/v3.5/spark/build.gradle.kts
+++ b/plugins/spark/v3.5/spark/build.gradle.kts
@@ -89,6 +89,9 @@ tasks.register<ShadowJar>("createPolarisSparkJar") {
   from(sourceSets.main.get().output)
   configurations = listOf(project.configurations.runtimeClasspath.get())

+  exclude("LICENSE")
+  exclude("NOTICE")
+
   // Optimization: Minimize the JAR (remove unused classes from dependencies)
   // The iceberg-spark-runtime plugin is always packaged along with our polaris-spark plugin,
   // therefore excluded from the optimization.
> Task :polaris-spark-3.5_2.12:addLicenseFilesToJar
Custom actions are attached to task ':polaris-spark-3.5_2.12:addLicenseFilesToJar'.
Caching disabled for task ':polaris-spark-3.5_2.12:addLicenseFilesToJar' because:
Gradle would require more information to cache this task
Task ':polaris-spark-3.5_2.12:addLicenseFilesToJar' is not up-to-date because:
Task has not declared any outputs despite executing actions.
Processing jar: /Users/rspitzer/repos/polaris/plugins/spark/v3.5/spark/build/2.12/libs/polaris-spark-3.5_2.12-1.1.0-incubating-SNAPSHOT-bundle.jar
Using temp directory: /Users/rspitzer/repos/polaris/plugins/spark/v3.5/spark/build/2.12/tmp/jar-cleanup-polaris-spark-3.5_2.12-bundle
Removing license file: LICENSE-EDL-1.0.txt
Removing license file: LICENSE-EPL-1.0.txt
Removing license file: LICENSE  <------------------ Not Excluded :_
Removing license file: META-INF/LICENSE
Removing license file: META-INF/NOTICE
Removing license file: META-INF/LICENSE.txt
Removing license file: NOTICE  <------------------ Not Excluded :_
Added project LICENSE file
Added project NOTICE file
[ant:jar] Building jar: /Users/rspitzer/repos/polaris/plugins/spark/v3.5/spark/build/2.12/libs/polaris-spark-3.5_2.12-1.1.0-incubating-SNAPSHOT-bundle.jar
Recreated jar with only project LICENSE and NOTICE files

Contributor

Well, I'm fine with whatever works for 1.0 🤷‍♂️

... but I really hope there's a simpler solution for later :)

Member Author

The solution here is very simple :) It's just not elegant

  dependsOn("createPolarisSparkJar")

  doLast {
    val shadowTask = tasks.named("createPolarisSparkJar", ShadowJar::class.java).get()
    val jarFile = shadowTask.archiveFile.get().asFile
    val tempDir =
      File(
        "${project.layout.buildDirectory.get().asFile}/tmp/jar-cleanup-${shadowTask.archiveBaseName.get()}-${shadowTask.archiveClassifier.get()}"
      )
    val projectLicenseFile = File(projectDir, "LICENSE")
    val projectNoticeFile = File(projectDir, "NOTICE")

    // Validate that required license files exist
    if (!projectLicenseFile.exists()) {
      throw GradleException("Project LICENSE file not found at: ${projectLicenseFile.absolutePath}")
    }
    if (!projectNoticeFile.exists()) {
      throw GradleException("Project NOTICE file not found at: ${projectNoticeFile.absolutePath}")
    }

    logger.info("Processing jar: ${jarFile.absolutePath}")
    logger.info("Using temp directory: ${tempDir.absolutePath}")

    // Clean up temp directory
    if (tempDir.exists()) {
      tempDir.deleteRecursively()
    }
    tempDir.mkdirs()

    // Extract the jar
    copy {
      from(zipTree(jarFile))
      into(tempDir)
    }

    // Remove every LICENSE and NOTICE file that was shaded in from dependencies
    fileTree(tempDir)
      .matching {
        include("**/*LICENSE*")
        include("**/*NOTICE*")
      }
      .forEach { file ->
        logger.info("Removing license file: ${file.relativeTo(tempDir)}")
        file.delete()
      }

    // Remove META-INF/licenses directory if it exists
    val licensesDir = File(tempDir, "META-INF/licenses")
    if (licensesDir.exists()) {
      licensesDir.deleteRecursively()
      logger.info("Removed META-INF/licenses directory")
    }

    // Copy our project's license files to root
    copy {
      from(projectLicenseFile)
      into(tempDir)
    }
    logger.info("Added project LICENSE file")

    copy {
      from(projectNoticeFile)
      into(tempDir)
    }
    logger.info("Added project NOTICE file")

    // Delete the original jar
    jarFile.delete()

    // Create new jar with only project LICENSE and NOTICE files
    ant.withGroovyBuilder {
      "jar"("destfile" to jarFile.absolutePath) { "fileset"("dir" to tempDir.absolutePath) }
    }

    logger.info("Recreated jar with only project LICENSE and NOTICE files")

    // Clean up temp directory
    tempDir.deleteRecursively()
  }
}

// ensure the shadow jar task (which automatically triggers the license addition) is run for both
// the `assemble` and `build` tasks
tasks.named("assemble") { dependsOn("createPolarisSparkJar") }

tasks.named("build") { dependsOn("createPolarisSparkJar") }
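
For comparison, the extract, delete, and re-jar sequence in `addLicenseFilesToJar` above could also be expressed as a single streaming pass over the archive, without a temp directory or Ant. The following is a minimal standalone sketch using only `java.util.zip`. The function names `isLicenseEntry` and `rewriteJar` are hypothetical and not part of this PR, and the sketch assumes the source jar contains no duplicate entry names:

```kotlin
import java.io.File
import java.util.zip.ZipEntry
import java.util.zip.ZipFile
import java.util.zip.ZipOutputStream

// Hypothetical helper: true for entries that the post-processing task would remove,
// i.e. any file whose name contains LICENSE or NOTICE, plus everything under META-INF/licenses/.
fun isLicenseEntry(name: String): Boolean {
    val fileName = name.substringAfterLast('/')
    return name.startsWith("META-INF/licenses/") ||
        fileName.contains("LICENSE") ||
        fileName.contains("NOTICE")
}

// Hypothetical helper: copies `source` to `target`, dropping shaded license files and
// appending the project's own LICENSE and NOTICE at the archive root.
// `target` must be a different file from `source`.
fun rewriteJar(source: File, target: File, license: File, notice: File) {
    ZipFile(source).use { zip ->
        ZipOutputStream(target.outputStream().buffered()).use { out ->
            for (entry in zip.entries()) {
                // Directory entries are optional in a zip, so they are simply skipped.
                if (entry.isDirectory || isLicenseEntry(entry.name)) continue
                out.putNextEntry(ZipEntry(entry.name))
                zip.getInputStream(entry).use { it.copyTo(out) }
                out.closeEntry()
            }
            for ((name, file) in listOf("LICENSE" to license, "NOTICE" to notice)) {
                out.putNextEntry(ZipEntry(name))
                file.inputStream().use { it.copyTo(out) }
                out.closeEntry()
            }
        }
    }
}
```

In a build script, a task using this approach would write to a temporary file next to the jar and then move it over the original, since the source and target cannot be the same file. Entry timestamps are not preserved in this sketch.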