@cnauroth
Contributor

Description of PR

HADOOP-19343: Manage hadoop-gcp Guava version directly in its pom.xml and mark exclusion in hadoop-tools-dist.

How was this patch tested?

  1. Ran a full distro build: mvn -Pdist -Dtar -DskipTests clean package. Confirmed no dependency convergence errors.
  2. Unpacked the distro and manually executed several hadoop fs commands against a GCS bucket.
  3. Ran mvn clean verify in hadoop-gcp to confirm all integration tests pass.

For code changes:

  • Does the title of this PR start with the corresponding JIRA issue id (e.g. 'HADOOP-17799. Your PR title ...')?
  • Object storage: have the integration tests been executed and the endpoint declared according to the connector-specific documentation?
  • If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under ASF 2.0?
  • If applicable, have you updated the LICENSE, LICENSE-binary, NOTICE-binary files?

@cnauroth
Contributor Author

Hello @slfan1989 and @pan3793 . This is a reattempt of #7883, which I had to revert. The key difference here is the exclusion in hadoop-tools-dist/pom.xml, so that when the distro module is built, there are no dependency convergence errors.
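
For readers following along, here is a minimal sketch of what an exclusion of this kind could look like in hadoop-tools-dist/pom.xml. The thread does not show the actual diff, so the excluded coordinates (com.google.guava:guava) and the surrounding dependency entry are assumptions and may differ from the patch.

```xml
<!-- Hypothetical sketch; the real entry in hadoop-tools-dist/pom.xml may differ. -->
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-gcp</artifactId>
  <version>${project.version}</version>
  <scope>compile</scope>
  <exclusions>
    <!-- Assumed coordinates: keep the Guava version pulled in through
         hadoop-gcp out of the dist module's dependency tree, so only one
         Guava version is visible to the convergence check there. -->
    <exclusion>
      <groupId>com.google.guava</groupId>
      <artifactId>guava</artifactId>
    </exclusion>
  </exclusions>
</dependency>
```

An exclusion like this only changes what the consuming module sees; hadoop-gcp itself would still resolve whatever Guava version its own pom.xml declares.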

@hadoop-yetus

💔 -1 overall

| Vote | Subsystem | Runtime | Logfile | Comment |
|---|---|---|---|---|
| +0 🆗 | reexec | 0m 49s | | Docker mode activated. |
| | | | | _ Prechecks _ |
| +1 💚 | dupname | 0m 0s | | No case conflicting files found. |
| +0 🆗 | codespell | 0m 1s | | codespell was not available. |
| +0 🆗 | detsecrets | 0m 1s | | detect-secrets was not available. |
| +0 🆗 | xmllint | 0m 1s | | xmllint was not available. |
| +1 💚 | @author | 0m 0s | | The patch does not contain any @author tags. |
| -1 ❌ | test4tests | 0m 0s | | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
| | | | | _ HADOOP-19343 Compile Tests _ |
| +0 🆗 | mvndep | 12m 3s | | Maven dependency ordering for branch |
| -1 ❌ | mvninstall | 46m 33s | /branch-mvninstall-root.txt | root in HADOOP-19343 failed. |
| +1 💚 | compile | 18m 11s | | HADOOP-19343 passed with JDK Ubuntu-11.0.27+6-post-Ubuntu-0ubuntu120.04 |
| +1 💚 | compile | 15m 20s | | HADOOP-19343 passed with JDK Private Build-1.8.0_452-8u452-gaus1-0ubuntu120.04-b09 |
| +1 💚 | mvnsite | 2m 3s | | HADOOP-19343 passed |
| +1 💚 | javadoc | 2m 1s | | HADOOP-19343 passed with JDK Ubuntu-11.0.27+6-post-Ubuntu-0ubuntu120.04 |
| +1 💚 | javadoc | 1m 44s | | HADOOP-19343 passed with JDK Private Build-1.8.0_452-8u452-gaus1-0ubuntu120.04-b09 |
| +1 💚 | shadedclient | 140m 17s | | branch has no errors when building and testing our client artifacts. |
| | | | | _ Patch Compile Tests _ |
| +0 🆗 | mvndep | 0m 39s | | Maven dependency ordering for patch |
| +1 💚 | mvninstall | 13m 6s | | the patch passed |
| +1 💚 | compile | 17m 19s | | the patch passed with JDK Ubuntu-11.0.27+6-post-Ubuntu-0ubuntu120.04 |
| +1 💚 | javac | 17m 19s | | the patch passed |
| +1 💚 | compile | 15m 34s | | the patch passed with JDK Private Build-1.8.0_452-8u452-gaus1-0ubuntu120.04-b09 |
| +1 💚 | javac | 15m 34s | | the patch passed |
| +1 💚 | blanks | 0m 0s | | The patch has no blanks issues. |
| +1 💚 | mvnsite | 1m 54s | | the patch passed |
| +1 💚 | javadoc | 1m 55s | | the patch passed with JDK Ubuntu-11.0.27+6-post-Ubuntu-0ubuntu120.04 |
| +1 💚 | javadoc | 1m 54s | | the patch passed with JDK Private Build-1.8.0_452-8u452-gaus1-0ubuntu120.04-b09 |
| +1 💚 | shadedclient | 49m 29s | | patch has no errors when building and testing our client artifacts. |
| | | | | _ Other Tests _ |
| +1 💚 | unit | 0m 36s | | hadoop-project in the patch passed. |
| +1 💚 | unit | 0m 47s | | hadoop-gcp in the patch passed. |
| +1 💚 | unit | 0m 49s | | hadoop-tools-dist in the patch passed. |
| +1 💚 | asflicense | 1m 6s | | The patch does not generate ASF License warnings. |
| | | 242m 50s | | |

| Subsystem | Report/Notes |
|---|---|
| Docker | ClientAPI=1.51 ServerAPI=1.51 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7904/1/artifact/out/Dockerfile |
| GITHUB PR | #7904 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient codespell detsecrets xmllint |
| uname | Linux 31a5fafa3e60 5.15.0-143-generic #153-Ubuntu SMP Fri Jun 13 19:10:45 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/bin/hadoop.sh |
| git revision | HADOOP-19343 / ffbf3f9 |
| Default Java | Private Build-1.8.0_452-8u452-gaus1-0ubuntu120.04-b09 |
| Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.27+6-post-Ubuntu-0ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_452-8u452-gaus1-0ubuntu120.04-b09 |
| Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7904/1/testReport/ |
| Max. process+thread count | 530 (vs. ulimit of 5500) |
| modules | C: hadoop-project hadoop-tools/hadoop-gcp hadoop-tools/hadoop-tools-dist U: . |
| Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7904/1/console |
| versions | git=2.25.1 maven=3.6.3 |
| Powered by | Apache Yetus 0.14.0 https://yetus.apache.org |

This message was automatically generated.

@slfan1989
Contributor

> Hello @slfan1989 and @pan3793 . This is a reattempt of #7883, which I had to revert. The key difference here is the exclusion in hadoop-tools-dist/pom.xml, so that when the distro module is built, there are no dependency convergence errors.

@cnauroth I checked the build logs, and there are still some dependency issues that need to be resolved.

[ERROR] Dependency convergence error for org.checkerframework:checker-qual:jar:3.42.0 paths to dependency are:
[ERROR] +-org.apache.hadoop:hadoop-tools-dist:jar:3.5.0-SNAPSHOT
[ERROR]   +-org.apache.hadoop:hadoop-kafka:jar:3.5.0-SNAPSHOT:compile
[ERROR]     +-org.apache.hadoop:hadoop-common:jar:3.5.0-SNAPSHOT:compile
[ERROR]       +-com.google.guava:guava:jar:33.1.0-jre:compile
[ERROR]         +-org.checkerframework:checker-qual:jar:3.42.0:compile
[ERROR] and
[ERROR] +-org.apache.hadoop:hadoop-tools-dist:jar:3.5.0-SNAPSHOT
[ERROR]   +-org.apache.hadoop:hadoop-gcp:jar:3.5.0-SNAPSHOT:compile
[ERROR]     +-com.google.cloud:google-cloud-storage:jar:2.52.0:compile
[ERROR]       +-org.checkerframework:checker-qual:jar:3.49.0:compile
[ERROR] 
[ERROR] 
[ERROR] Dependency convergence error for io.opentelemetry:opentelemetry-api:jar:1.38.0 paths to dependency are:
[ERROR] +-org.apache.hadoop:hadoop-tools-dist:jar:3.5.0-SNAPSHOT
[ERROR]   +-org.apache.hadoop:hadoop-aliyun:jar:3.5.0-SNAPSHOT:compile
[ERROR]     +-com.aliyun.oss:aliyun-sdk-oss:jar:3.18.1:compile
[ERROR]       +-com.aliyun:java-trace-api:jar:0.2.11-beta:compile
[ERROR]         +-io.opentelemetry:opentelemetry-api:jar:1.38.0:compile
[ERROR] and
[ERROR] +-org.apache.hadoop:hadoop-tools-dist:jar:3.5.0-SNAPSHOT
[ERROR]   +-org.apache.hadoop:hadoop-gcp:jar:3.5.0-SNAPSHOT:compile
[ERROR]     +-com.google.cloud:google-cloud-storage:jar:2.52.0:compile
[ERROR]       +-io.opentelemetry:opentelemetry-api:jar:1.47.0:compile
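
For context, errors in this format come from the maven-enforcer-plugin's dependencyConvergence rule, which fails the build when the same artifact is reachable at two different versions (here checker-qual 3.42.0 vs. 3.49.0 and opentelemetry-api 1.38.0 vs. 1.47.0). A minimal sketch of how such a rule is typically enabled follows; the execution id is hypothetical, and Hadoop's real enforcer configuration is not shown in this thread and may differ.

```xml
<!-- Illustrative sketch only; not copied from the Hadoop build. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-enforcer-plugin</artifactId>
  <executions>
    <execution>
      <!-- Hypothetical execution id. -->
      <id>enforce-dependency-convergence</id>
      <goals>
        <goal>enforce</goal>
      </goals>
      <configuration>
        <rules>
          <!-- Fails the build if any artifact resolves to more than one
               version across the module's dependency paths. -->
          <dependencyConvergence/>
        </rules>
      </configuration>
    </execution>
  </executions>
</plugin>
```

Running mvn dependency:tree in the affected module is a common way to see which paths introduce each competing version.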

@pan3793
Member

pan3793 commented Aug 27, 2025

I lean towards unifying the Guava version in the whole project, supposing the assumption is true.
https://github.com/apache/hadoop/pull/7883/files#r2284125866

@slfan1989
Contributor

> I lean towards unifying the Guava version in the whole project, supposing the assumption is true.
> https://github.com/apache/hadoop/pull/7883/files#r2284125866

Additional information

In hadoop-thirdparty we have already shaded Guava, so I'm not sure whether it is still necessary to keep Guava in the Hadoop project. We may need Steve's input on this.

@steveloughran @pjfanning

@pan3793
Member

pan3793 commented Aug 27, 2025

@slfan1989 AFAIK, third-party libs, e.g. Curator, require vanilla Guava classes at runtime.

@cnauroth
Contributor Author

Hi @slfan1989 . @pan3793 is correct. All of the hadoop-gcp code itself is using the hadoop-thirdparty shaded Guava:

https://github.com/apache/hadoop/blob/HADOOP-19343/hadoop-tools/hadoop-gcp/src/main/java/org/apache/hadoop/fs/gs/GoogleCloudStorageFileSystem.java#L21

The specific issue here is about the Guava used by the GCS SDK. We don't have a way to rewire its imports to point at our third-party. Also, we've seen historically that the GCS SDK can be particular about the version of Guava it uses. Historically, the original codebase from the Google repo has shaded this to guarantee a matching version and also isolate Hadoop clients from unexpected changes in the Guava version.
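
As a rough illustration of the shading approach mentioned above, the sketch below shows a maven-shade-plugin relocation that rewrites Guava's package inside a bundled jar so it cannot clash with another Guava on the classpath. This is not taken from this PR or from the Google repo; the shaded package prefix is a hypothetical placeholder.

```xml
<!-- Illustrative sketch only; the shaded prefix is a made-up placeholder. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <executions>
    <execution>
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
      <configuration>
        <relocations>
          <relocation>
            <!-- Guava's root package. -->
            <pattern>com.google.common</pattern>
            <!-- Hypothetical private prefix for the relocated classes. -->
            <shadedPattern>example.shaded.com.google.common</shadedPattern>
          </relocation>
        </relocations>
      </configuration>
    </execution>
  </executions>
</plugin>
```

Relocation rewrites both the bundled classes and the references to them, which is why it can pin a library to a matching Guava version without affecting downstream clients.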

@slfan1989
Contributor

> Hi @slfan1989 . @pan3793 is correct. All of the hadoop-gcp code itself is using the hadoop-thirdparty shaded Guava:
>
> https://github.com/apache/hadoop/blob/HADOOP-19343/hadoop-tools/hadoop-gcp/src/main/java/org/apache/hadoop/fs/gs/GoogleCloudStorageFileSystem.java#L21
>
> The specific issue here is about the Guava used by the GCS SDK. We don't have a way to rewire its imports to point at our third-party. Also, we've seen historically that the GCS SDK can be particular about the version of Guava it uses. Historically, the original codebase from the Google repo has shaded this to guarantee a matching version and also isolate Hadoop clients from unexpected changes in the Guava version.

@cnauroth Thank you for the clarification! I have no further questions about this PR. I'm curious why the Hadoop main project hasn't upgraded the Guava version. It seems 27.0-jre is from 2018.

@cnauroth cnauroth force-pushed the HADOOP-19343-guava-version-2 branch from ffbf3f9 to 93c3d2a on August 28, 2025 16:22
@cnauroth
Contributor Author

@slfan1989 , thanks for the review!

The reason for the last Yetus failure was that I didn't have the right match for the Guava version number used by the GCS SDK. I pushed up an update. I also added more comments to explain what's going on for future maintainers. I'll wait for a clean Yetus run before committing.

Regarding the old Guava version, I honestly don't remember why it's there, considering we now have hadoop-thirdparty. Maybe it was an assumption that old client projects had come to rely on it as a transitive dependency? There has been a policy that exposed dependencies are treated as public/stable:

https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/Compatibility.html#Java_Classpath

Maybe 3.5.0 is an acceptable version boundary to remove this. I've been planning to start a separate discussion.
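
To make the version-matching point above concrete, here is a minimal sketch of declaring Guava directly in a module's pom.xml so that it matches what an SDK expects. The property name is hypothetical and the version is deliberately left as a placeholder, because the thread does not state which Guava version the GCS SDK uses.

```xml
<!-- Hypothetical sketch; property name and placeholder value are assumptions. -->
<properties>
  <!-- Replace with the Guava version that the GCS SDK in use depends on. -->
  <gcs.guava.version>REPLACE_WITH_SDK_GUAVA_VERSION</gcs.guava.version>
</properties>

<dependencies>
  <dependency>
    <!-- Declaring Guava directly makes this version win over transitive
         ones within this module (Maven's nearest-wins mediation). -->
    <groupId>com.google.guava</groupId>
    <artifactId>guava</artifactId>
    <version>${gcs.guava.version}</version>
  </dependency>
</dependencies>
```

A comment next to the property explaining why it diverges from the project-wide Guava version helps future maintainers keep the two in sync when the SDK is upgraded.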

@hadoop-yetus

💔 -1 overall

| Vote | Subsystem | Runtime | Logfile | Comment |
|---|---|---|---|---|
| +0 🆗 | reexec | 21m 17s | | Docker mode activated. |
| | | | | _ Prechecks _ |
| +1 💚 | dupname | 0m 0s | | No case conflicting files found. |
| +0 🆗 | codespell | 0m 0s | | codespell was not available. |
| +0 🆗 | detsecrets | 0m 0s | | detect-secrets was not available. |
| +0 🆗 | xmllint | 0m 0s | | xmllint was not available. |
| +1 💚 | @author | 0m 0s | | The patch does not contain any @author tags. |
| -1 ❌ | test4tests | 0m 0s | | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
| | | | | _ HADOOP-19343 Compile Tests _ |
| +0 🆗 | mvndep | 11m 47s | | Maven dependency ordering for branch |
| -1 ❌ | mvninstall | 45m 23s | /branch-mvninstall-root.txt | root in HADOOP-19343 failed. |
| +1 💚 | compile | 18m 22s | | HADOOP-19343 passed with JDK Ubuntu-11.0.27+6-post-Ubuntu-0ubuntu120.04 |
| +1 💚 | compile | 15m 27s | | HADOOP-19343 passed with JDK Private Build-1.8.0_452-8u452-gaus1-0ubuntu120.04-b09 |
| +1 💚 | mvnsite | 2m 2s | | HADOOP-19343 passed |
| +1 💚 | javadoc | 2m 2s | | HADOOP-19343 passed with JDK Ubuntu-11.0.27+6-post-Ubuntu-0ubuntu120.04 |
| +1 💚 | javadoc | 1m 52s | | HADOOP-19343 passed with JDK Private Build-1.8.0_452-8u452-gaus1-0ubuntu120.04-b09 |
| +1 💚 | shadedclient | 136m 36s | | branch has no errors when building and testing our client artifacts. |
| | | | | _ Patch Compile Tests _ |
| +0 🆗 | mvndep | 0m 40s | | Maven dependency ordering for patch |
| +1 💚 | mvninstall | 13m 15s | | the patch passed |
| +1 💚 | compile | 16m 58s | | the patch passed with JDK Ubuntu-11.0.27+6-post-Ubuntu-0ubuntu120.04 |
| +1 💚 | javac | 16m 58s | | the patch passed |
| +1 💚 | compile | 15m 36s | | the patch passed with JDK Private Build-1.8.0_452-8u452-gaus1-0ubuntu120.04-b09 |
| +1 💚 | javac | 15m 36s | | the patch passed |
| +1 💚 | blanks | 0m 0s | | The patch has no blanks issues. |
| +1 💚 | mvnsite | 2m 1s | | the patch passed |
| +1 💚 | javadoc | 1m 55s | | the patch passed with JDK Ubuntu-11.0.27+6-post-Ubuntu-0ubuntu120.04 |
| +1 💚 | javadoc | 1m 52s | | the patch passed with JDK Private Build-1.8.0_452-8u452-gaus1-0ubuntu120.04-b09 |
| +1 💚 | shadedclient | 51m 35s | | patch has no errors when building and testing our client artifacts. |
| | | | | _ Other Tests _ |
| +1 💚 | unit | 0m 49s | | hadoop-project in the patch passed. |
| +1 💚 | unit | 1m 7s | | hadoop-gcp in the patch passed. |
| +1 💚 | unit | 0m 58s | | hadoop-tools-dist in the patch passed. |
| +1 💚 | asflicense | 1m 20s | | The patch does not generate ASF License warnings. |
| | | 262m 31s | | |

| Subsystem | Report/Notes |
|---|---|
| Docker | ClientAPI=1.51 ServerAPI=1.51 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7904/2/artifact/out/Dockerfile |
| GITHUB PR | #7904 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient codespell detsecrets xmllint |
| uname | Linux f0f6132d68ee 5.15.0-151-generic #161-Ubuntu SMP Tue Jul 22 14:25:40 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/bin/hadoop.sh |
| git revision | HADOOP-19343 / 93c3d2a |
| Default Java | Private Build-1.8.0_452-8u452-gaus1-0ubuntu120.04-b09 |
| Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.27+6-post-Ubuntu-0ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_452-8u452-gaus1-0ubuntu120.04-b09 |
| Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7904/2/testReport/ |
| Max. process+thread count | 530 (vs. ulimit of 5500) |
| modules | C: hadoop-project hadoop-tools/hadoop-gcp hadoop-tools/hadoop-tools-dist U: . |
| Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7904/2/console |
| versions | git=2.25.1 maven=3.6.3 |
| Powered by | Apache Yetus 0.14.0 https://yetus.apache.org |

This message was automatically generated.

@cnauroth
Contributor Author

The last -1s are for the lack of tests (not relevant) and the root build failing (due to the dependency convergence problem that this patch fixes). I plan to commit to the feature branch later.

cnauroth added a commit that referenced this pull request Aug 29, 2025
… and mark exclusion in hadoop-tools-dist.

Closes #7904

Signed-off-by: Shilun Fan <slfan1989@apache.org>
@cnauroth
Contributor Author

Thanks again everyone! This is merged into the feature branch.

@cnauroth cnauroth closed this Aug 29, 2025
cnauroth added a commit to cnauroth/hadoop that referenced this pull request Aug 29, 2025
… and mark exclusion in hadoop-tools-dist.

Closes apache#7904

Signed-off-by: Shilun Fan <slfan1989@apache.org>