Conversation

@sfcoy sfcoy commented Oct 13, 2020

  • Compatible with Hadoop > 3.2.0
  • Future-proof for a while

What changes were proposed in this pull request?

Upgrade the Google Guava dependency for compatibility with Hadoop 3.2.1 and Hadoop 3.3.0.

Why are the changes needed?

Spark fails at runtime with NoSuchMethodError when built or run with these Hadoop versions, which use com.google.common.base.Preconditions methods that are not present in the version of Guava currently specified for Spark.
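
As an illustration (not code from this PR; the class and values below are hypothetical), the failure is a binary-compatibility one: code compiled against a newer Guava binds to a fixed-arity Preconditions overload that Guava 14 does not have, so the call only breaks when an older Guava jar is on the runtime classpath.

```java
import com.google.common.base.Preconditions;

// Hypothetical demo, not part of this PR. When compiled against Guava 20+,
// javac binds the call below to the fixed-arity overload
// checkArgument(boolean, String, Object, Object), which Guava 20 added to
// avoid varargs array allocation. Guava 14 only ships the varargs overload,
// so running this class with Guava 14 on the classpath fails with
// java.lang.NoSuchMethodError -- the same pattern Hadoop 3.2.1+ hits
// against Spark's older Guava.
public class PreconditionsBindingDemo {
    public static void main(String[] args) {
        String host = "node-1.example.com";
        String rack = "/rack-a";
        Preconditions.checkArgument(!host.isEmpty(),
                "Empty host for rack %s (%s)", rack, host);
        System.out.println("Fixed-arity Preconditions overload resolved");
    }
}
```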

Does this PR introduce any user-facing change?

This change introduces new dependencies into the build which are imported by the guava pom file.

How was this patch tested?

We are currently running ETL production processes using Spark builds with this Guava version (based on the 3.0.1 tag).

@AngersZhuuuu
Contributor

FYI @dongjoon-hyun @srowen
How about we change the Guava version along with the Hadoop version in a profile?

@sfcoy
Author

sfcoy commented Oct 13, 2020

FYI @dongjoon-hyun @srowen
How about we change the Guava version along with the Hadoop version in a profile?

Hi @AngersZhuuuu, I'm not sure I see any benefit in that. It would increase the complexity of an already complicated build system, and it's significantly more than just a version number change. If you look at the changed files you will see what I mean.

Complexity is the enemy of maintainability.

@srowen
Member

srowen commented Oct 13, 2020

The big problem here is that previous Hadoop versions (<= 3.2.0) use Guava 14 or so, so this would break some compatibility with them. I think it could only happen under a Hadoop 3.2.1+ profile, but there may be a good idea there.
We'd still have to figure out whether it breaks compatibility with other libs.
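
(A hypothetical way to check this concretely, sketched here for illustration rather than proposed in the PR: a reflection probe for the fixed-arity Preconditions overload that newer Hadoop binds to.)

```java
import com.google.common.base.Preconditions;

// Hypothetical classpath probe, not part of this PR. Note that reflection
// reports a missing overload as NoSuchMethodException, whereas an ordinary
// call site compiled against newer Guava fails with NoSuchMethodError.
public class GuavaCompatProbe {
    public static void main(String[] args) {
        try {
            // Fixed-arity overload added in Guava 20; absent in Guava 14.0.1.
            Preconditions.class.getMethod("checkArgument",
                    boolean.class, String.class, Object.class, Object.class);
            System.out.println("Guava >= 20 on classpath: Hadoop 3.2.1+ calls should link");
        } catch (NoSuchMethodException e) {
            System.out.println("Older Guava (e.g. 14.x): expect NoSuchMethodError at runtime");
        }
    }
}
```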

@dongjoon-hyun changed the title from "SPARK-33090 Upgrade Google Guava to 29.0-jre" to "[SPARK-33090][BUILD][test-hadoop2.7] Upgrade Google Guava to 29.0-jre" on Oct 13, 2020
@dongjoon-hyun
Member

ok to test

@dongjoon-hyun
Member

Thanks, @sfcoy . This is an interesting approach.

@dongjoon-hyun
Member

BTW, @sfcoy and @AngersZhuuuu.

Regarding the following, the Apache Spark community wants to use the official Hadoop 3 client to cut the dependencies dramatically.

Upgrade the Google Guava dependency for compatibility with Hadoop 3.2.1 and Hadoop 3.3.0.

Please see here.

After SPARK-29250, I guess this PR will be a general Guava version upgrade PR without any relation to Hadoop 3.2.1.

cc @sunchao

@AngersZhuuuu
Contributor

After SPARK-29250, I guess this PR will be a general Guava version upgrade PR without any relation to Hadoop 3.2.1.

IMO, if Spark 3 built with Hadoop 3.2 can work well in Hadoop clusters (2.6/2.7/2.8, etc.), it's OK to just use the Hadoop 3.2 client.
In our Hadoop cluster we run spark-2.4-hadoop-2.6 builds on a hadoop-3.2.1 cluster, and it works well.

@SparkQA

SparkQA commented Oct 14, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34358/

@SparkQA

SparkQA commented Oct 14, 2020

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34358/

@SparkQA

SparkQA commented Oct 14, 2020

Test build #129752 has finished for PR 30022 at commit 3ff577e.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@sunchao
Member

sunchao commented Oct 15, 2020

I'm not sure if this works well with Hive 2.3.x either, since it is still on Guava 14.0.1.

IMO, if Spark 3 built with Hadoop 3.2 can work well in Hadoop clusters (2.6/2.7/2.8, etc.), it's OK to just use the Hadoop 3.2 client.

Yes, it's expected to work. There is an issue, HDFS-15191, that potentially breaks compatibility between the Hadoop 3.2.1 client and 2.x servers, but it is fixed in 3.2.2 (which Spark is probably going to use).

@SparkQA

SparkQA commented Jan 12, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38564/

@SparkQA

SparkQA commented Jan 12, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38564/

@sfcoy
Author

sfcoy commented Jan 13, 2021

The Kubernetes integration test appears to be running out of disk space:

Step 5/18 : COPY jars /opt/spark/jars failed to copy files: failed to copy directory: Error processing tar file(exit status 1): write /kubernetes-model-networking-4.10.3.jar: no space left on device

@github-actions

We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!
