Conversation

@sfcoy sfcoy commented Oct 13, 2020

  • Compatible with Hadoop > 3.2.0
  • Future-proof for a while

What changes were proposed in this pull request?

Upgrade the Google Guava dependency for compatibility with Hadoop 3.2.1 and Hadoop 3.3.0.

Why are the changes needed?

Spark fails at runtime with NoSuchMethodError when built or run with these Hadoop versions, which use com.google.common.base.Preconditions methods that are not present in the version of Guava currently specified for Spark.
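
As an illustration (not code from this PR; the class and values below are hypothetical), the failure is a binary-compatibility one: code compiled against a newer Guava binds to a fixed-arity Preconditions overload that Guava 14 does not have, so the call only breaks when an older Guava jar is on the runtime classpath.

```java
import com.google.common.base.Preconditions;

// Hypothetical demo, not part of this PR. When compiled against Guava 20+,
// javac binds the call below to the fixed-arity overload
// checkArgument(boolean, String, Object, Object), which Guava 20 added to
// avoid varargs array allocation. Guava 14 only ships the varargs overload,
// so running this class with Guava 14 on the classpath fails with
// java.lang.NoSuchMethodError -- the same pattern Hadoop 3.2.1+ hits
// against Spark's older Guava.
public class PreconditionsBindingDemo {
    public static void main(String[] args) {
        String host = "node-1.example.com";
        String rack = "/rack-a";
        Preconditions.checkArgument(!host.isEmpty(),
                "Empty host for rack %s (%s)", rack, host);
        System.out.println("Fixed-arity Preconditions overload resolved");
    }
}
```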

Does this PR introduce any user-facing change?

This change introduces new dependencies into the build which are imported by the guava pom file.

How was this patch tested?

We are currently running ETL production processes using Spark builds with this Guava version (based on the 3.0.1 tag).

@AngersZhuuuu
Contributor

FYI @dongjoon-hyun @srowen
How about we change the Guava version along with the Hadoop version in a profile?

@sfcoy
Author

sfcoy commented Oct 13, 2020

FYI @dongjoon-hyun @srowen
How about we change the Guava version along with the Hadoop version in a profile?

Hi @AngersZhuuuu, I'm not sure I see any benefit in that. It would increase the complexity of an already complicated build system, and it's significantly more than just a version number change. If you look at the changed files you will see what I mean.

Complexity is the enemy of maintainability.

@srowen
Member

srowen commented Oct 13, 2020

The big problem here is that previous Hadoop versions (<= 3.2.0) use Guava 14 or so, so this would break some compatibility with them. I think it could only happen under a Hadoop 3.2.1+ profile, but there may be a good idea there.
We'd still have to figure out whether it breaks compatibility with other libs.
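
(A hypothetical way to check this concretely, sketched here for illustration rather than proposed in the PR: a reflection probe for the fixed-arity Preconditions overload that newer Hadoop binds to.)

```java
import com.google.common.base.Preconditions;

// Hypothetical classpath probe, not part of this PR. Note that reflection
// reports a missing overload as NoSuchMethodException, whereas an ordinary
// call site compiled against newer Guava fails with NoSuchMethodError.
public class GuavaCompatProbe {
    public static void main(String[] args) {
        try {
            // Fixed-arity overload added in Guava 20; absent in Guava 14.0.1.
            Preconditions.class.getMethod("checkArgument",
                    boolean.class, String.class, Object.class, Object.class);
            System.out.println("Guava >= 20 on classpath: Hadoop 3.2.1+ calls should link");
        } catch (NoSuchMethodException e) {
            System.out.println("Older Guava (e.g. 14.x): expect NoSuchMethodError at runtime");
        }
    }
}
```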

@dongjoon-hyun changed the title from "SPARK-33090 Upgrade Google Guava to 29.0-jre" to "[SPARK-33090][BUILD][test-hadoop2.7] Upgrade Google Guava to 29.0-jre" on Oct 13, 2020
@dongjoon-hyun
Member

ok to test

@dongjoon-hyun
Member

Thanks, @sfcoy . This is an interesting approach.

@dongjoon-hyun
Member

BTW, @sfcoy and @AngersZhuuuu.

Regarding the following, the Apache Spark community wants to use the official Hadoop 3 client to cut the dependencies dramatically.

Upgrade the Google Guava dependency for compatibility with Hadoop 3.2.1 and Hadoop 3.3.0.

Please see here.

After SPARK-29250, I guess this PR will be a general Guava version upgrade PR without any relation to Hadoop 3.2.1.

cc @sunchao

@AngersZhuuuu
Contributor

After SPARK-29250, I guess this PR will be a general Guava version upgrade PR without any relation to Hadoop 3.2.1.

IMO, if Spark 3 built with Hadoop 3.2 can work well in Hadoop clusters (2.6/2.7/2.8, etc.), it's OK to just use the Hadoop 3.2 client.
In our Hadoop cluster we run spark-2.4-hadoop-2.6 builds on a hadoop-3.2.1 cluster, and it works well.

@SparkQA

SparkQA commented Oct 14, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34358/

@SparkQA

SparkQA commented Oct 14, 2020

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34358/

@SparkQA

SparkQA commented Oct 14, 2020

Test build #129752 has finished for PR 30022 at commit 3ff577e.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@sunchao
Member

sunchao commented Oct 15, 2020

I'm not sure if this works well with Hive 2.3.x either, since it is still on Guava 14.0.1.

IMO, if Spark 3 built with Hadoop 3.2 can work well in Hadoop clusters (2.6/2.7/2.8, etc.), it's OK to just use the Hadoop 3.2 client.

Yes, it's expected to work. There is an issue, HDFS-15191, that potentially breaks compatibility between the Hadoop 3.2.1 client and 2.x servers, but it is fixed in 3.2.2 (which Spark is probably going to use).

@SparkQA

SparkQA commented Jan 12, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38564/

@SparkQA

SparkQA commented Jan 12, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38564/

@sfcoy
Author

sfcoy commented Jan 13, 2021

The Kubernetes integration test appears to be running out of disk space:

Step 5/18 : COPY jars /opt/spark/jars failed to copy files: failed to copy directory: Error processing tar file(exit status 1): write /kubernetes-model-networking-4.10.3.jar: no space left on device

@github-actions

We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!
