@@ -124,8 +124,7 @@ private[hive] object IsolatedClientLoader extends Logging {
     }
     val hiveArtifacts = version.extraDeps ++
       Seq("hive-metastore", "hive-exec", "hive-common", "hive-serde")
-        .map(a => s"org.apache.hive:$a:${version.fullVersion}") ++
-      Seq("com.google.guava:guava:14.0.1") ++ hadoopJarNames
+        .map(a => s"org.apache.hive:$a:${version.fullVersion}") ++ hadoopJarNames
@JoshRosen (Contributor) commented on Aug 22, 2023:

Here, I think the Guava version needs to be a function of version, rather than matching Spark's Guava version or omitting Guava: if we remove Guava from the shared classes but also change the now-non-shared metastore Guava version, the net effect is the same and it breaks older Hive versions.

I believe that newer versions of Hive shade Guava (in which case they're insensitive to whatever value we set here). I think we could either (a) continue to unconditionally use Guava 14.0.1 here, or (b) conditionally use it only for older Hive versions that predated the Guava shading.
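
A minimal sketch of what option (b) could look like in downloadVersion, assuming a hypothetical predatesGuavaShading helper on HiveVersion (both the helper name and the cutoff are placeholders, not existing Spark APIs):

```scala
// Option (b), sketched: pin Guava 14.0.1 only for Hive versions that predate
// Guava shading. `predatesGuavaShading` is a hypothetical helper; the real
// cutoff would have to be confirmed against Hive's release history.
val guavaArtifacts =
  if (version.predatesGuavaShading) Seq("com.google.guava:guava:14.0.1") else Nil

val hiveArtifacts = version.extraDeps ++
  Seq("hive-metastore", "hive-exec", "hive-common", "hive-serde")
    .map(a => s"org.apache.hive:$a:${version.fullVersion}") ++
  guavaArtifacts ++ hadoopJarNames
```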

@pan3793 (Member, Author) commented on Aug 23, 2023:

> if we remove Guava from the shared classes but also change the now-non-shared metastore Guava version, the net effect is the same and it breaks older Hive versions.

Hmm, I don't get the idea. After removing Guava from the shared classes, we don't need to keep Hive and Spark on the same Guava version. And since we did not add Guava to HiveVersion#exclusions, each Hive version should automatically pull in all of its transitive dependencies, including Guava, if necessary.

Anyway, let's simply try (a) first.
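
For reference, trying (a) amounts to keeping the unconditional Guava 14.0.1 artifact for the isolated client (essentially the line this diff removes) while Guava still comes off the shared-class list; a sketch:

```scala
// Option (a), sketched: always download Guava 14.0.1 for the isolated Hive
// client regardless of the Hive version; only the shared-class change remains.
val hiveArtifacts = version.extraDeps ++
  Seq("hive-metastore", "hive-exec", "hive-common", "hive-serde")
    .map(a => s"org.apache.hive:$a:${version.fullVersion}") ++
  Seq("com.google.guava:guava:14.0.1") ++ hadoopJarNames
```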

@pan3793 (Member, Author) commented on Aug 24, 2023:

@JoshRosen Unfortunately, it does not work:

HiveClientSuite.0.13: createDatabase

Caused by: sbt.ForkMain$ForkError: java.lang.AssertionError: java.lang.NoSuchMethodException: com.google.common.base.internal.Finalizer.startFinalizer(java.lang.Class, java.lang.Object)
	at com.google.common.base.FinalizableReferenceQueue.getStartFinalizer(FinalizableReferenceQueue.java:283)
	at com.google.common.base.FinalizableReferenceQueue.<clinit>(FinalizableReferenceQueue.java:86)
	at com.jolbox.bonecp.BoneCP.<init>(BoneCP.java:427)
	at com.jolbox.bonecp.BoneCPDataSource.getConnection(BoneCPDataSource.java:120)
	at org.datanucleus.store.rdbms.ConnectionFactoryImpl$ManagedConnectionImpl.getConnection(ConnectionFactoryImpl.java:501)
	at org.datanucleus.store.rdbms.RDBMSStoreManager.<init>(RDBMSStoreManager.java:298)
        ...

@pan3793 (Member, Author) commented:

Also, cc @sunchao: I think the experiment indicates that we cannot upgrade Guava while keeping IsolatedClientLoader.

@pan3793 (Member, Author) commented:

@JoshRosen @sunchao do you have any suggestions for this one?

@pan3793 (Member, Author) commented:

Things changed after we removed support for Hive versions prior to 2.0.0: we can remove Guava from the shared class list now, and thus we can upgrade Guava while keeping IsolatedClientLoader.
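
A consolidated sketch of that end state, assembled from the two hunks in this diff (paraphrased, not a verbatim copy of the file):

```scala
// With Hive < 2.0.0 support gone, no Guava artifact is pinned for the isolated
// client; each Hive version pulls its own Guava transitively if it needs one.
val hiveArtifacts = version.extraDeps ++
  Seq("hive-metastore", "hive-exec", "hive-common", "hive-serde")
    .map(a => s"org.apache.hive:$a:${version.fullVersion}") ++ hadoopJarNames

// And the com.google prefix is dropped from the shared-class check, so Guava
// classes resolve inside the isolated classloader rather than from Spark.
// (Paraphrased shape of the predicate; the real method has more surrounding context.)
def isSharedClass(name: String): Boolean =
  name.startsWith("org.apache.spark.") ||
  isHadoopClass ||
  name.startsWith("scala.") ||
  name.startsWith("java.") ||
  name.startsWith("javax.sql.") ||
  sharedPrefixes.exists(name.startsWith)
```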


     implicit val printStream: PrintStream = SparkSubmit.printStream
     val classpaths = quietly {
@@ -207,7 +206,6 @@ private[hive] class IsolatedClientLoader(
     name.startsWith("org.apache.spark.") ||
     isHadoopClass ||
     name.startsWith("scala.") ||
-    (name.startsWith("com.google") && !name.startsWith("com.google.cloud")) ||
     name.startsWith("java.") ||
     name.startsWith("javax.sql.") ||
     sharedPrefixes.exists(name.startsWith)