
Support Hive3 when using Iceberg with Spark #14082

@yabola

Description

Feature Request / Improvement

Spark uses an isolated classloader to load Hive3-related jars (specified via spark.sql.hive.metastore.jars). However, this mechanism only takes effect inside HiveExternalCatalog and does not extend to Iceberg. This is because Spark restores the main thread's context classloader after initializing the Hive client (as seen in this code snippet). As a result, Iceberg running in Spark always resolves the Hive2 client implementation classes, making it impossible to use Hive3 features in Iceberg with Spark.
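
The save-and-restore pattern described above can be sketched in plain Java. This is a minimal illustration, not Spark's code: `ContextLoaderDemo` and `withContextLoader` are hypothetical names, and an empty `URLClassLoader` stands in for the loader that would hold the Hive3 jars.

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.util.function.Supplier;

public final class ContextLoaderDemo {
    private ContextLoaderDemo() {}

    // Hypothetical helper: run `body` with `loader` as the thread context
    // classloader, then put the previous loader back, even if `body` throws.
    static <T> T withContextLoader(ClassLoader loader, Supplier<T> body) {
        Thread thread = Thread.currentThread();
        ClassLoader original = thread.getContextClassLoader();
        thread.setContextClassLoader(loader);
        try {
            return body.get();
        } finally {
            // From here on, code on the same thread (e.g. a catalog that is
            // initialized later) no longer sees `loader` as its context loader.
            thread.setContextClassLoader(original);
        }
    }

    public static void main(String[] args) {
        // Stand-in for an isolated loader holding Hive3 jars (assumption: no
        // jars are added here; parent null means bootstrap classes only).
        ClassLoader isolated = new URLClassLoader(new URL[0], null);
        ClassLoader inside = withContextLoader(isolated,
                () -> Thread.currentThread().getContextClassLoader());
        ClassLoader after = Thread.currentThread().getContextClassLoader();
        System.out.println(inside == isolated); // true
        System.out.println(after == isolated);  // false
    }
}
```

Restoring the context loader is only part of the picture: a class that references Hive types directly is linked through the classloader that defined it, so Iceberg classes loaded by Spark's application classloader keep resolving whatever Hive classes that loader can see, independent of the context loader.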

Although the Hive2 client is generally compatible with a Hive3 metastore, some features are missing. For example, using catalog names to isolate databases and tables (e.g., via the metastore.catalog.default property) is only supported by Hive3's metastore client.
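
To make the catalog-isolation point concrete, here is a toy model in plain Java. It is not the Hive metastore API: `ToyMetastore` and its methods are hypothetical. It only assumes that, as in Hive3, a client reads its default catalog from `metastore.catalog.default` and falls back to `hive` when the property is unset, while a Hive2-style client has no notion of catalogs and always resolves names in one shared namespace.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Properties;
import java.util.Set;

public final class ToyMetastore {
    // catalog name -> database names; a stand-in for the metastore's storage
    private final Map<String, Set<String>> catalogs = new HashMap<>();

    void createDatabase(String catalog, String db) {
        catalogs.computeIfAbsent(catalog, k -> new HashSet<>()).add(db);
    }

    boolean databaseExists(String catalog, String db) {
        return catalogs.getOrDefault(catalog, Collections.emptySet()).contains(db);
    }

    // Hive3-style client (modeled): honors metastore.catalog.default, default "hive".
    static String hive3DefaultCatalog(Properties conf) {
        return conf.getProperty("metastore.catalog.default", "hive");
    }

    // Hive2-style client (modeled): ignores the property, so every caller shares "hive".
    static String hive2DefaultCatalog(Properties conf) {
        return "hive";
    }

    public static void main(String[] args) {
        ToyMetastore ms = new ToyMetastore();
        Properties teamA = new Properties();
        teamA.setProperty("metastore.catalog.default", "team_a");
        Properties teamB = new Properties();
        teamB.setProperty("metastore.catalog.default", "team_b");

        ms.createDatabase(hive3DefaultCatalog(teamA), "sales");
        // Hive3-style: team_b does not see team_a's "sales" database.
        System.out.println(ms.databaseExists(hive3DefaultCatalog(teamB), "sales")); // false
        // Hive2-style: both teams resolve to the same catalog, so there is no isolation.
        System.out.println(hive2DefaultCatalog(teamA).equals(hive2DefaultCatalog(teamB))); // true
    }
}
```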

To address this limitation, we need a mechanism similar to Spark's isolated classloader that loads Hive3 jars specifically for Iceberg. This would let Iceberg use Hive3-only features, such as catalog-level isolation, when running in Spark.
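
One possible shape for such a mechanism is a classloader that shares JDK, Hadoop and core Iceberg classes with Spark but loads Hive classes, and Iceberg's own Hive integration, only from the configured Hive3 jars. The sketch below is loosely modeled on the shared-versus-isolated split used by Spark's `IsolatedClientLoader`; the class name `HiveIsolatedClassLoader` and the exact package prefixes are assumptions for illustration, not an existing Iceberg API.

```java
import java.net.URL;
import java.net.URLClassLoader;

public class HiveIsolatedClassLoader extends URLClassLoader {
    private final ClassLoader sharedParent;

    // hive3Jars: URLs of the Hive3 client jars and their dependencies.
    // sharedParent: the loader Spark uses for everything else.
    public HiveIsolatedClassLoader(URL[] hive3Jars, ClassLoader sharedParent) {
        // Parent = null, so non-shared lookups see only the bootstrap loader
        // plus hive3Jars, never the Hive2 classes on Spark's classpath.
        super(hive3Jars, null);
        this.sharedParent = sharedParent;
    }

    // Assumed split: JDK, logging, Hadoop and core Iceberg classes are shared so
    // objects such as a Hadoop Configuration can cross the boundary. Hive classes
    // and Iceberg's Hive module are NOT shared, because the latter links
    // directly against Hive client classes.
    static boolean isSharedClass(String name) {
        if (name.startsWith("org.apache.hadoop.hive.") || name.startsWith("org.apache.iceberg.hive.")) {
            return false;
        }
        return name.startsWith("java.")
                || name.startsWith("jdk.")
                || name.startsWith("org.slf4j.")
                || name.startsWith("org.apache.hadoop.")
                || name.startsWith("org.apache.iceberg.");
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
        if (isSharedClass(name)) {
            return sharedParent.loadClass(name);
        }
        // Everything else (Hive, Thrift, Hive's own dependencies) comes from the
        // bootstrap loader or hive3Jars via URLClassLoader's normal lookup.
        return super.loadClass(name, resolve);
    }
}
```

In a real integration, Iceberg would presumably have to instantiate its Hive catalog reflectively through such a loader (for example with `Class.forName(name, true, loader)`), since any Hive-facing class it names directly would otherwise be linked by Spark's loader.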

Query engine

None

Willingness to contribute

  • I can contribute this improvement/feature independently
  • I would be willing to contribute this improvement/feature with guidance from the Iceberg community
  • I cannot contribute this improvement/feature at this time

Metadata

Assignees

No one assigned

Labels

improvement (PR that improves existing functionality)

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
