Description
Feature Request / Improvement
Spark uses an isolated classloader to load Hive3-related jars (specified via spark.sql.hive.metastore.jars). However, this mechanism only takes effect inside HiveExternalCatalog and does not apply to Iceberg, because Spark restores the main thread's original classloader after initializing the Hive client (as seen in this code snippet). As a result, Iceberg code running in the Spark JVM always resolves the Hive2 implementation classes, making it impossible to leverage Hive3 features in Iceberg with Spark.
Although the Hive2 client is generally compatible with Hive3, some features are missing. For example, the ability to use catalog names to isolate databases and tables (e.g., the metastore.catalog.default property) is only supported in Hive3's metadata client.
To address this limitation, we need to introduce a mechanism similar to Spark's isolated classloader approach to load Hive3 jars specifically for Iceberg. This would enable Iceberg to fully utilize Hive3's advanced features when integrated with Spark.
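To make the idea concrete, here is a minimal sketch of the swap-and-restore pattern described above, using only the JDK. It is not Spark's or Iceberg's actual implementation: the class name `IsolatedLoaderSketch`, the helper `withContextLoader`, and the empty `hive3Jars` array are hypothetical placeholders. The point it illustrates is that code sees the isolated loader only while it runs inside the wrapped action; once the original loader is restored, later callers (such as Iceberg) are back on the default classpath.

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.util.concurrent.Callable;

public class IsolatedLoaderSketch {

    // Hypothetical helper: installs `loader` as the thread context class loader,
    // runs `action`, and always restores the previous loader afterwards.
    public static <T> T withContextLoader(ClassLoader loader, Callable<T> action)
            throws Exception {
        Thread current = Thread.currentThread();
        ClassLoader original = current.getContextClassLoader();
        current.setContextClassLoader(loader);
        try {
            return action.call();
        } finally {
            // After this point, code on this thread no longer sees `loader`.
            current.setContextClassLoader(original);
        }
    }

    public static void main(String[] args) throws Exception {
        // Assumption: in a real setup this array would hold the Hive3 jar URLs.
        URL[] hive3Jars = new URL[0];

        // Parent is the loader above the application class loader, so classes
        // on the application classpath (e.g. bundled Hive2) are not visible.
        ClassLoader parent = ClassLoader.getSystemClassLoader().getParent();
        ClassLoader before = Thread.currentThread().getContextClassLoader();

        try (URLClassLoader isolated = new URLClassLoader(hive3Jars, parent)) {
            ClassLoader seenInside = withContextLoader(
                isolated, () -> Thread.currentThread().getContextClassLoader());
            System.out.println("isolated loader active inside action: "
                + (seenInside == isolated));
        }

        System.out.println("original loader restored afterwards: "
            + (Thread.currentThread().getContextClassLoader() == before));
    }
}
```

Under this sketch, an Iceberg-side equivalent would need to create the Hive metastore client inside such a wrapped action (and keep invoking it through the isolated loader), rather than relying on whatever loader Spark leaves on the thread.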
Query engine
None
Willingness to contribute
- I can contribute this improvement/feature independently
- I would be willing to contribute this improvement/feature with guidance from the Iceberg community
- I cannot contribute this improvement/feature at this time