Bump HiveCatalog hive-metastore dependency to Hive 4 #10429

Closed
ochanism opened this issue Jun 3, 2024 · 13 comments
Labels
hive question Further information is requested stale

@ochanism

ochanism commented Jun 3, 2024

Query engine

No response

Question

https://iceberg.apache.org/docs/1.5.2/configuration/#hadoop-configuration

[image: screenshot of the documentation section linked above]

I've been implementing a data ingester with the Apache Iceberg 1.5.2 Java API.
I ran into an issue with garbage (leftover) Hive locks when using a hive-metastore catalog.
I'd like to disable the Hive lock as described in the documentation shown in the screenshot above.
So I deployed a hive-metastore 4.0.0 server and tried to update the catalog configs and dependencies.

# dependencies
org.apache.iceberg:iceberg-hive-metastore:1.5.2
org.apache.hive:hive-metastore:3.1.3

But iceberg-hive-metastore:1.5.2 could not be compiled with hive-metastore:4.0.0 (it only worked with 3.1.3).
I confirmed that the data ingester works with the above dependencies (3.1.3) against a hive-metastore 4.0.0 server.
Is this setup OK, or could it cause issues?

@ochanism ochanism added the question Further information is requested label Jun 3, 2024
@Fokko
Contributor

Fokko commented Jun 3, 2024

Hey @ochanism

Thanks for reaching out. Hive 4.x supports Iceberg out of the box. Previously an external Iceberg dependency was needed, but Hive 4+ ships with Iceberg directly. So the following should work:

create external table tbl_ice stored by iceberg tblproperties ('format-version'='2') as
select * from source;

@ochanism
Author

ochanism commented Jun 3, 2024

@Fokko Sorry for my ambiguous question.
I'm using Trino as the query engine with a hive-metastore catalog.
For the (streaming) data ingestion, I developed a Java server with the Iceberg 1.5.2 API.
To eliminate the Hive lock, I upgraded hive-metastore from 3.1.3 to 4.0.0
and set iceberg.engine.hive.lock-enabled=false as a Hive catalog property (HiveCatalog class).
My Java server still has this dependency: org.apache.hive:hive-metastore:3.1.3.
So I wonder if this setup is OK. Could there be any errors due to the hive-metastore version mismatch between the client library (3.1.3) and the actual server (4.0.0)?
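
For readers following along, the client-side configuration described above can be sketched as a plain property map. This is only a minimal sketch under assumptions: the class name, metastore URI, and warehouse path are hypothetical, and `uri` and `warehouse` are the usual Iceberg catalog property keys. The lock key is copied from this comment; because the linked documentation lists it under Hadoop configuration, it may need to be set on the Hadoop `Configuration` instead of in this map.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper class; not part of Iceberg.
class HiveCatalogProps {

    // Builds the property map that would be handed to the catalog at initialization.
    static Map<String, String> build(String metastoreUri, String warehouse) {
        Map<String, String> props = new HashMap<>();
        props.put("uri", metastoreUri);     // Thrift URI of the Hive metastore
        props.put("warehouse", warehouse);  // root location for table data
        // Key as quoted in this thread. Assumption: whether it is honoured here
        // or only in the Hadoop Configuration should be verified.
        props.put("iceberg.engine.hive.lock-enabled", "false");
        return props;
    }

    public static void main(String[] args) {
        // Placeholder endpoint and path, for illustration only.
        Map<String, String> props =
            build("thrift://metastore.example.internal:9083", "s3://example-bucket/warehouse");
        props.forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```

With Iceberg on the classpath, a map like this would be passed as the second argument of `catalog.initialize(catalogName, properties)`.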

@Fokko
Contributor

Fokko commented Jun 3, 2024

@ochanism Thanks for clearing that up, that helps. Can you share the compilation error that you're seeing?

@ochanism
Author

ochanism commented Jun 3, 2024

@Fokko
This error occurred while initializing the Hive catalog.

var catalog = new HiveCatalog();
catalog.initialize(this.catalogName, this.properties);
Caused by: java.lang.NoSuchFieldError: Class org.apache.hadoop.hive.conf.HiveConf$ConfVars does not have member field 'org.apache.hadoop.hive.conf.HiveConf$ConfVars METASTOREURIS'
	at org.apache.iceberg.hive.HiveCatalog.initialize(HiveCatalog.java:95)
# dependencies
org.apache.iceberg:iceberg-hive-metastore:1.5.2
org.apache.hive:hive-metastore:4.0.0

this.conf.set(HiveConf.ConfVars.METASTOREURIS.varname, properties.get(CatalogProperties.URI));

@Fokko
Contributor

Fokko commented Jun 3, 2024

I see, the constant was renamed in Hive 4: apache/hive@b33b3d3#diff-b7bbe8545a21ec7d7e9cfe40ef66444789e332996aaa9e7f1430dbe4822a2c9cR270

They suggest using the shaded dependency: apache/hive#4919 (comment)

@ochanism
Author

ochanism commented Jun 3, 2024

Thanks for the information. Do you mean that Iceberg support in Hive 4.0 is managed by the Hive community?
I want to use the latest Iceberg version, but the shaded jar uses Iceberg 1.4.3.
Is there any plan to update the Iceberg library to support a hive-metastore 4.0 catalog without the shaded jar?

@Fokko
Contributor

Fokko commented Jun 3, 2024

@ochanism The problem is that Hive is both a query engine and a metastore (a catalog, in Iceberg terms). Maintenance of the query engine (the support for reading and writing Iceberg) is covered by the Hive community as of Hive 4. The catalog is still in the Iceberg codebase and will probably migrate to Hive 4 at some point as well, but I think that will take some time.

There is also another discussion going on in parallel. Since Iceberg has its own catalog (the REST catalog), the REST catalog might become the preferred catalog, with the others deprecated at some point. You could easily support a Hive catalog behind a REST catalog interface. Or even better, Hive itself could provide a native REST catalog interface (apache/hive#5145).

@Fokko Fokko added this to the Iceberg 2.0.0 milestone Jun 3, 2024
@Fokko Fokko changed the title from "Does the Iceberg 1.5.2 supports hive-metastore 4.0.0?" to "Bump HiveCatalog hive-metastore dependency to Hive 4" Jun 3, 2024
@Fokko Fokko added the hive label Jun 3, 2024
@pvary
Contributor

pvary commented Jun 3, 2024

@ochanism: If you are willing to take some risks, you might be able to create your own catalog implementation based on https://github.com/apache/hive/blob/master/iceberg/iceberg-catalog/src/main/java/org/apache/iceberg/hive/HiveCatalog.java and the current Iceberg HiveCatalog implementation. It will not be supported by any of the communities, but the code changes could be simple, like changing

    if (properties.containsKey(CatalogProperties.URI)) {
      this.conf.set(HiveConf.ConfVars.METASTOREURIS.varname, properties.get(CatalogProperties.URI));
    }

to

    if (properties.containsKey(CatalogProperties.URI)) {
      this.conf.set(HiveConf.ConfVars.METASTORE_URIS.varname, properties.get(CatalogProperties.URI));
    }

notice the added _
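
Since the constant is spelled differently in the two Hive lines discussed here (`METASTOREURIS` vs. `METASTORE_URIS`), a custom catalog copy could also avoid a compile-time reference to either spelling. The following is a minimal sketch, not code from Iceberg or Hive: `ConfVarLookup`, `resolveVarname`, and the two small enums are hypothetical stand-ins for `HiveConf.ConfVars`, used so the example runs without Hive on the classpath. It looks the constant up by name at runtime and falls back to the literal configuration key `hive.metastore.uris`.

```java
import java.util.function.Function;

// Hypothetical helper; the nested enums only imitate HiveConf.ConfVars.
class ConfVarLookup {

    // Stand-in for the pre-Hive-4 spelling of the constant.
    enum Hive3ConfVars {
        METASTOREURIS("hive.metastore.uris");
        final String varname;
        Hive3ConfVars(String varname) { this.varname = varname; }
    }

    // Stand-in for the Hive 4 spelling of the constant.
    enum Hive4ConfVars {
        METASTORE_URIS("hive.metastore.uris");
        final String varname;
        Hive4ConfVars(String varname) { this.varname = varname; }
    }

    // Returns the config key of the first candidate constant that exists in
    // enumClass, or fallbackKey when none of the candidates is present.
    static <E extends Enum<E>> String resolveVarname(
            Class<E> enumClass,
            Function<E, String> varnameOf,
            String fallbackKey,
            String... candidateNames) {
        for (String name : candidateNames) {
            try {
                return varnameOf.apply(Enum.valueOf(enumClass, name));
            } catch (IllegalArgumentException e) {
                // This spelling does not exist in this version; try the next one.
            }
        }
        return fallbackKey;
    }

    public static void main(String[] args) {
        String key = resolveVarname(
            Hive4ConfVars.class, v -> v.varname, "hive.metastore.uris",
            "METASTORE_URIS", "METASTOREURIS");
        System.out.println(key);
    }
}
```

Against the real class, the call would presumably take `HiveConf.ConfVars.class` and `v -> v.varname`. This only helps in code you compile yourself; it does not change the field reference already compiled into iceberg-hive-metastore 1.5.2.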

@ochanism
Author

ochanism commented Jun 4, 2024

@Fokko Thanks for your kind explanation. I understood the current situation. And the plan for unifying catalogs with the REST catalog looks amazing. I hope that it will be available soon.

@pvary Thanks for your suggestion. I will try it and leave the result here after verifying it.

@ochanism
Author

ochanism commented Jun 4, 2024

@pvary I tried it, but many classes are private or package-private (default scope), so I had to copy a lot of class files in order to modify it.
I decided to move to REST with the JDBC catalog, following @Fokko's point that REST will be the preferred catalog in the future. Thanks for helping me, guys!

@pan3793
Member

pan3793 commented Jun 5, 2024

HIVE-26882 and HIVE-28121 have landed in Hive 2.3.10. Although Hive 2.3 is EOL, this version is widely adopted, e.g. by Spark and Flink.


github-actions bot commented Dec 3, 2024

This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occurs. To permanently prevent this issue from being considered stale, add the label 'not-stale', but commenting on the issue is preferred when possible.

@github-actions github-actions bot added the stale label Dec 3, 2024

This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale'

@github-actions github-actions bot closed this as not planned Dec 18, 2024