[SPARK-45265][SQL][WIP] Supporting Hive 4.0 metastore #43064
Nice. I have a few comments:
- Are you using the current `beta-1`?
- Is there a timeline for Hive 4.0 GA?
- Although you filed this as a `Bug` for some old releases, I believe this PR should be a subtask for Apache Spark 4.0.0, because there are no existing Spark users with an Apache Hive 4.0.0 Metastore.

Thanks!
Yes.
I will ask around, but as far as I know they still have some blockers.
Sorry, that was a mistake of mine. Thanks for fixing it in Jira.
Thank you. And if you are fine with Apache Spark 4.0, that's great! I was worried. 😄
sql/hive/src/main/scala/org/apache/spark/sql/hive/client/package.scala
cc @wangyum too
@dongjoon-hyun
Thank you so much for keeping us up-to-date, @attilapiros !
Is there any update for Apache Hive 4.0, @attilapiros?
@dongjoon-hyun they still have some issues to solve (as far as I can see, some TPC-DS query performance issues):
Thank you for the updates and the link, @attilapiros.
We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable. |
### What changes were proposed in this pull request?
This PR continues the work from #43064 and #45801 to support Hive Metastore Server 4.0. CHAR/VARCHAR type partition filter pushdown is not included in this PR, as it requires further investment.

### Why are the changes needed?
Enhance the multiple hive metastore server support feature

### Does this PR introduce _any_ user-facing change?
no

### How was this patch tested?
Passing HiveClient*Suites w/ 4.0

### Was this patch authored or co-authored using generative AI tooling?
no

Closes #48823 from yaooqinn/SPARK-45265.

Authored-by: Kent Yao <yao@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
What changes were proposed in this pull request?
Support the Hive 4.0 metastore, where partition filters can be pushed down even for CHAR and VARCHAR types.
Hive 4.0 is still in beta! This is why this is a work-in-progress PR.
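To make the CHAR/VARCHAR pushdown idea concrete, here is a minimal, hypothetical Scala sketch (not the PR's actual code): an equality predicate on a partition column is turned into a metastore filter string of the kind `get_partitions_by_filter` consumes, and CHAR/VARCHAR columns are only pushed when a flag says the metastore can handle them. The type names, the `charVarcharPushdown` flag, and the quoting rule are assumptions made for illustration; CHAR padding semantics are ignored.

```scala
// Hypothetical sketch, NOT Spark's implementation: models turning
// `partCol = literal` into a metastore filter string, with CHAR/VARCHAR
// columns pushed only when `charVarcharPushdown` is enabled.
object PartitionFilterSketch {
  sealed trait PartitionColType
  case object StringType extends PartitionColType
  final case class CharType(length: Int) extends PartitionColType
  final case class VarcharType(length: Int) extends PartitionColType

  final case class EqualTo(column: String, colType: PartitionColType, value: String)

  // Assumed quoting rule: prefer double quotes, fall back to single quotes,
  // and refuse to push a literal that contains both.
  private def quote(value: String): Option[String] =
    if (!value.contains("\"")) Some("\"" + value + "\"")
    else if (!value.contains("'")) Some("'" + value + "'")
    else None

  /** Some(filter) when the predicate can be pushed to the metastore, else None. */
  def toMetastoreFilter(pred: EqualTo, charVarcharPushdown: Boolean): Option[String] = {
    val pushable = pred.colType match {
      case StringType                   => true
      case CharType(_) | VarcharType(_) => charVarcharPushdown
    }
    // None means: fetch partitions without this filter and evaluate it in Spark.
    if (pushable) quote(pred.value).map(q => s"${pred.column} = $q") else None
  }
}
```

With the flag off, a CHAR predicate yields `None`, so the partitions would have to be filtered on the Spark side; the follow-up commit above notes that CHAR/VARCHAR pushdown was ultimately left out, which is why a gate like this is a natural design point.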
Why are the changes needed?
Supporting more Hive versions (with an extra performance improvement) is good for our users.
Does this PR introduce any user-facing change?
Yes. The documentation is updated accordingly to cover Hive 4.0 metastore support.
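Users select the metastore client version through the existing `spark.sql.hive.metastore.version` and `spark.sql.hive.metastore.jars` settings. The hypothetical helper below only renders such settings as `--conf key=value` launcher arguments; the value `4.0.0` is an assumption about the accepted version string, `maven` asks Spark to download the matching Hive jars, and this is not the command used in the manual test.

```scala
// Hypothetical helper: renders Spark settings as `--conf key=value` arguments
// for spark-shell / spark-submit. The keys are real Spark settings; the
// version value is an assumption for illustration.
object MetastoreConfSketch {
  val hive40Metastore: Seq[(String, String)] = Seq(
    "spark.sql.hive.metastore.version" -> "4.0.0", // assumed version string
    "spark.sql.hive.metastore.jars"    -> "maven"  // download matching Hive jars
  )

  /** Flattens each (key, value) pair into the two arguments "--conf", "key=value". */
  def toConfArgs(settings: Seq[(String, String)]): Seq[String] =
    settings.flatMap { case (key, value) => Seq("--conf", s"$key=$value") }
}
```

Keeping the settings as data rather than a hard-coded string makes it easy to swap the version when testing against different metastore releases.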
How was this patch tested?
Manually.
I used the `apache/hive:4.0.0-beta-1` docker image to start a metastore and a HiveServer2 (along with a Hadoop 3 docker image).
Created a table:
Inserted some values in beeline:
Started Spark in the HiveServer2 container as:
Ran the query as:
Then checked the HMS calls in the metastore container in the file `/tmp/hive/hive.log`, which contains the expected `get_partitions_by_filter` call.
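The log check in the last step could be automated with a small sketch that counts lines mentioning a given metastore call. It assumes Scala 2.13+ (for `scala.util.Using`) and that the call name appears verbatim in the log; the object and method names are hypothetical.

```scala
import scala.io.Source
import scala.util.Using

// Hypothetical helper: counts log lines that mention a metastore API call,
// e.g. to confirm that get_partitions_by_filter was invoked.
object HmsLogCheck {
  /** Number of lines containing `call` as a plain substring. */
  def countCalls(lines: Iterator[String], call: String): Int =
    lines.count(_.contains(call))

  /** Same as countCalls, reading the lines of the file at `path` (closed afterwards). */
  def countCallsInFile(path: String, call: String): Int =
    Using.resource(Source.fromFile(path))(src => countCalls(src.getLines(), call))
}
```

For example, `HmsLogCheck.countCallsInFile("/tmp/hive/hive.log", "get_partitions_by_filter")` should be non-zero after the query runs. Because this is a plain substring match, searching for `get_partitions` would also count `get_partitions_by_filter` lines, so search for the most specific name.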
Was this patch authored or co-authored using generative AI tooling?
No.