-
Notifications
You must be signed in to change notification settings - Fork 178
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[HBASE-26534] Update HBase version to 2.4, make Hadoop 3 and Spark 3 default versions. #88
Conversation
💔 -1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
The purpose of this is to match our hbase-server version with the jar provided on EMR. While our HBase is listed as 2.2.6, the library code we have from Amazon is for 2.4.1+. There was a breaking change in 2.4.1 and this mismatch prevents bulkLoadThinRows() from working.
This is great. I ran into an issue with running this on AWS EMR and which provides HBase 2.4.1+ library code, and this had exactly the updates I needed. I'd love to see this merged and packaged in a release. |
One thing to think about with this: HBase 2.4.1 introduced (via this PR apache/hbase#2800) a breaking change for This PR enables you to use this with HBase 2.4.1 and later, but will prevent you from using it with earlier HBase versions. I know you currently allow the HBase version to be configured via the system properties, but as-is this change won't work with the HBase library prior to 2.4.1. |
Thanks @ianawilson for looking at this. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@LucaCanali Makes sense to me!
I'm not a contributor to this repo, so I don't know how much weight my approval carries, but I've been using this branch successfully in a couple of ad hoc Spark jobs with HBase client 2.4, and it's been great.
@LucaCanali FYI, looks like there's a PR with a similar intent and slightly different approach #89 |
@LucaCanali do you mind merging my changes from #89 to here? I will review yours instead, and that could help to move upgrade the forward . |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
did you run the unit test on your local environment ? may I ask if you're using linux not mac?
basically, I had some changes in TestJavaHBaseContext.java and other *Suite.scala , please see them in my PR #89 , that helps to prevent unit test failure on my mac. but it may be only me having these issues. if you can, please help to merge those changes in unit tests as well and IMO it's helping to run it smooth without conflicts between each tests.
spark/hbase-spark/src/main/scala/org/apache/hadoop/hbase/spark/HBaseContext.scala
Outdated
Show resolved
Hide resolved
🎊 +1 overall
This message was automatically generated. |
Thanks @taklwu for comments and review. |
🎊 +1 overall
This message was automatically generated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM ! and thanks for merging the changes
This is great, thank you both @LucaCanali @taklwu! One question I've got: Is it possible for this update to be released and published to maven central? It seems like there are quite a few updates on this repository that haven't been released there yet. |
This proposes to bump up the default versions used by HBase-connectors to recent versions, in particular for HBase, Spark, and Hadoop:
HBase is upgraded to 2.4.9, Spark to 3.1.2, Hadoop to 3.2.0.
This proposes also to make Spark 3 and Hadoop 3 default version (as opposed to Spark 2 and Hadoop 2).
Additional minor changes to the code are proposed/needed to compile the connector with HBase 2.4.9.