Implement HBase connector #6010

Closed
damiencarol opened this issue Sep 1, 2016 · 17 comments

@damiencarol
Contributor

damiencarol commented Sep 1, 2016

Opening an issue here since I'm working on this and will push a PR soon.

I plan to have a production-ready plugin by the end of the year.

Main design choices:

  • Tables defined in JSON configuration files (like the kafka/redis/mongodb connectors)
  • Splits aligned to regions, with HBase row-key pruning
  • Parallelism aligned to regions, so that it scales
  • Managed configuration file for HBase (auto-updater)
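
To illustrate the "splits aligned to regions, with key pruning" idea, here is a minimal sketch in plain Python. It is not the code of the planned connector and does not use the HBase client API. `Region`, `Split`, and `plan_splits` are hypothetical names. The sketch assumes each region covers a half-open row-key range `[start, end)` and that an empty key means "unbounded", as in HBase region boundaries.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Region:
    start: bytes  # inclusive; b"" means "from the beginning of the table"
    end: bytes    # exclusive; b"" means "to the end of the table"
    host: str     # server currently hosting the region (illustrative)


@dataclass(frozen=True)
class Split:
    start: bytes
    end: bytes
    host: str


def plan_splits(regions: List[Region],
                scan_start: bytes = b"",
                scan_end: bytes = b"") -> List[Split]:
    """Emit one split per region that overlaps [scan_start, scan_end)."""
    splits = []
    for r in regions:
        # Prune regions that end before the requested key range begins.
        if scan_start and r.end and r.end <= scan_start:
            continue
        # Prune regions that begin after the requested key range ends.
        if scan_end and r.start >= scan_end:
            continue
        # Clip the split to the intersection of region and scan range.
        start = max(r.start, scan_start)
        if not r.end:
            end = scan_end
        elif not scan_end:
            end = r.end
        else:
            end = min(r.end, scan_end)
        splits.append(Split(start, end, r.host))
    return splits


regions = [
    Region(b"", b"g", "rs1"),
    Region(b"g", b"n", "rs2"),
    Region(b"n", b"", "rs3"),
]
# A predicate on the row key such as key >= 'h' AND key < 'k'
# only needs the middle region.
print(plan_splits(regions, b"h", b"k"))
```

With one split per surviving region, the number of parallel readers follows the number of regions, which is the scaling behaviour the list above aims for.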

Any advice, questions, or tips are welcome (especially if you have also started a plugin like this).

@adamjshook
Member

@damiencarol You may find some of the work for Apache Accumulo helpful, currently sitting in #5030.

@damiencarol
Contributor Author

@adamjshook yeah, I'm reading the PR code right now

@adamjshook
Member

@damiencarol I'm happy to answer any questions or give you any pointers on some BigTable-esque optimizations I've built to improve query times.

@damiencarol
Contributor Author

@adamjshook Does your connector run in production? Also, did you implement insert/update/delete?

@adamjshook
Member

@damiencarol Yes, it's been in production since March/April or so. INSERT is supported, but we use the Java APIs and some tools I've built for higher throughput. Presto doesn't support UPDATE (as far as I know), but you can issue another INSERT statement that shares the same Accumulo row ID and it effectively acts as an update. I haven't implemented DELETE yet -- haven't had a use case come up to drive the effort of implementing it.
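
The "a second INSERT with the same row ID acts as an update" behaviour can be pictured with a toy model. This is only a sketch of the general idea in a versioned key-value store, not Accumulo's or Presto's API. `ToyTable` and its methods are hypothetical, and it assumes reads return the most recently written value for each (row ID, column) pair.

```python
import itertools


class ToyTable:
    """Toy versioned store: each cell keeps its writes in timestamp order."""

    def __init__(self):
        self._cells = {}                  # (row_id, column) -> [(ts, value), ...]
        self._clock = itertools.count(1)  # stand-in for write timestamps

    def insert(self, row_id, **columns):
        ts = next(self._clock)
        for name, value in columns.items():
            self._cells.setdefault((row_id, name), []).append((ts, value))

    def read(self, row_id):
        # Only the newest version of each column is visible to readers.
        return {
            name: versions[-1][1]
            for (rid, name), versions in self._cells.items()
            if rid == row_id
        }


table = ToyTable()
table.insert("user1", name="Ann", city="Paris")
table.insert("user1", city="Lyon")  # same row ID: behaves like an update
print(table.read("user1"))
```

The second insert does not remove anything; it writes a newer version of `city` that shadows the older one, while `name` is left untouched.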

@damiencarol
Contributor Author

A first naive version is here: #6037.
Please be kind, it is a work in progress.

@yxydde

yxydde commented Aug 10, 2017

What is the progress?

@nemo326

nemo326 commented Dec 27, 2017

What is the progress now, please?

@ganeshjothikumar

We are exploring ways to use Presto to query an HBase table. Can I get an update on where we are w.r.t. the HBase connector for Presto, and any references for it?

@GrigorievNick

Any news about this part?

@JamesRTaylor

JamesRTaylor commented Aug 29, 2018 via email

@ganeshjothikumar

ganeshjothikumar commented Aug 31, 2018

@JamesRTaylor Is it fair to say that Presto -> HBase (through the Phoenix connector) is fairly nascent and probably not widely used in large production systems? We are looking at doing something like this for a fairly large, web-scale production system, so I wanted to know how hardened it is and what its current usage looks like.

@JamesRTaylor

Sounds like the author of the Phoenix connector is using it in production, @ganeshjothikumar, but you should ask him to confirm. Since neither the HBase connector nor the Phoenix connector is part of Presto yet, I'd imagine that they're both similar. FWIW, the SQL abstraction and query pushdown that Phoenix provides will make for a better fit as a Presto connector unless you're either 1) ok with many serial, full table scans by HBase, or 2) you try to do what Phoenix is doing within the HBase connector. Neither of these is a good option IMHO.
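
A rough way to see why pushdown matters is to count how many rows each approach has to read. The sketch below is a toy model over a sorted list of row keys, not Phoenix or HBase code. Both function names are hypothetical, and it assumes the predicate is a half-open range on the row key.

```python
from bisect import bisect_left


def full_scan_then_filter(sorted_keys, lo, hi):
    """No pushdown: read every row, filter in the query engine."""
    rows_read = 0
    matches = []
    for key in sorted_keys:
        rows_read += 1
        if lo <= key < hi:
            matches.append(key)
    return matches, rows_read


def pushed_down_range_scan(sorted_keys, lo, hi):
    """Pushdown: the store seeks to lo and stops at hi."""
    first = bisect_left(sorted_keys, lo)
    last = bisect_left(sorted_keys, hi)
    matches = sorted_keys[first:last]
    return matches, len(matches)


keys = [f"user{i:04d}" for i in range(1000)]
print(full_scan_then_filter(keys, "user0100", "user0110")[1])
print(pushed_down_range_scan(keys, "user0100", "user0110")[1])
```

Both functions return the same matching keys, but the first touches every row in the table, while the second reads only the rows inside the requested range.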

@willshen

What's the latest on this PR (i.e., is it moving forward)?

@ShawshankLin

What is the progress now?

@stale

stale bot commented Jun 22, 2021

This issue has been automatically marked as stale because it has not had any activity in the last 2 years. If you feel that this issue is important, just comment and the stale tag will be removed; otherwise it will be closed in 7 days. This is an attempt to ensure that our open issues remain valuable and relevant so that we can keep track of what needs to be done and prioritize the right things.

@stale stale bot added the stale label Jun 22, 2021
@stale stale bot closed this as completed Jul 21, 2021
@Crossoverrr
