Is anyone using the Hbase Store? #2367

GCHQDev404 · 2020-12-18T16:42:22Z

We are looking to make a number of version upgrades in Gaffer 2.0, including moving to Hadoop 3. HBase does not support Hadoop 3 out of the box (it needs to be built from source using a profile). If someone is using the HBase Store, we can take steps to include it in version 2. However, if no one is using it, we would benefit from removing support for the HBase Store until HBase releases a compatible version.

rwer81 · 2021-01-04T12:27:17Z

We use HGraphDB for our graph, but we have some problems with it (performance, no community, etc.). It uses HBase as its graph store, so we have gained experience with HBase; we use HBase 2.1.
We are currently testing Gaffer to decide whether it is the right tool for us. If we decide to use it in production and migrate from HGraphDB to Gaffer, we may change the graph store to Accumulo. But we are not sure it can handle our critical issues, which depend on our business logic and data.

d47853 · 2021-01-04T13:00:07Z

Thanks @rwer81

Let us know how you get on with your experiment. We have found Accumulo to be a highly performant store, especially when leveraging iterators for business logic. We've got a Docker project which might be of use to you when trying it out.

rwer81 · 2021-01-04T13:20:12Z

Thanks for your reply. The points below may be off topic, but I think they are useful given the scope of our conversation.

Our special cases:
1- We have 1 PB of linked data, and it grows by about 1 TB daily. The store has to handle this traffic.
2- There are some super-vertices that may have more than 1 million edges, so we limit edges when traversing.
3- There are a lot of DML operations on the data. These operations happen in real time.
4- 90% of our data consists of the same edge, so edge sharding/distribution is important.
5- Generally, we create edge IDs manually and use these IDs in upsert operations. When inserting data via bulk loading, HBase overwrites existing keys, so we don't have to check whether a key exists. This improves loading performance in some specific cases.
6- We don't store vertex properties in the graph (for now, they are stored in Elasticsearch). For that reason, we are focused mainly on graph traversals. Some cases require us to execute traversals of depth 4.
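
To make point 5 concrete, here is a minimal sketch of the idea, assuming edge IDs are derived deterministically from the edge's identifying fields. The `edge_id` helper and the `OverwriteStore` class are hypothetical stand-ins for illustration, not HBase or Gaffer APIs.

```python
import hashlib


def edge_id(source: str, label: str, target: str) -> str:
    """Derive a deterministic ID from the fields that identify an edge (hypothetical scheme)."""
    key = "\x00".join((source, label, target)).encode("utf-8")
    return hashlib.sha256(key).hexdigest()


class OverwriteStore:
    """In-memory stand-in for a key-value store in which a put replaces any existing row."""

    def __init__(self):
        self.rows = {}

    def put(self, key: str, value: dict) -> None:
        # No existence check: the last write for a key wins.
        self.rows[key] = value


store = OverwriteStore()
store.put(edge_id("a", "knows", "b"), {"weight": 1})
store.put(edge_id("a", "knows", "b"), {"weight": 2})  # same ID, so this overwrites the first row
store.put(edge_id("a", "knows", "c"), {"weight": 5})
```

Because the ID depends only on the edge's identifying fields, reloading the same edge acts as an upsert without a read before the write.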

It would be greatly appreciated if we could use some of your insight and help during our experiments.

Thanks.

d47853 · 2021-01-04T16:16:19Z

Your scale shouldn't be a problem, as Accumulo is capable of scaling to that kind of size. You could use something like Spark or AddElementsFromHDFS to deal with your data ingest. To limit your edge traversal, we'd recommend using HyperLogLog sketches as a property on an entity, as they can be easily generated and aggregated. Gaffer offers its sketches library to help with this. The bulk import capability in Accumulo should work similarly to HBase's, avoiding the query-update-put sequence.
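
As a rough illustration of why a HyperLogLog sketch suits this, the sketch below is a minimal standalone implementation, not Gaffer's sketches library or its API. The `HyperLogLog` class, the precision `p`, and the `MAX_EDGES` cut-off are all assumptions made for the example. It shows the two properties mentioned above: a sketch is cheap to build, and two sketches for the same vertex aggregate by taking the register-wise maximum.

```python
import hashlib
import math


class HyperLogLog:
    """Minimal HyperLogLog cardinality estimator (illustrative only)."""

    def __init__(self, p: int = 12):
        self.p = p                      # number of index bits
        self.m = 1 << p                 # number of registers
        self.registers = [0] * self.m

    def add(self, item: str) -> None:
        # Take a 64-bit hash of the item.
        h = int.from_bytes(hashlib.sha256(item.encode("utf-8")).digest()[:8], "big")
        idx = h >> (64 - self.p)                    # first p bits select a register
        rest = h & ((1 << (64 - self.p)) - 1)       # remaining 64 - p bits
        # Position of the leftmost 1-bit in the remaining bits (1-based).
        rank = (64 - self.p) - rest.bit_length() + 1
        self.registers[idx] = max(self.registers[idx], rank)

    def merge(self, other: "HyperLogLog") -> None:
        # Aggregation: register-wise maximum of two sketches with the same precision.
        if self.p != other.p:
            raise ValueError("sketches must use the same precision")
        self.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]

    def estimate(self) -> float:
        alpha = 0.7213 / (1 + 1.079 / self.m)       # bias constant, valid for m >= 128
        raw = alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            # Small-range correction (linear counting).
            return self.m * math.log(self.m / zeros)
        return raw


MAX_EDGES = 10_000  # hypothetical cut-off above which a vertex is not expanded

# Two sketches for the same vertex, built from overlapping batches of neighbours.
batch_a = HyperLogLog()
batch_b = HyperLogLog()
for i in range(30_000):
    batch_a.add(f"neighbour-{i}")
for i in range(20_000, 50_000):
    batch_b.add(f"neighbour-{i}")

batch_a.merge(batch_b)                 # aggregate the two properties
degree = batch_a.estimate()            # approximate count of distinct neighbours
skip_vertex = degree > MAX_EDGES       # decide before traversing any edges
```

The estimate is approximate (roughly 1.6% standard error at this precision), which is usually enough to decide whether a vertex is a super-vertex without reading its edges.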

rwer81 · 2021-01-04T16:25:06Z

Thanks for your explanation.
You have encouraged me to use Gaffer. I will continue testing.
In conclusion, if we decide to use Gaffer, we will presumably use Accumulo instead of HBase.

GCHQDev404 added the question label Dec 18, 2020

GCHQDev404 assigned d47853 Dec 18, 2020

d47853 mentioned this issue Jan 4, 2021

Hadoop, Hbase, Spark, Kafka, Zookeeper Compatibility #2368

Closed

n3101 unassigned d47853 Apr 20, 2021

n3101 closed this as completed Jul 27, 2021

n3101 mentioned this issue Jan 5, 2022

Deprecate HBase. #2556

Closed
