fix: Resolve hbase hotspot issue when materializing #3790

Merged: 2 commits, Oct 20, 2023

sudohainguyen
Collaborator

What this PR does / why we need it: This PR resolves a hotspot issue when materializing records to the HBase online store.
Previously, row keys were generated by simple serialization, which led to an unbalanced distribution of row keys across HBase regions. This PR resolves that by applying a mechanism similar to the one used in the BigTable online store, because BigTable and HBase work the same way under the hood.
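
To illustrate the hotspot described above, here is a minimal sketch, not the code from this PR. It assumes a table pre-split into four regions on hypothetical split points, and compares a plainly serialized row key with one that carries a deterministic hash prefix. The names `naive_row_key`, `hashed_row_key`, `region_for`, and `SPLIT_POINTS` are made up for the example, and the actual key layout used by the Feast HBase and BigTable stores may differ.

```python
import bisect
import hashlib
from collections import Counter
from typing import List

# Hypothetical region boundaries for a table pre-split into four regions.
SPLIT_POINTS: List[bytes] = [b"4", b"8", b"c"]


def naive_row_key(entity_key: bytes) -> bytes:
    """Row key built by plain serialization (here: hex of the entity key)."""
    return entity_key.hex().encode()


def hashed_row_key(entity_key: bytes, prefix_len: int = 8) -> bytes:
    """Row key prefixed with a deterministic hash of the entity key."""
    prefix = hashlib.sha256(entity_key).hexdigest()[:prefix_len]
    return f"{prefix}#{entity_key.hex()}".encode()


def region_for(row_key: bytes, split_points: List[bytes] = SPLIT_POINTS) -> int:
    """Index of the region whose sorted key range contains row_key."""
    return bisect.bisect_right(split_points, row_key)


def region_histogram(row_keys: List[bytes]) -> Counter:
    """Count how many row keys fall into each region."""
    return Counter(region_for(k) for k in row_keys)


# Entity keys that share a common serialized prefix, as is typical in practice.
entity_keys = [f"driver_id={i}".encode() for i in range(1000)]
naive = region_histogram([naive_row_key(k) for k in entity_keys])
hashed = region_histogram([hashed_row_key(k) for k in entity_keys])
print("naive :", dict(naive))   # every key lands in the same region
print("hashed:", dict(hashed))  # keys are spread over all four regions
```

Because the prefix is derived from the entity key itself, a point lookup can recompute the same row key at read time. The trade-off is that rows are no longer stored in entity-key order, so ordered range scans over entities are lost.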

Which issue(s) this PR fixes:

Fixes #

@sudohainguyen
Collaborator Author

Hi @achals, who is currently in charge of the HBase module and could review my PR? 🤔

@sudohainguyen sudohainguyen changed the title from "fix: resolve hbase hotspot issue when materializing" to "fix: Resolve hbase hotspot issue when materializing" on Oct 11, 2023
Member

@achals achals left a comment

/ok-to-test
/lgtm

@sudohainguyen
Collaborator Author

/ok-to-test /lgtm

It doesn't seem to be working 😢

@sudohainguyen sudohainguyen self-assigned this Oct 20, 2023
@sudohainguyen sudohainguyen force-pushed the feat/hbase_online_store branch 2 times, most recently from 765fc07 to bfff90a Compare October 20, 2023 17:47
Signed-off-by: Hai Nguyen <quanghai.ng1512@gmail.com>
Signed-off-by: Hai Nguyen <quanghai.ng1512@gmail.com>
@achals achals merged commit 7376db8 into feast-dev:master Oct 20, 2023
15 checks passed
@sudohainguyen sudohainguyen deleted the feat/hbase_online_store branch October 21, 2023 16:56
james-crabtree-sp pushed a commit to sailpoint/feast that referenced this pull request Oct 23, 2023
* fix: Resolve hbase hotspot issue when materializing

Signed-off-by: Hai Nguyen <quanghai.ng1512@gmail.com>

* chore: Refactor internal table id generator

Signed-off-by: Hai Nguyen <quanghai.ng1512@gmail.com>

---------

Signed-off-by: Hai Nguyen <quanghai.ng1512@gmail.com>
Signed-off-by: James Crabtree <james.crabtree@sailpoint.com>
woop pushed a commit that referenced this pull request Jan 13, 2024
# [0.35.0](v0.34.0...v0.35.0) (2024-01-13)

### Bug Fixes

* Add async refresh to prevent synchronous refresh in main thread ([#3812](#3812)) ([9583ed6](9583ed6))
* Adopt connection pooling for HBase ([#3793](#3793)) ([b3852bf](b3852bf))
* Bytewax engine create configmap from object ([#3821](#3821)) ([25e9775](25e9775))
* Fix warnings from deprecated paths and update default log level ([#3757](#3757)) ([68a8737](68a8737))
* improve parsing bytewax job status ([5983f40](5983f40))
* make bytewax settings unexposed ([ae1bb8b](ae1bb8b))
* Make generated temp table name escaped ([#3797](#3797)) ([175d796](175d796))
* Pin numpy version to avoid spammy deprecation messages ([774ed33](774ed33))
* Redundant feature materialization and premature incremental materialization timestamp updates ([#3789](#3789)) ([417b16b](417b16b)), closes [#6](#6) [#7](#7)
* Resolve hbase hotspot issue when materializing ([#3790](#3790)) ([7376db8](7376db8))
* Set keepalives_idle None by default ([#3756](#3756)) ([8717e9b](8717e9b))
* Set upper bound for bigquery client due to its breaking changes ([2151c39](2151c39))
* UI project cannot handle fallback routes ([#3766](#3766)) ([96ece0f](96ece0f))
* update dependencies versions due to conflicts ([5dc0b24](5dc0b24))
* Update jackson and remove unnecessary logging ([#3809](#3809)) ([018d0ea](018d0ea))
* upgrade the pyarrow to latest v14.0.1 for CVE-2023-47248. ([052182b](052182b))

### Features

* Add get online feature rpc to gprc server ([#3815](#3815)) ([01db8cc](01db8cc))
* Add materialize and materialize-incremental rest endpoints ([#3761](#3761)) ([fa600fe](fa600fe)), closes [#3760](#3760)
* add redis sentinel support ([3387a15](3387a15))
* add redis sentinel support ([4337c89](4337c89))
* add redis sentinel support format lint ([aad8718](aad8718))
* Add support for `table_create_disposition` in bigquery job for offline store ([#3762](#3762)) ([6a728fe](6a728fe))
* Add support for in_cluster config and additional labels for bytewax materialization ([#3754](#3754)) ([2192e65](2192e65))
* Apply cache to load proto registry for performance ([#3702](#3702)) ([709c709](709c709))
* Make bytewax job write as mini-batches ([#3777](#3777)) ([9b0e5ce](9b0e5ce))
* Optimize bytewax pod resource with zero-copy ([9cf9d96](9cf9d96))
* Support GCS filesystem for bytewax engine ([#3774](#3774)) ([fb6b807](fb6b807))
tokoko pushed a commit to tokoko/feast that referenced this pull request Feb 6, 2024
# [0.35.0](feast-dev/feast@v0.34.0...v0.35.0) (2024-01-13)

Signed-off-by: tokoko <togurg14@freeuni.edu.ge>
zseta pushed a commit to zseta/feast that referenced this pull request Feb 7, 2024
* fix: Resolve hbase hotspot issue when materializing
Signed-off-by: Hai Nguyen <quanghai.ng1512@gmail.com>
Signed-off-by: Attila Toth <hello@attilatoth.dev>
zseta pushed a commit to zseta/feast that referenced this pull request Feb 7, 2024
# [0.35.0](feast-dev/feast@v0.34.0...v0.35.0) (2024-01-13)

Signed-off-by: Attila Toth <hello@attilatoth.dev>