HBASE-28097 Add documentation section for the Cache Aware balancer fu… #5495

Merged
merged 2 commits into from
Nov 2, 2023

Conversation

@ragarkar ragarkar commented Nov 2, 2023

…nction

@Apache-HBase

🎊 +1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 0m 27s Docker mode activated.
-0 ⚠️ yetus 0m 3s Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck
_ Prechecks _
_ HBASE-27389 Compile Tests _
_ Patch Compile Tests _
_ Other Tests _
1m 27s
Subsystem Report/Notes
Docker ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5495/1/artifact/yetus-jdk11-hadoop3-check/output/Dockerfile
GITHUB PR #5495
Optional Tests
uname Linux ca98d4f13f8c 5.4.0-1103-aws #111~18.04.1-Ubuntu SMP Tue May 23 20:04:10 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision HBASE-27389 / 5f7df54
Max. process+thread count 40 (vs. ulimit of 30000)
modules C: . U: .
Console output https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5495/1/console
versions git=2.34.1 maven=3.8.6
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

@Apache-HBase

🎊 +1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 0m 47s Docker mode activated.
-0 ⚠️ yetus 0m 3s Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck
_ Prechecks _
_ HBASE-27389 Compile Tests _
_ Patch Compile Tests _
_ Other Tests _
1m 52s
Subsystem Report/Notes
Docker ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5495/1/artifact/yetus-jdk8-hadoop3-check/output/Dockerfile
GITHUB PR #5495
Optional Tests
uname Linux be853492ab8c 5.4.0-163-generic #180-Ubuntu SMP Tue Sep 5 13:21:23 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision HBASE-27389 / 5f7df54
Max. process+thread count 33 (vs. ulimit of 30000)
modules C: . U: .
Console output https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5495/1/console
versions git=2.34.1 maven=3.8.6
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

@Apache-HBase

🎊 +1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 0m 29s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
_ HBASE-27389 Compile Tests _
+1 💚 spotless 0m 48s branch has no errors when running spotless:check.
_ Patch Compile Tests _
+1 💚 whitespace 0m 0s The patch has no whitespace issues.
+1 💚 spotless 0m 36s patch has no errors when running spotless:check.
_ Other Tests _
+1 💚 asflicense 0m 12s The patch does not generate ASF License warnings.
3m 7s
Subsystem Report/Notes
Docker ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5495/1/artifact/yetus-general-check/output/Dockerfile
GITHUB PR #5495
Optional Tests dupname asflicense spotless
uname Linux e756d8d1a277 5.4.0-1103-aws #111~18.04.1-Ubuntu SMP Tue May 23 20:04:10 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision HBASE-27389 / 5f7df54
Max. process+thread count 42 (vs. ulimit of 30000)
modules C: . U: .
Console output https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5495/1/console
versions git=2.34.1 maven=3.8.6
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

@wchevreuil wchevreuil left a comment

Thanks for writing this documentation, @ragarkar. LGTM overall; I just made a few rewording suggestions. Let me know what you think.

@@ -1130,6 +1130,48 @@ For a RegionServer hosting data that can comfortably fit into cache, or if your

The compressed BlockCache is disabled by default. To enable it, set `hbase.block.data.cachecompressed` to `true` in _hbase-site.xml_ on all RegionServers.

==== Cache Aware Load Balancer

HBase uses ephemeral cache to cache the blocks by reading them from the slow storages and storing them to the bucket cache. This cache is warmed up every time a region server is started. Depending on the data size and the configured cache size, the cache warm up can take anywhere from a few minutes to a few hours. Doing this every time the region server starts can be a very expensive process. To eliminate this, link:https://issues.apache.org/jira/browse/HBASE-27313[HBASE-27313] implemented the cache persistence feature where the region servers periodically persist the blocks cached in the bucket cache. This persisted information is then used to resurrect the cache in the event of a region server restart because of normal restart or crash.

I think we can remove this explanation: HBase uses ephemeral cache to cache the blocks by reading them from the slow storages and storing them to the bucket cache. This cache is warmed up every time a region server is started.

Also let's mention the importance of this when using cloud storage:

Depending on the data size and the configured cache size, the cache warm up can take anywhere from a few minutes to a few hours. This becomes even more critical for HBase deployments over cloud storage, where compute is separated from storage. Doing this every time the region server starts can be a very expensive process.


link:https://issues.apache.org/jira/browse/HBASE-27999[HBASE-27999] implements the prefetch aware load balancer which is aimed at enhancing the capability of HBase to enable the balancer to consider the cache allocation of each region on region servers when calculating a new assignment plan and use the region/region server cache allocation information reported by region servers to calculate the percentage of HFiles cached for each region on the hosting server, and then use that as another factor when deciding on an optimal, new assignment plan.

Some re-wording:

implements the cache aware load balancer, which adds to the load balancer the ability to consider the cache allocation of each region on region servers when calculating a new assignment plan, using the region/region server cache allocation information reported by region servers to calculate the percentage of HFiles cached for each region on the hosting server. This information is then used by the balancer as another factor when deciding on an optimal, new assignment plan.


The master node captures the prefetch information from all the region servers and uses this information to decide the region assignments while ensuring a minimal impact on the warmed up cache. A region is assigned to the region server where it has a better cache ratio as compared to the region server where it is currently hosted.

Suggestion:

The master node captures the caching information from all the region servers and uses this information to decide on new region assignments while ensuring a minimal impact on the current cache allocation. A region is assigned to the region server where it has a better cache ratio as compared to the region server where it is currently hosted.
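To make the assignment rule in the paragraph above concrete, here is a minimal sketch, not the actual balancer code. The class `CacheAwareAssignmentSketch`, the method `preferredServer`, and the server names are all hypothetical; the sketch assumes the master already holds, for one region, the cache ratio reported by each region server.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; this is not the HBase implementation.
class CacheAwareAssignmentSketch {

  /**
   * Returns the server with a strictly better cache ratio than the current
   * host, or the current host if no other server is better.
   */
  static String preferredServer(String currentServer, Map<String, Double> cacheRatioByServer) {
    String best = currentServer;
    // Assumption: a server with no reported ratio has nothing cached for the region.
    double bestRatio = cacheRatioByServer.getOrDefault(currentServer, 0.0);
    for (Map.Entry<String, Double> entry : cacheRatioByServer.entrySet()) {
      if (entry.getValue() > bestRatio) {
        best = entry.getKey();
        bestRatio = entry.getValue();
      }
    }
    return best;
  }

  public static void main(String[] args) {
    Map<String, Double> ratios = new HashMap<>();
    ratios.put("rs1", 0.2); // region currently hosted here, mostly uncached
    ratios.put("rs2", 0.9); // previous host, cache largely intact
    System.out.println(preferredServer("rs1", ratios)); // prints "rs2"
  }
}
```

The strict comparison keeps a region where it is when no other server offers a better ratio, which matches the stated goal of minimal impact on the current cache allocation.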

. Cache Cost
+

The cache cost is calculated as the percentage of data for a region cached on the region server where it is either currently hosted or was previously hosted. A region may have multiple HFiles, each of different sizes. An HFile is considered to be fully prefetched when all the data blocks in this file are in the cache. The region server hosting this region calculates the ratio of the number of HFiles cached in the bucket cache to the total number of HFiles in the region. This ratio will vary from 0 (region hosted on this server, but none of its HFiles are cached into the bucket cache) to 1 (region hosted on this server and all the HFiles for this region are cached into the bucket cache).

Replace "prefetched" with "cached".

Replace "bucket cache" with "cache".
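As an illustration of the ratio described in the Cache Cost paragraph, here is a minimal sketch under stated assumptions. `CacheRatioSketch` and `cacheRatio` are hypothetical names, not HBase classes, and the sketch assumes the region server already knows how many of a region's HFiles are fully cached.

```java
// Hypothetical sketch; this is not the HBase implementation.
class CacheRatioSketch {

  /** Ratio of fully cached HFiles to total HFiles for one region, in [0, 1]. */
  static double cacheRatio(int cachedHFiles, int totalHFiles) {
    if (cachedHFiles < 0 || totalHFiles < 0 || cachedHFiles > totalHFiles) {
      throw new IllegalArgumentException("invalid HFile counts");
    }
    // Assumption: a region with no HFiles is treated as having nothing cached.
    if (totalHFiles == 0) {
      return 0.0;
    }
    return (double) cachedHFiles / totalHFiles;
  }

  public static void main(String[] args) {
    System.out.println(cacheRatio(0, 4)); // 0.0: hosted here, nothing cached
    System.out.println(cacheRatio(3, 4)); // 0.75
    System.out.println(cacheRatio(4, 4)); // 1.0: every HFile is cached
  }
}
```

Because the ratio counts files rather than bytes, one large uncached HFile lowers it by the same amount as one small uncached HFile.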

Comment on lines 1150 to 1152
. Skewness Cost
+
The skewness cost is calculated as the number of regions hosted on each region server in the cluster. The skewness cost varies from 0 (regions are equally distributed across the region servers) to 1 (regions are not equally distributed across the region servers).

I think we can skip the explanation about skewness. In the next line, just mention this balancer implementation will combine the cache cost with skewness to decide on the assignment plan.
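One way to picture how a cache cost and a skewness cost might be combined and compared against a threshold such as _hbase.master.balancer.stochastic.minCostNeedBalance_ is the sketch below. It is an assumption-laden illustration, not the balancer's code: `BalanceThresholdSketch`, `combinedCost`, `needsBalance`, and the weights are hypothetical, and both costs are assumed to be normalized to [0, 1] with higher meaning worse.

```java
// Hypothetical sketch; this is not the HBase implementation.
class BalanceThresholdSketch {

  /** Weighted average of the two costs; the weights are illustrative assumptions. */
  static double combinedCost(double cacheCost, double cacheWeight,
      double skewnessCost, double skewnessWeight) {
    double totalWeight = cacheWeight + skewnessWeight;
    if (totalWeight <= 0) {
      throw new IllegalArgumentException("weights must sum to a positive value");
    }
    return (cacheCost * cacheWeight + skewnessCost * skewnessWeight) / totalWeight;
  }

  /** A rebalance is considered only when the cost exceeds the configured minimum. */
  static boolean needsBalance(double cost, double minCostNeedBalance) {
    return cost > minCostNeedBalance;
  }

  public static void main(String[] args) {
    double cost = combinedCost(0.5, 1.0, 0.5, 1.0);
    System.out.println(cost);                    // 0.5
    System.out.println(needsBalance(cost, 0.1)); // true
  }
}
```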


. When the cost of maintaining the balance in the cluster is greater than the minimum threshold defined by the configuration _hbase.master.balancer.stochastic.minCostNeedBalance_.

The cluster can be made to use the CacheAwareLoadBalancer by setting the following configuration properties:

Suggestion:

Enable the CacheAwareLoadBalancer by setting the following configuration properties in the master configuration:

@ragarkar

ragarkar commented Nov 2, 2023

Hi @wchevreuil , thanks for providing the feedback on the changes. I have addressed these comments in the latest patch. Please take a look. Thanks.

@Apache-HBase

🎊 +1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 0m 26s Docker mode activated.
-0 ⚠️ yetus 0m 3s Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck
_ Prechecks _
_ HBASE-27389 Compile Tests _
_ Patch Compile Tests _
_ Other Tests _
1m 14s
Subsystem Report/Notes
Docker ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5495/2/artifact/yetus-jdk11-hadoop3-check/output/Dockerfile
GITHUB PR #5495
Optional Tests
uname Linux dcf4671afc0d 5.4.0-1103-aws #111~18.04.1-Ubuntu SMP Tue May 23 20:04:10 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision HBASE-27389 / 5f7df54
Max. process+thread count 33 (vs. ulimit of 30000)
modules C: . U: .
Console output https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5495/2/console
versions git=2.34.1 maven=3.8.6
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

@Apache-HBase

🎊 +1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 0m 31s Docker mode activated.
-0 ⚠️ yetus 0m 4s Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck
_ Prechecks _
_ HBASE-27389 Compile Tests _
_ Patch Compile Tests _
_ Other Tests _
1m 23s
Subsystem Report/Notes
Docker ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5495/2/artifact/yetus-jdk8-hadoop3-check/output/Dockerfile
GITHUB PR #5495
Optional Tests
uname Linux 3ddae8448665 5.4.0-1103-aws #111~18.04.1-Ubuntu SMP Tue May 23 20:04:10 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision HBASE-27389 / 5f7df54
Max. process+thread count 29 (vs. ulimit of 30000)
modules C: . U: .
Console output https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5495/2/console
versions git=2.34.1 maven=3.8.6
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

@Apache-HBase

🎊 +1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 0m 42s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
_ HBASE-27389 Compile Tests _
+1 💚 spotless 0m 45s branch has no errors when running spotless:check.
_ Patch Compile Tests _
+1 💚 whitespace 0m 0s The patch has no whitespace issues.
+1 💚 spotless 0m 38s patch has no errors when running spotless:check.
_ Other Tests _
+1 💚 asflicense 0m 13s The patch does not generate ASF License warnings.
3m 12s
Subsystem Report/Notes
Docker ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5495/2/artifact/yetus-general-check/output/Dockerfile
GITHUB PR #5495
Optional Tests dupname asflicense spotless
uname Linux 2cccff26bdb4 5.4.0-163-generic #180-Ubuntu SMP Tue Sep 5 13:21:23 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision HBASE-27389 / 5f7df54
Max. process+thread count 45 (vs. ulimit of 30000)
modules C: . U: .
Console output https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5495/2/console
versions git=2.34.1 maven=3.8.6
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

@wchevreuil wchevreuil merged commit 69d980a into apache:HBASE-27389 Nov 2, 2023
wchevreuil pushed a commit that referenced this pull request Nov 10, 2023
#5495)

Signed-off-by: Wellington Chevreuil <wchevreuil@apache.org>
wchevreuil pushed a commit that referenced this pull request Nov 13, 2023
#5495)

Signed-off-by: Wellington Chevreuil <wchevreuil@apache.org>
wchevreuil pushed a commit that referenced this pull request Nov 22, 2023
#5495)

Signed-off-by: Wellington Chevreuil <wchevreuil@apache.org>