HBASE-27225 Add BucketAllocator bucket size statistic logging #4637
Conversation
I think this is a great idea @bbeaudreault, had a question regarding the math.
And maybe as follow-on work, could we also publish JMX metrics for it? I guess that would make it easier to analyse which block size would be ideal than having to go through debug logs.
// if bucket capacity is not perfectly divisible by a bucket's object size, there will
// be some left over per bucket. for some object sizes this may be large enough to be
// non-trivial and worth tuning by choosing a more divisible object size.
long wastedBytes = (bucketCapacity % bucketObjectSize) * (full + fillingBuckets);
Mind teaching me briefly about the math here? Aren't `bucketCapacity` and `bucketObjectSize` computed for each individual bucket? If so, why are we multiplying by the number of buckets that already have some data? Wouldn't this `(bucketCapacity % bucketObjectSize)` differ for each bucket?
Happy to, and feel free to tell me if I'm misunderstanding here. I'm also just learning the BucketCache recently.

- `bucketCapacity` is calculated once for the whole cache. It's defined as `4 * largestBucketSize`.
- Then, each bucket is allocated to one of a configured number of bucket sizes.
- The configured bucket sizes may not divide into the global `bucketCapacity` evenly, leaving a remainder.
- The size of that remainder will vary for each bucket size, and any bucket allocated to that size will have that much wasted space.
- Typically each bucket size will have 1 `freeBucket` (which is probably better defined as a `fillingBucket`) and a number of `fullBuckets`. A fullBucket isn't actually full, depending on the block size it's allocated to. The remainder for the block size will be empty/unused.
- That remainder should be the same for all buckets of a particular size, because the remainder is based on the configured bucket size.
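The per-size waste described above can be sketched as a small standalone snippet. This is a hypothetical illustration using the names from the diff (`bucketCapacity`, `bucketObjectSize`), not the actual BucketAllocator code:

```java
// Hypothetical sketch of the wasted-bytes math; names mirror the diff under
// review, but this is not the real BucketAllocator implementation.
public class WastedBytesSketch {

  // Every bucket of a given object size leaves the same remainder unused,
  // so total waste is that remainder times the number of in-use buckets.
  static long wastedBytes(long bucketCapacity, long bucketObjectSize, long inUseBuckets) {
    return (bucketCapacity % bucketObjectSize) * inUseBuckets;
  }

  public static void main(String[] args) {
    long bucketCapacity = 4L * 1573888L; // 4 * largestBucketSize = 6295552
    // Per-bucket remainder for the 525312 size: 517120 bytes (~1 whole block)
    System.out.println(wastedBytes(bucketCapacity, 525312L, 1L));
    // Per-bucket remainder for the 66560 size: 38912 bytes (~half a block)
    System.out.println(wastedBytes(bucketCapacity, 66560L, 1L));
  }
}
```

Note that the remainder depends only on the configured size, which is why multiplying by the bucket count in the diff is valid.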
Let's use my example snippet from above:
Bucket allocator statistics follow:
Free bytes=5325249536; used bytes=35391673344; total bytes=40716922880; wasted bytes=355258368; completelyFreeBuckets=843
Object size 33792; used=195000; free=114; total=195114; wasted bytes=10741760; full buckets=1048
Object size 66560; used=275656; free=46; total=275702; wasted bytes=114128896; full buckets=2932
Object size 99328; used=76785; free=12; total=76797; wasted bytes=46185472; full buckets=1218
Object size 132096; used=6090; free=20; total=6110; wasted bytes=11315200; full buckets=129
Object size 525312; used=3303; free=8; total=3311; wasted bytes=155653120; full buckets=300
Object size 787456; used=152; free=2; total=154; wasted bytes=17233920; full buckets=21
Object size 1573888; used=107; free=3373; total=3480; wasted bytes=0; full buckets=26
- I've noticed across a bunch of hosts that the 525312 bucket has the most waste with this configuration.
- The largest bucket size is 1573888, so the bucketCapacity is 6295552.
- So using 525312 as an example, it will always allocate blocks in those buckets with a size of 525312, even if the actual block size isn't exactly that amount [*]. `6295552 % 525312 = 517120`. So given that we always allocate blocks with size 525312, we're basically missing out on ~1 block per bucket allocated for this size. Only approximately `6295552 / 525312 = 11` blocks fit into this bucket size, so that's almost 10% waste.
- Checking 66560, which is the next most wasteful but has many more buckets: `6295552 % 66560 = 38912`. So each bucket will waste about half a block. For that block size, it can fit 94 blocks per bucket. So wasting half a block is much more efficient than above.
Buckets can be reallocated over time. Once one becomes a `completelyFreeBucket`, the next allocation that needs one, of any size, will take that bucket and reconfigure it for its purpose. At that point the amount of waste for that bucket would change based on the new block size.

[*] Getting back to "even if the actual block size isn't exactly that amount": this speaks to another source of waste which is harder to calculate, so the wastedBytes is actually very much an underestimation, but I'm not sure by how much. Let's say you have a block with 150k size. That doesn't fit into the 129k bucket, so it has to go into the 513k bucket. But even though the block size is 150k, we need to allocate a full 513k, leaving 363k of wasted space.
I say this is harder to calculate, but of course it'd be relatively easy. When we call `roundUpToBucketSizeInfo` we could just subtract the blockSize from the bucketSize, and then add that diff to a histogram. But I was just thinking this might be a very hot codepath, and adding a histogram there would be a lot more expensive than what I currently have, since currently the statistics are calculated totally off the hot path.

I imagine maybe the unified.encoded.blocksize setting could help with this problem, but part of the reason for so much upward skew for us is that we have some users writing large rows. So it may not entirely help us, at least.
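The subtract-and-record idea could be sketched like this. Everything here is a standalone illustration: the bucket sizes and method names are assumptions modeled loosely on BucketAllocator, not its real API:

```java
import java.util.concurrent.atomic.LongAdder;

// Hypothetical sketch of measuring fragmentation at allocation time:
// record bucketSize - blockSize for each block allocated. Not the real
// BucketAllocator; sizes and names are illustrative only.
public class FragmentationSketch {
  static final int[] BUCKET_SIZES = {5120, 9216, 17408, 33792}; // sorted ascending
  static final LongAdder fragmentationBytes = new LongAdder();

  // Round the requested block size up to the smallest configured bucket size,
  // analogous to what roundUpToBucketSizeInfo does.
  static int roundUp(int blockSize) {
    for (int size : BUCKET_SIZES) {
      if (blockSize <= size) return size;
    }
    throw new IllegalArgumentException("block too large: " + blockSize);
  }

  static int allocateBlock(int blockSize) {
    int bucketSize = roundUp(blockSize);
    // The hot-path cost the comment worries about is this single add.
    fragmentationBytes.add(bucketSize - blockSize);
    return bucketSize;
  }

  public static void main(String[] args) {
    allocateBlock(4000); // lands in the 5120 bucket: 1120 bytes fragmented
    allocateBlock(6000); // lands in the 9216 bucket: 3216 bytes fragmented
    System.out.println(fragmentationBytes.sum()); // 4336
  }
}
```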
So using 525312 as an example, it will always allocate blocks in those buckets with a size of 525312, even if the actual block size isn't exactly that amount [*]. 6295552 % 525312 = 517120. So given that we always allocate blocks with size 525312, we basically are missing out on ~1 block per bucket allocated for this size. There are only approx 6295552 / 525312 = 11 blocks fitting into this bucket size, so that's almost 10% waste.

Ok, so this waste here is a best-case scenario, not taking into account the real size of allocated blocks on each bucket? Considering that real-world cases may not have perfectly sized 512KB blocks, fragmentation would be even higher?
Yea, correct. Fragmentation could make this worse, and it's harder to cheaply calculate.
If you have ideas, I'd be happy to try adding that. One thought I had was to add a LongAdder to each BucketSizeInfo, and increment it in `allocateBlock(int blockSize)`. But really we should also be decrementing somewhere, possibly in `freeBlock`, though we'd need to add the blockSize as an argument there. And there's a question of the impact on performance (probably small/worth it?).
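A minimal sketch of that increment/decrement idea: a LongAdder per bucket size, bumped on allocate and unwound on free (which, as noted, would need the blockSize passed in). All names here are illustrative, not the actual BucketSizeInfo API:

```java
import java.util.concurrent.atomic.LongAdder;

// Illustrative stand-in for a per-size info object; not the real
// BucketAllocator.BucketSizeInfo class.
public class BucketSizeInfoSketch {
  final int bucketSize;
  final LongAdder fragmentationBytes = new LongAdder();

  BucketSizeInfoSketch(int bucketSize) {
    this.bucketSize = bucketSize;
  }

  void allocateBlock(int blockSize) {
    // Record how much of the fixed-size slot this block leaves unused.
    fragmentationBytes.add(bucketSize - blockSize);
  }

  // freeBlock needs the original blockSize to undo the allocation's delta.
  void freeBlock(int blockSize) {
    fragmentationBytes.add(-(bucketSize - blockSize));
  }

  public static void main(String[] args) {
    BucketSizeInfoSketch info = new BucketSizeInfoSketch(525312);
    info.allocateBlock(150000); // 375312 bytes of fragmentation recorded
    info.freeBlock(150000);     // unwound back to 0
    System.out.println(info.fragmentationBytes.sum());
  }
}
```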
I just implemented the above in my latest patch. I'm thinking a LongAdder increment/decrement is probably not going to make a noticeable difference next to everything else.
Here's an example output from the test, unfortunately I don't have a real-world example:
Free bytes=20314112; used bytes=10435584; total bytes=30749696; wasted bytes=70656; fragmentation bytes=728064; completelyFreeBuckets=10
Object size 5120; used=410; free=0; total=410; wasted bytes=2048; fragmentation bytes=419840, full buckets=1
Object size 9216; used=228; free=0; total=228; wasted bytes=0; fragmentation bytes=233472, full buckets=1
Object size 17408; used=0; free=120; total=120; wasted bytes=0; fragmentation bytes=0, full buckets=0
Object size 33792; used=0; free=62; total=62; wasted bytes=0; fragmentation bytes=0, full buckets=0
Object size 41984; used=0; free=50; total=50; wasted bytes=0; fragmentation bytes=0, full buckets=0
Object size 50176; used=0; free=41; total=41; wasted bytes=0; fragmentation bytes=0, full buckets=0
Object size 58368; used=0; free=36; total=36; wasted bytes=0; fragmentation bytes=0, full buckets=0
Object size 66560; used=31; free=0; total=31; wasted bytes=37888; fragmentation bytes=31744, full buckets=1
Object size 99328; used=42; free=0; total=42; wasted bytes=30720; fragmentation bytes=43008, full buckets=2
Object size 132096; used=0; free=15; total=15; wasted bytes=0; fragmentation bytes=0, full buckets=0
Object size 197632; used=0; free=10; total=10; wasted bytes=0; fragmentation bytes=0, full buckets=0
Object size 263168; used=0; free=7; total=7; wasted bytes=0; fragmentation bytes=0, full buckets=0
Object size 394240; used=0; free=5; total=5; wasted bytes=0; fragmentation bytes=0, full buckets=0
Object size 525312; used=0; free=4; total=4; wasted bytes=0; fragmentation bytes=0, full buckets=0
We could combine wasted + fragmented. It's probably confusing to have both, but also useful once you understand the difference (which we can document).
Thanks for the explanation and for the addition of the fragmentation measure.
LGTM, +1!
Thanks @wchevreuil, can you give it one more look? I realized LongAdder is not necessary because both increment/decrement are behind synchronization. Changed it to a normal long. I also added a bunch of javadoc to help people understand what they're looking at.
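The design change described there can be sketched as follows: since every mutation already happens inside synchronized methods, a plain long carries no extra cost and no LongAdder is needed. Names are illustrative, not the actual BucketAllocator code:

```java
// Illustrative sketch: a plain long counter guarded by synchronization,
// replacing the LongAdder from the earlier idea. Not the real HBase class.
public class SynchronizedCounterSketch {
  private long fragmentationBytes;

  public synchronized void allocateBlock(int bucketSize, int blockSize) {
    fragmentationBytes += bucketSize - blockSize;
  }

  public synchronized void freeBlock(int bucketSize, int blockSize) {
    fragmentationBytes -= bucketSize - blockSize;
  }

  public synchronized long getFragmentationBytes() {
    return fragmentationBytes;
  }

  public static void main(String[] args) {
    SynchronizedCounterSketch counter = new SynchronizedCounterSketch();
    counter.allocateBlock(525312, 150000);
    counter.freeBlock(525312, 150000);
    System.out.println(counter.getFragmentationBytes()); // 0
  }
}
```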
Signed-off-by: Wellington Chevreuil <wchevreuil@apache.org>
…ic logging (apache#4637) (addendum) Signed-off-by: Wellington Chevreuil <wchevreuil@apache.org>
Example output (with log4j log formatting prefixes removed for clarity):