HBASE-27264 Add options to consider compressed size when delimiting blocks during hfile writes #4675
Changes from 3 commits
ebdc565
1ff2efa
17171a3
a328d9b
694631c
6918d65
4d67df3
b9e191f
80b95f3
5a1c552
4136f57
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -17,6 +17,9 @@ | |
*/ | ||
package org.apache.hadoop.hbase.io.hfile; | ||
|
||
import static org.apache.hadoop.hbase.io.hfile.HFileBlock.BLOCK_SIZE_LIMIT_COMPRESSED; | ||
import static org.apache.hadoop.hbase.io.hfile.HFileBlock.MAX_BLOCK_SIZE_COMPRESSED; | ||
|
||
import java.io.DataOutput; | ||
import java.io.DataOutputStream; | ||
import java.io.IOException; | ||
|
@@ -291,8 +294,9 @@ protected void finishInit(final Configuration conf) { | |
if (blockWriter != null) { | ||
throw new IllegalStateException("finishInit called twice"); | ||
} | ||
blockWriter = | ||
new HFileBlock.Writer(conf, blockEncoder, hFileContext, cacheConf.getByteBuffAllocator()); | ||
blockWriter = new HFileBlock.Writer(conf, blockEncoder, hFileContext, | ||
cacheConf.getByteBuffAllocator(), conf.getBoolean(BLOCK_SIZE_LIMIT_COMPRESSED, false), | ||
conf.getInt(MAX_BLOCK_SIZE_COMPRESSED, hFileContext.getBlocksize() * 10)); | ||
// Data block index writer | ||
boolean cacheIndexesOnWrite = cacheConf.shouldCacheIndexesOnWrite(); | ||
dataBlockIndexWriter = new HFileBlockIndex.BlockIndexWriter(blockWriter, | ||
|
@@ -319,6 +323,9 @@ protected void checkBlockBoundary() throws IOException { | |
shouldFinishBlock = blockWriter.encodedBlockSizeWritten() >= hFileContext.getBlocksize() | ||
|| blockWriter.blockSizeWritten() >= hFileContext.getBlocksize(); | ||
} | ||
if (blockWriter.isSizeLimitCompressed()) { | ||
let's change this to
We actually want to enter here if And in the case where |
||
shouldFinishBlock &= blockWriter.shouldFinishBlock(); | ||
Related to the above comment by @bbeaudreault: in what situation is shouldFinishBlock = true while blockWriter.shouldFinishBlock() = false? Is that possible?
Yes, shouldFinishBlock could be true at this point because so far we have only checked the "raw" uncompressed size or the encoded uncompressed size against BLOCK_SIZE. These sizes may exceed BLOCK_SIZE while the compressed size is still below BLOCK_SIZE. |
||
} | ||
if (shouldFinishBlock) { | ||
finishBlock(); | ||
writeInlineBlocks(false); | ||
|
Is there a reason to have this logic here vs. in HFileWriterImpl with the rest of the shouldfinish logic?

This method deals with block specifics, such as compression, the block content byte array buffer, and how to treat the compressed size when deciding on a block limit. Moving it to HFileWriterImpl would spill block-specific variables and logic into the file writer logic. Putting it here feels more cohesive to me.
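
For readers following the thread, here is a minimal sketch of how the two checks in checkBlockBoundary() combine, as described in the reply about shouldFinishBlock. It is not the HBase implementation: the class BlockBoundarySketch, its method, and the compressedSizeEstimate parameter are hypothetical stand-ins, and the comparison of the compressed size against the block size is an assumption based on the discussion rather than on the body of blockWriter.shouldFinishBlock(). The MAX_BLOCK_SIZE_COMPRESSED cap is deliberately not modeled.

```java
// Sketch only: hypothetical names, not HBase code.
final class BlockBoundarySketch {

  /**
   * Mirrors the shape of the decision in the diff: the uncompressed checks run first,
   * then the compressed-size check can veto them when the option is enabled.
   */
  static boolean shouldFinishBlock(int rawSize, int encodedSize, int compressedSizeEstimate,
      int blockSize, boolean sizeLimitCompressed) {
    // Existing behavior: raw or encoded (both uncompressed) size against the block size.
    boolean shouldFinish = encodedSize >= blockSize || rawSize >= blockSize;
    if (sizeLimitCompressed) {
      // Assumed stand-in for blockWriter.shouldFinishBlock(): only finish once the
      // compressed size has also reached the block size.
      shouldFinish &= compressedSizeEstimate >= blockSize;
    }
    return shouldFinish;
  }

  public static void main(String[] args) {
    int blockSize = 64 * 1024;
    // Uncompressed size is over the limit, compressed estimate is well under it.
    System.out.println("option off: " + shouldFinishBlock(70_000, 70_000, 20_000, blockSize, false));
    System.out.println("option on:  " + shouldFinishBlock(70_000, 70_000, 20_000, blockSize, true));
  }
}
```

With the option off, the block is closed as soon as the uncompressed size crosses the block size. With it on, the same block stays open because the compressed estimate is still below the limit, which is the case the reviewer asked about.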