-
Notifications
You must be signed in to change notification settings - Fork 3.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HBASE-23066 Allow cache on write during compactions when prefetching … #707
Conversation
🎊 +1 overall
This message was automatically generated. |
LGTM. Let's wait for some time for others to review. |
* Configuration key to cache blocks when a compacted file is written, predicated on prefetching | ||
* being enabled for the column family. | ||
*/ | ||
public static final String PREFETCH_COMPACTED_BLOCKS_ON_WRITE_KEY = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A bit confusing.. Are we doing the prefetch of the new compacted file once it is written?
Dont think so.. When we write the file, that time itself the caching happens. So it is cache on write. Why its called prefetch then? There is no extra fetch op happening right?
* @return true if blocks should be cached while writing during compaction, false if not | ||
*/ | ||
public boolean shouldCacheCompactedBlocksOnWrite() { | ||
return this.prefetchCompactedDataOnWrite && this.prefetchOnOpen; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh.. So the cache on write (at compaction) happens iff prefetch config is ON ! Anyways in ur case the prefetch which is another config, is ON right? I think this is the reason why the new config you have named that way. But some how I feel that config name is bit misleading.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually the cache size should be much bigger than the hot data set size if u want to do cache on compact. Because the compacted away data might be already in cache (Those are flused files or a result of another compaction). Those are recently been accessed also (by the compaction thread). This feature should be very carefully used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for looking at this. My understanding is that in cases where prefetch is enabled, the new file is going to be read into the cache after compaction completes anyway. So the cache size requirements are the same when this new setting is enabled. This is why I wanted to limit the scope of the cache on write to only apply where prefetching is enabled: it simply is a way to do the cache loading more efficiently while we are writing the data out rather than having to read it back after compaction is done which I've found is very expensive when data is in S3.
As far as the name goes, I struggled to come up with something intuitive - how do I explain in the name alone that this only applies when prefetching is on? I tried to convey "when prefetching, do the prefetch of compacted data on write." I'm not in love with the name and I'm open to suggestions. I didn't want to give the false impression that all compacted data is going to be cached on write. Maybe "cacheCompactedDataOnWriteIfPrefetching"? Is that too wordy?
is this made obsolete by #919? |
I believe so. Closing this PR @jacob-leblanc @ramkrish86 |
…is enabled