
HBASE-23066 Allow cache on write during compactions when prefetching is enabled #707

Status: Closed · wants to merge 1 commit

Conversation

jacob-leblanc (Contributor) opened this pull request.

@Apache-HBase:
🎊 +1 overall

| Vote | Subsystem | Runtime | Comment |
|:----:|:---------:|--------:|:--------|
| 💙 | reexec | 0m 37s | Docker mode activated. |
| | | | _ Prechecks _ |
| 💚 | dupname | 0m 0s | No case conflicting files found. |
| 💚 | hbaseanti | 0m 0s | Patch does not have any anti-patterns. |
| 💚 | @author | 0m 0s | The patch does not contain any @author tags. |
| 💚 | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
| | | | _ master Compile Tests _ |
| 💚 | mvninstall | 5m 35s | master passed |
| 💚 | compile | 0m 56s | master passed |
| 💚 | checkstyle | 1m 20s | master passed |
| 💚 | shadedjars | 4m 34s | branch has no errors when building our shaded downstream artifacts. |
| 💚 | javadoc | 0m 38s | master passed |
| 💙 | spotbugs | 4m 3s | Used deprecated FindBugs config; considering switching to SpotBugs. |
| 💚 | findbugs | 4m 0s | master passed |
| | | | _ Patch Compile Tests _ |
| 💚 | mvninstall | 4m 49s | the patch passed |
| 💚 | compile | 0m 56s | the patch passed |
| 💚 | javac | 0m 56s | the patch passed |
| 💚 | checkstyle | 1m 20s | the patch passed |
| 💚 | whitespace | 0m 0s | The patch has no whitespace issues. |
| 💚 | shadedjars | 4m 34s | patch has no errors when building our shaded downstream artifacts. |
| 💚 | hadoopcheck | 15m 48s | Patch does not cause any errors with Hadoop 2.8.5 2.9.2 or 3.1.2. |
| 💚 | javadoc | 0m 36s | the patch passed |
| 💚 | findbugs | 4m 3s | the patch passed |
| | | | _ Other Tests _ |
| 💚 | unit | 159m 57s | hbase-server in the patch passed. |
| 💚 | asflicense | 0m 35s | The patch does not generate ASF License warnings. |
| | | 216m 42s | |

| Subsystem | Report/Notes |
|:----------|:-------------|
| Docker | Client=19.03.3 Server=19.03.3 base: https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-707/1/artifact/out/Dockerfile |
| GITHUB PR | #707 |
| Optional Tests | dupname asflicense javac javadoc unit spotbugs findbugs shadedjars hadoopcheck hbaseanti checkstyle compile |
| uname | Linux fb96839beab8 4.15.0-60-generic #67-Ubuntu SMP Thu Aug 22 16:55:30 UTC 2019 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /home/jenkins/jenkins-slave/workspace/HBase-PreCommit-GitHub-PR_PR-707/out/precommit/personality/provided.sh |
| git revision | master / ba12d5b |
| Default Java | 1.8.0_181 |
| Test Results | https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-707/1/testReport/ |
| Max. process+thread count | 5023 (vs. ulimit of 10000) |
| modules | C: hbase-server U: hbase-server |
| Console output | https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-707/1/console |
| versions | git=2.11.0 maven=2018-06-17T18:33:14Z findbugs=3.1.11 |
| Powered by | Apache Yetus 0.11.0 https://yetus.apache.org |

This message was automatically generated.

@ramkrish86 (Contributor):
LGTM. Let's wait for some time for others to review.

```java
/**
 * Configuration key to cache blocks when a compacted file is written, predicated on prefetching
 * being enabled for the column family.
 */
public static final String PREFETCH_COMPACTED_BLOCKS_ON_WRITE_KEY =
```
Contributor:
A bit confusing. Are we doing the prefetch of the new compacted file once it is written? I don't think so. When we write the file, the caching happens at that time, so it is cache-on-write. Why is it called prefetch, then? There is no extra fetch operation happening, right?

```java
/**
 * @return true if blocks should be cached while writing during compaction, false if not
 */
public boolean shouldCacheCompactedBlocksOnWrite() {
  return this.prefetchCompactedDataOnWrite && this.prefetchOnOpen;
}
```
Contributor:
Oh, so the cache-on-write (at compaction) happens iff the prefetch config is ON! Anyway, in your case the prefetch, which is a separate config, is ON, right? I think this is why you named the new config that way, but somehow I feel the config name is a bit misleading.

Contributor:
Actually, the cache size should be much bigger than the hot data set size if you want to do cache-on-compact, because the compacted-away data might already be in the cache (those are flushed files or the result of another compaction), and that data has also been accessed recently (by the compaction thread). This feature should be used very carefully.
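
A back-of-the-envelope sketch of the sizing concern above; the numbers and class name are purely hypothetical and are not from this PR:

```java
// Hypothetical numbers illustrating the transient cache footprint during a
// compaction with cache-on-write enabled: blocks of the compacted-away files
// may still sit in the cache while the new file's blocks are being inserted.
public class CacheSizingSketch {
  public static void main(String[] args) {
    long gib = 1024L * 1024 * 1024;
    long hotDataSet = 8 * gib;    // data already hot in the block cache (assumed)
    long compactedFile = 4 * gib; // size of the newly written compacted file (assumed)
    // Worst case, both occupy the cache at once until the old blocks are evicted:
    long transientFootprint = hotDataSet + compactedFile;
    System.out.println("Transient footprint: " + transientFootprint / gib + " GiB");
  }
}
```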

Contributor (Author):
Thanks for looking at this. My understanding is that in cases where prefetch is enabled, the new file is going to be read into the cache after compaction completes anyway, so the cache size requirements are the same when this new setting is enabled. This is why I wanted to limit the scope of the cache-on-write to apply only where prefetching is enabled: it is simply a way to do the cache loading more efficiently while we are writing the data out, rather than having to read it back after compaction is done, which I've found to be very expensive when the data is in S3.

As far as the name goes, I struggled to come up with something intuitive - how do I explain in the name alone that this only applies when prefetching is on? I tried to convey "when prefetching, do the prefetch of compacted data on write." I'm not in love with the name and I'm open to suggestions. I didn't want to give the false impression that all compacted data is going to be cached on write. Maybe "cacheCompactedDataOnWriteIfPrefetching"? Is that too wordy?
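
A minimal sketch of the behavior described above, assuming a hypothetical class whose field names mirror the snippet under review (this is not the PR's actual diff): during compaction, blocks of the new file are cached as they are written, but only when prefetch-on-open is also enabled, since that file would be loaded into the cache right after compaction anyway.

```java
// Hypothetical sketch -- not the PR's actual code. Field names mirror the
// reviewed snippet; everything else is illustrative.
public class CacheOnCompactSketch {
  private final boolean prefetchOnOpen;               // existing per-family prefetch setting
  private final boolean prefetchCompactedDataOnWrite; // the new setting from this PR

  public CacheOnCompactSketch(boolean prefetchOnOpen, boolean prefetchCompactedDataOnWrite) {
    this.prefetchOnOpen = prefetchOnOpen;
    this.prefetchCompactedDataOnWrite = prefetchCompactedDataOnWrite;
  }

  /** Same gating as shouldCacheCompactedBlocksOnWrite() in the snippet above. */
  public boolean shouldCacheCompactedBlocksOnWrite() {
    return prefetchCompactedDataOnWrite && prefetchOnOpen;
  }

  /** Hypothetical write path for a block of the newly compacted file. */
  void onCompactedBlockWritten(String cacheKey, byte[] block, BlockCache cache) {
    // Populate the cache while writing, instead of prefetching the whole file
    // back (e.g. from S3) after the compaction finishes.
    if (shouldCacheCompactedBlocksOnWrite()) {
      cache.put(cacheKey, block);
    }
  }

  /** Minimal stand-in for a block cache. */
  interface BlockCache {
    void put(String key, byte[] block);
  }
}
```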

@busbey (Contributor) commented Dec 11, 2019:
is this made obsolete by #919?

@virajjasani (Contributor):

> is this made obsolete by #919?

I believe so. Closing this PR @jacob-leblanc @ramkrish86
