HBASE-23066 Allow cache on write during compactions when prefetching … #707

jacob-leblanc · 2019-10-09T23:59:31Z

…is enabled

Apache-HBase · 2019-10-10T03:46:54Z

🎊 +1 overall

Vote	Subsystem	Runtime	Comment
💙	reexec	0m 37s	Docker mode activated.
		_ Prechecks _
💚	dupname	0m 0s	No case conflicting files found.
💚	hbaseanti	0m 0s	Patch does not have any anti-patterns.
💚	@author	0m 0s	The patch does not contain any @author tags.
💚	test4tests	0m 0s	The patch appears to include 1 new or modified test files.
		_ master Compile Tests _
💚	mvninstall	5m 35s	master passed
💚	compile	0m 56s	master passed
💚	checkstyle	1m 20s	master passed
💚	shadedjars	4m 34s	branch has no errors when building our shaded downstream artifacts.
💚	javadoc	0m 38s	master passed
💙	spotbugs	4m 3s	Used deprecated FindBugs config; considering switching to SpotBugs.
💚	findbugs	4m 0s	master passed
		_ Patch Compile Tests _
💚	mvninstall	4m 49s	the patch passed
💚	compile	0m 56s	the patch passed
💚	javac	0m 56s	the patch passed
💚	checkstyle	1m 20s	the patch passed
💚	whitespace	0m 0s	The patch has no whitespace issues.
💚	shadedjars	4m 34s	patch has no errors when building our shaded downstream artifacts.
💚	hadoopcheck	15m 48s	Patch does not cause any errors with Hadoop 2.8.5 2.9.2 or 3.1.2.
💚	javadoc	0m 36s	the patch passed
💚	findbugs	4m 3s	the patch passed
		_ Other Tests _
💚	unit	159m 57s	hbase-server in the patch passed.
💚	asflicense	0m 35s	The patch does not generate ASF License warnings.
		216m 42s

Subsystem	Report/Notes
Docker	Client=19.03.3 Server=19.03.3 base: https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-707/1/artifact/out/Dockerfile
GITHUB PR	#707
Optional Tests	dupname asflicense javac javadoc unit spotbugs findbugs shadedjars hadoopcheck hbaseanti checkstyle compile
uname	Linux fb96839beab8 4.15.0-60-generic #67-Ubuntu SMP Thu Aug 22 16:55:30 UTC 2019 x86_64 GNU/Linux
Build tool	maven
Personality	/home/jenkins/jenkins-slave/workspace/HBase-PreCommit-GitHub-PR_PR-707/out/precommit/personality/provided.sh
git revision	master / `ba12d5b`
Default Java	1.8.0_181
Test Results	https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-707/1/testReport/
Max. process+thread count	5023 (vs. ulimit of 10000)
modules	C: hbase-server U: hbase-server
Console output	https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-707/1/console
versions	git=2.11.0 maven=2018-06-17T18:33:14Z) findbugs=3.1.11
Powered by	Apache Yetus 0.11.0 https://yetus.apache.org

This message was automatically generated.

ramkrish86 · 2019-10-14T12:30:03Z

LGTM. Let's wait for some time for others to review.

anoopsjohn · 2019-10-15T04:24:27Z

hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/CacheConfig.java

+   * Configuration key to cache blocks when a compacted file is written, predicated on prefetching
+   * being enabled for the column family.
+   */
+  public static final String PREFETCH_COMPACTED_BLOCKS_ON_WRITE_KEY =


A bit confusing. Are we prefetching the new compacted file once it is written?
I don't think so. The caching happens at the time we write the file, so it is cache on write. Why is it called prefetch, then? There is no extra fetch operation happening, right?

anoopsjohn · 2019-10-15T04:26:20Z

hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/CacheConfig.java

+   * @return true if blocks should be cached while writing during compaction, false if not
+   */
+  public boolean shouldCacheCompactedBlocksOnWrite() {
+    return this.prefetchCompactedDataOnWrite && this.prefetchOnOpen;


Oh, so the cache on write (at compaction) happens iff the prefetch config is ON! Anyway, in your case the prefetch, which is another config, is ON, right? I think this is why you named the new config that way, but somehow I feel the config name is a bit misleading.

Actually, the cache size should be much bigger than the hot data set size if you want to do cache on compact, because the compacted-away data might already be in the cache (those are flushed files or the result of another compaction). Those blocks have also been accessed recently (by the compaction thread). This feature should be used very carefully.
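
To make the sizing concern above concrete, here is a rough sketch of the arithmetic. It is not HBase code: the class, the method names, and all byte counts are hypothetical, and it assumes the worst case where the blocks of the compacted-away files and the blocks of the new file sit in the cache at the same time.

```java
public class CompactionCachePressureSketch {
  private static final long GIB = 1024L * 1024L * 1024L;

  /**
   * Worst-case bytes held for one compaction while the blocks of the input
   * files (already cached) and the blocks of the new file coexist.
   */
  public static long peakCompactionBytes(long cachedInputBytes, long compactedOutputBytes) {
    return Math.addExact(cachedInputBytes, compactedOutputBytes);
  }

  /** True if the rest of the hot set plus the compaction peak fits without eviction. */
  public static boolean fitsWithoutEviction(long cacheCapacityBytes, long otherHotBytes,
      long cachedInputBytes, long compactedOutputBytes) {
    long needed = Math.addExact(otherHotBytes,
        peakCompactionBytes(cachedInputBytes, compactedOutputBytes));
    return needed <= cacheCapacityBytes;
  }

  public static void main(String[] args) {
    // All numbers below are made up purely for illustration.
    long capacity = 10 * GIB;   // block cache size
    long otherHot = 6 * GIB;    // hot data not involved in this compaction
    long inputs = 3 * GIB;      // cached blocks of the files being compacted away
    long output = 3 * GIB;      // blocks of the new compacted file

    // 6 + 3 + 3 = 12 GiB needed against 10 GiB of capacity, so this prints false.
    System.out.println(fitsWithoutEviction(capacity, otherHot, inputs, output));
  }
}
```

Under these assumptions the cache has to absorb roughly the hot set plus one extra copy of the compacted data until the old blocks are evicted.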

jacob-leblanc · 2019-10-16

Thanks for looking at this. My understanding is that in cases where prefetch is enabled, the new file is going to be read into the cache after compaction completes anyway. So the cache size requirements are the same when this new setting is enabled. This is why I wanted to limit the scope of the cache on write to only apply where prefetching is enabled: it is simply a way to do the cache loading more efficiently while we are writing the data out, rather than having to read it back after compaction is done, which I've found is very expensive when data is in S3.

As far as the name goes, I struggled to come up with something intuitive - how do I explain in the name alone that this only applies when prefetching is on? I tried to convey "when prefetching, do the prefetch of compacted data on write." I'm not in love with the name and I'm open to suggestions. I didn't want to give the false impression that all compacted data is going to be cached on write. Maybe "cacheCompactedDataOnWriteIfPrefetching"? Is that too wordy?
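
For readers following the naming discussion, the gating described above can be sketched in isolation. This is a standalone illustration, not the patch itself: it uses a plain `Map` instead of Hadoop's `Configuration`, the string for the new key is a hypothetical placeholder (the diff excerpt does not show the real value), and the prefetch key string is assumed to match HBase's existing prefetch-on-open setting.

```java
import java.util.HashMap;
import java.util.Map;

public class CompactionCacheOnWriteSketch {
  // Assumed to match HBase's existing prefetch-on-open key; treat as an assumption.
  public static final String PREFETCH_BLOCKS_ON_OPEN_KEY = "hbase.rs.prefetchblocksonopen";
  // HYPOTHETICAL key string, used only for this illustration.
  public static final String PREFETCH_COMPACTED_BLOCKS_ON_WRITE_KEY =
      "example.prefetch.compacted.blocks.on.write";

  private final boolean prefetchOnOpen;
  private final boolean prefetchCompactedDataOnWrite;

  public CompactionCacheOnWriteSketch(Map<String, String> conf) {
    // Both settings default to false when absent (a choice made for this sketch).
    this.prefetchOnOpen =
        Boolean.parseBoolean(conf.getOrDefault(PREFETCH_BLOCKS_ON_OPEN_KEY, "false"));
    this.prefetchCompactedDataOnWrite =
        Boolean.parseBoolean(conf.getOrDefault(PREFETCH_COMPACTED_BLOCKS_ON_WRITE_KEY, "false"));
  }

  /** Same predicate as in the reviewed diff: the new flag only matters if prefetch is on. */
  public boolean shouldCacheCompactedBlocksOnWrite() {
    return this.prefetchCompactedDataOnWrite && this.prefetchOnOpen;
  }

  public static void main(String[] args) {
    Map<String, String> conf = new HashMap<>();
    conf.put(PREFETCH_COMPACTED_BLOCKS_ON_WRITE_KEY, "true");
    // New flag alone is not enough: prints false.
    System.out.println(new CompactionCacheOnWriteSketch(conf).shouldCacheCompactedBlocksOnWrite());

    conf.put(PREFETCH_BLOCKS_ON_OPEN_KEY, "true");
    // Both flags set: prints true.
    System.out.println(new CompactionCacheOnWriteSketch(conf).shouldCacheCompactedBlocksOnWrite());
  }
}
```

The point of the conjunction is that enabling the new flag on its own changes nothing, which is also why a name mentioning only "prefetch" or only "cache on write" tells half the story.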

busbey · 2019-12-11T16:11:01Z

is this made obsolete by #919?

virajjasani · 2020-02-23T19:07:06Z

> is this made obsolete by #919?

I believe so. Closing this PR @jacob-leblanc @ramkrish86

HBASE-23066 Allow cache on write during compactions when prefetching …

43f3f19

…is enabled

anoopsjohn reviewed Oct 15, 2019

virajjasani closed this Feb 23, 2020

jacob-leblanc Oct 16, 2019
