[Draft] Improve usage of blob store cache during searchable snapshots shard recovery #69283

tlrx · 2021-02-19T15:13:35Z

The blob store cache was introduced in #60522 to speed up searchable snapshots shard recovery by caching (in a system index) the first 4096 bytes, sometimes 8192, of every Lucene file that composes a shard.

Recent experiments using large snapshots suggest that we could adjust the current caching strategy by caching less data (i.e. 1024 bytes) by default for most files and more data (up to 64KB) for Lucene metadata files.

This draft pull request addresses this point by introducing BlobStoreCacheService#computeHeaderByteRange(), which computes the range of bytes to put in the blob store cache depending on the Lucene file type.
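As a rough illustration of the idea (not the actual code of this pull request), the length selection could look like the sketch below. The class name, method names, and constants are hypothetical; the 1KB and 64KB sizes and the list of metadata extensions are taken from the commit message quoted later in this conversation, and capping at the file length is an assumption.

```java
/** Hypothetical sketch: picks how many leading bytes of a Lucene file to cache. */
final class HeaderByteRangeSketch {

    /** Default cached length for most files (1KB, per the description). */
    static final long DEFAULT_LENGTH = 1024L;

    /** Maximum cached length for Lucene metadata files (64KB, per the description). */
    static final long METADATA_LENGTH = 64L * 1024L;

    private HeaderByteRangeSketch() {}

    /** Returns true if the file extension is one of the metadata extensions listed in the thread. */
    static boolean isMetadataFile(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0) {
            return false; // no extension, e.g. "segments_1"
        }
        switch (fileName.substring(dot + 1)) {
            case "cfe": // compound file's entry table
            case "dvm": // doc values metadata file
            case "fdm": // stored fields metadata file
            case "fnm": // field names metadata file
            case "kdm": // Lucene 8.6 point format metadata file
            case "nvm": // norms metadata file
            case "tmd": // Lucene 8.6 terms metadata file
            case "tvm": // terms vectors metadata file
            case "vem": // Lucene 9.0 indexed vectors metadata
                return true;
            default:
                return false;
        }
    }

    /** Returns the number of leading bytes to cache; assumed never to exceed the file length. */
    static long computeHeaderLength(String fileName, long fileLength) {
        long max = isMetadataFile(fileName) ? METADATA_LENGTH : DEFAULT_LENGTH;
        return Math.min(max, fileLength);
    }
}
```

With these assumptions, a 100,000-byte `_0.fnm` file would have its first 65,536 bytes cached, while a `_0.fdt` file of the same size would only have its first 1,024 bytes cached.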

We also noticed that compound files could represent a non-negligible share of the total size of a shard (~30% in our tests) and that it may be worthwhile to avoid random seeks and reads by also caching the files that compose .cfs files.

This pull request addresses this point by caching headers and footers of .cfs inner files in the blob store cache. The size of the data to cache for the header is computed using computeHeaderByteRange(). The footer is 16 bytes long.
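To make the byte ranges concrete, here is a minimal sketch of how the header and footer ranges of one inner file could be derived from its offset and length inside the .cfs. The class and method names are hypothetical, and the handling of inner files too small to have separate header and footer ranges (the footer range simply shrinks so that it never overlaps the header) is an assumption, not something stated in this pull request.

```java
/** Hypothetical sketch: header and footer ranges of a file stored inside a .cfs file. */
final class CfsInnerRangesSketch {

    /** Footer length in bytes, as stated in the description. */
    static final long FOOTER_LENGTH = 16L;

    private CfsInnerRangesSketch() {}

    /**
     * Returns {headerStart, headerEnd, footerStart, footerEnd} as absolute,
     * end-exclusive positions in the .cfs file.
     *
     * @param innerOffset  position of the inner file in the .cfs
     * @param innerLength  length of the inner file
     * @param headerLength number of header bytes to cache for this inner file
     */
    static long[] headerAndFooter(long innerOffset, long innerLength, long headerLength) {
        if (innerOffset < 0 || innerLength < 0 || headerLength < 0) {
            throw new IllegalArgumentException("offset and lengths must be non-negative");
        }
        long end = innerOffset + innerLength;
        // The header cannot extend past the end of the inner file.
        long headerEnd = innerOffset + Math.min(headerLength, innerLength);
        // Assumption: the footer range never overlaps the header, so it may be empty.
        long footerStart = Math.max(headerEnd, end - FOOTER_LENGTH);
        return new long[] { innerOffset, headerEnd, footerStart, end };
    }
}
```

For example, an inner file at offset 1000 with a length of 5000 bytes and a 1024-byte header yields the header range [1000, 2024) and the footer range [5984, 6000).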

Finally, we found that concurrent prewarming and directory opening could prevent some file parts from being cached in the blob store cache the first time an index is mounted, forcing some bytes to be downloaded again the next time that index is mounted.

This pull request addresses this point by detecting when the blob store cache index should be preferred over the disk-based cache. The blob store cache is always preferred while the recovery is not finalized yet, and completely bypassed once the recovery is done.
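The rule described above can be sketched as a small state holder. Everything here (class, enum, and method names) is hypothetical and only illustrates the decision; it does not reflect how the directory implementation in this pull request tracks recovery state.

```java
/** Hypothetical sketch: which cache a read should consult, based on recovery state. */
final class CachePreferenceSketch {

    enum CacheSource { BLOB_STORE_CACHE, DISK_CACHE }

    // Volatile so that readers on other threads observe the flag once recovery is finalized.
    private volatile boolean recoveryFinalized = false;

    /** Called once when the shard recovery is done. */
    void markRecoveryFinalized() {
        recoveryFinalized = true;
    }

    /** Blob store cache while recovering; bypassed in favor of the disk-based cache afterwards. */
    CacheSource preferredCache() {
        return recoveryFinalized ? CacheSource.DISK_CACHE : CacheSource.BLOB_STORE_CACHE;
    }
}
```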

I'm opening this PR as a draft to show the complexity introduced by this change. It's possible that we decide to move forward with only a subset of the changes.

tlrx · 2021-02-19T15:15:17Z

...hots/src/internalClusterTest/java/org/apache/lucene/codecs/lucene50/CompoundReaderUtils.java

+
+    private CompoundReaderUtils() {}
+
+    public static Map<String, Map<String, Tuple<Long, Long>>> extractCompoundFiles(Directory directory) throws IOException {


I'm really sorry but I did not find any better way to extract the list of files that compose the CFS. Reading only the .cfe is possible but that won't give the inner offsets (only the lengths) and I think it is better to check the right boundaries.

tlrx · 2021-02-19T15:17:56Z

...Test/java/org/elasticsearch/blobstore/cache/SearchableSnapshotsBlobStoreCacheIntegTests.java

@@ -205,33 +250,36 @@ public void testBlobStoreCache() throws Exception {

        assertAcked(client().admin().indices().prepareDelete(restoredIndex));

-        logger.info("--> mount snapshot [{}] as an index for the second time", snapshot);
-        final String restoredAgainIndex = mountSnapshot(
+        cacheEnabled = randomBoolean();


The second time the index is mounted can now be fully randomized between full cache/partial cache/no cache.

tlrx · 2021-02-19T15:18:38Z

...Test/java/org/elasticsearch/blobstore/cache/SearchableSnapshotsBlobStoreCacheIntegTests.java

-                    || mayReadMoreThanHeader == false) {
-                    assertThat(Strings.toString(indexInputStats), indexInputStats.getBlobStoreBytesRequested().getCount(), equalTo(0L));
-                }
+                assertThat(Strings.toString(indexInputStats), indexInputStats.getBlobStoreBytesRequested().getCount(), equalTo(0L));


We can now blindly assume in this test that no bytes were requested when mounting the second time.

Today searchable snapshots IndexInput implementations use the blob store cache to cache the first 4096 bytes of every Lucene file. After some experiments we think that we could adjust the length of the cached data depending on the Lucene file that is read, caching up to 64KB for Lucene metadata files (i.e. files that are fully read when a Directory is opened) and only 1KB for other files.

The files that are cached up to 64KB are the following extensions:

    "cfe", // compound file's entry table
    "dvm", // doc values metadata file
    "fdm", // stored fields metadata file
    "fnm", // field names metadata file
    "kdm", // Lucene 8.6 point format metadata file
    "nvm", // norms metadata file
    "tmd", // Lucene 8.6 terms metadata file
    "tvm", // terms vectors metadata file
    "vem"  // Lucene 9.0 indexed vectors metadata

The 64KB limit can be configured on a per index basis through a new index setting.

This change is extracted from #69283 and does not address the caching of CFS files.


tlrx · 2021-03-23T09:45:14Z

Parts of this draft pull request have been implemented and merged (#69861, #69415, #68902, #69431).

tlrx added 2 commits February 19, 2021 15:48

Improve usage of blob store cache during recovery

867e01a

randomize prewarm

4b892d3

tlrx added >enhancement :Distributed Coordination/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs v8.0.0 labels Feb 19, 2021


ywelsch self-requested a review February 19, 2021 16:08

tlrx mentioned this pull request Feb 23, 2021

Adjust the length of blob cache docs for Lucene metadata files #69431

Merged

tlrx mentioned this pull request Mar 1, 2021

Adjust the length of blob cache docs for Lucene metadata files #69691

Merged

tlrx mentioned this pull request Mar 1, 2021

Adjust the length of blob cache docs for Lucene metadata files #69692

Merged

tlrx closed this Mar 23, 2021

jakelandis added v8.0.0-alpha1 and removed v8.0.0 labels Jul 26, 2021
