
Use blob store cache for Lucene compound files #69861

Merged · 8 commits into elastic:master · Mar 4, 2021

Conversation

@tlrx (Member) commented Mar 3, 2021

The blob store cache is used to cache a variable-length prefix of Lucene files in the .snapshot-blob-cache system index. This is useful to speed up Lucene directory opening during shard recovery and to limit the number of bytes downloaded from the blob store when a searchable snapshot shard must be rebuilt.

This pull request adds support for compound file segments (.cfs) when they are partially cached (i.e., Storage.SHARED_CACHE), so that the files they contain can also be cached in the blob store cache index.

Co-authored-by: Yannick Welsch yannick@welsch.lu
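To illustrate the idea (this is a sketch, not the Elasticsearch implementation): when an inner file of a .cfs compound file is opened as a slice, the slice offset can be translated back into an absolute position in the compound file, so the beginning of that inner file can be stored in the blob store cache as well. The BlobCacheWriter interface and the cacheRegionLength parameter below are assumptions made for this example.

// A minimal sketch, assuming a hypothetical BlobCacheWriter sink; the real logic lives
// in the searchable snapshots directory/index input classes and differs in detail.
public final class CompoundFileCacheSketch {

    /** Hypothetical sink that persists a byte range of a blob into the .snapshot-blob-cache index. */
    interface BlobCacheWriter {
        void put(String blobName, long offset, byte[] bytes);
    }

    private final BlobCacheWriter writer;
    private final int cacheRegionLength; // e.g. cache the first few KiB of each inner file

    public CompoundFileCacheSketch(BlobCacheWriter writer, int cacheRegionLength) {
        this.writer = writer;
        this.cacheRegionLength = cacheRegionLength;
    }

    /**
     * Called when a slice [sliceOffset, sliceOffset + sliceLength) of a .cfs blob is opened
     * for one of the files it contains: cache the beginning of that inner file, keyed by its
     * absolute position in the compound file.
     */
    public void cacheInnerFileHeader(String cfsBlobName, long sliceOffset, long sliceLength, byte[] sliceBytes) {
        int toCache = (int) Math.min(cacheRegionLength, Math.min(sliceLength, sliceBytes.length));
        byte[] header = new byte[toCache];
        System.arraycopy(sliceBytes, 0, header, 0, toCache);
        writer.put(cfsBlobName, sliceOffset, header);
    }
}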

@elasticmachine added the Team:Distributed (Obsolete) label on Mar 3, 2021

@elasticmachine (Collaborator) commented:

Pinging @elastic/es-distributed (Team:Distributed)

@tlrx requested a review from ywelsch on March 3, 2021 at 11:55
return Map.copyOf(blobsPerShard);
}

private void assertCachedBlobsInSystemIndex(final String repositoryName, final Map<String, BlobStoreIndexShardSnapshot> blobsInSnapshot)
@tlrx (Member Author) commented:

In a follow-up I'll add tests to verify the exact cached documents for CFS and non-CFS Lucene files, but for this integration test I think it is sufficient to verify that no bytes were downloaded after the second mount.
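A minimal, self-contained sketch of that assertion, assuming a hypothetical per-repository counter of bytes fetched from the blob store (the real integration test relies on the repository statistics exposed by the searchable snapshots test infrastructure):

import java.util.concurrent.atomic.AtomicLong;

public final class NoDownloadAfterRemountSketch {

    /** Hypothetical stand-in for the repository statistics inspected by the real test. */
    static final class BlobStoreStats {
        final AtomicLong bytesDownloaded = new AtomicLong();
    }

    static void assertNoBytesDownloadedDuringRemount(BlobStoreStats stats, Runnable remount) {
        long before = stats.bytesDownloaded.get();
        remount.run(); // mount the searchable snapshot a second time
        long after = stats.bytesDownloaded.get();
        if (after != before) {
            throw new AssertionError("expected no blob store downloads after the second mount, but ["
                + (after - before) + "] bytes were fetched");
        }
    }
}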

@@ -121,7 +121,7 @@ protected void doRun() throws Exception {
clone = indexInput.clone();
} else {
final int sliceEnd = between(readEnd, length);
clone = indexInput.slice("concurrent slice (0, " + sliceEnd + ") of " + indexInput, 0L, sliceEnd);
clone = indexInput.slice("concurrent slice" + randomFileExtension(), 0L, sliceEnd);
@tlrx (Member Author) commented:

This is to better reflect what Lucene does when slicing a CFS file.
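For context, a plausible shape for the randomFileExtension() helper referenced in the diff above (the actual helper lives in the Elasticsearch test framework and may differ); the point is that slice descriptions now carry Lucene-style file extensions, matching how Lucene names the inner files it slices out of a .cfs file.

import java.util.List;
import java.util.Random;

final class RandomFileExtensionSketch {

    // real Lucene file extensions; the actual list used by the test may be different
    private static final List<String> LUCENE_FILE_EXTENSIONS =
        List.of(".cfs", ".cfe", ".si", ".fdt", ".fdx", ".dvd", ".dvm", ".tim", ".tip", ".kdd");

    static String randomFileExtension(Random random) {
        return LUCENE_FILE_EXTENSIONS.get(random.nextInt(LUCENE_FILE_EXTENSIONS.size()));
    }
}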

final int length = b.remaining();

logger.trace("readInternal: read [{}-{}] ([{}] bytes) from [{}]", position, position + length, length, this);
try {
final CacheFile cacheFile = cacheFileReference.get();

// Can we serve the read directly from disk? If so, do so and don't worry about anything else.
if (isReadFromCompoundFileDuringRecovery(position, length) == false) {
@tlrx (Member Author) commented:

This is to "force" the creation of cached blob docs during directory opening.

@tlrx (Member Author) commented Mar 4, 2021

Build failure in elasticsearch-ci/1 is tracked in #69980. I'm waiting for this test to be muted and I'll rerun the tests.

@martijnvg (Member) commented:

@tlrx I just muted it.

@ywelsch (Contributor) left a comment

I've left one comment, otherwise looking good.

expectThrows(
IndexNotFoundException.class,
".snapshot-blob-cache system index should not be created yet",
() -> systemClient().admin().indices().prepareGetIndex().addIndices(SNAPSHOT_BLOB_CACHE_INDEX).get()
);

-Storage storage = randomFrom(Storage.values());
+// TODO randomize this with FULL_COPY too when cold tier also handle blob cache for footers
+final Storage storage = Storage.SHARED_CACHE;
@ywelsch (Contributor) commented:

I prefer that we keep the cold storage variant here as well, and continue to assert about cfs further down below.

@tlrx (Member Author) replied:

Makes sense, I pushed 77828c9.
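The resulting test shape, sketched under assumptions (the helper names below are placeholders, not the real test methods): the storage type stays randomized, and the CFS-specific expectations are only asserted for the partial (shared cache) variant.

import java.util.Random;

final class RandomStorageAssertionSketch {

    enum Storage { FULL_COPY, SHARED_CACHE }

    static void runScenario(Random random) {
        Storage storage = Storage.values()[random.nextInt(Storage.values().length)];
        mountSnapshotWith(storage); // hypothetical helper standing in for the real mount logic
        if (storage == Storage.SHARED_CACHE) {
            // only the partial storage variant caches the inner files of .cfs segments
            assertCompoundFileBlobsAreCached();
        }
    }

    private static void mountSnapshotWith(Storage storage) { /* hypothetical */ }

    private static void assertCompoundFileBlobsAreCached() { /* hypothetical */ }
}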

@tlrx requested a review from ywelsch on March 4, 2021 at 17:18
@ywelsch (Contributor) left a comment

LGTM

@tlrx merged commit 0cf97f7 into elastic:master on Mar 4, 2021
@tlrx (Member Author) commented Mar 4, 2021

Thanks for your help on this, Yannick.

@tlrx deleted the cache-cfs-in-blob-cache branch on March 4, 2021 at 18:02
tlrx added a commit to tlrx/elasticsearch that referenced this pull request Mar 4, 2021
The blob store cache is used to cache a variable-length
prefix of Lucene files in the .snapshot-blob-cache system
index. This is useful to speed up Lucene directory opening
during shard recovery and to limit the number of bytes
downloaded from the blob store when a searchable snapshot
shard must be rebuilt.

This commit adds support for compound file segments (.cfs)
when they are partially cached (i.e., Storage.SHARED_CACHE)
so that the files they contain can also be cached in
the blob store cache index.

Co-Authored-By: Yannick Welsch <yannick@welsch.lu>
tlrx added a commit to tlrx/elasticsearch that referenced this pull request Mar 4, 2021
The blob store cache is used to cache a variable-length
prefix of Lucene files in the .snapshot-blob-cache system
index. This is useful to speed up Lucene directory opening
during shard recovery and to limit the number of bytes
downloaded from the blob store when a searchable snapshot
shard must be rebuilt.

This commit adds support for compound file segments (.cfs)
when they are partially cached (i.e., Storage.SHARED_CACHE)
so that the files they contain can also be cached in
the blob store cache index.

Co-Authored-By: Yannick Welsch <yannick@welsch.lu>
tlrx added a commit that referenced this pull request Mar 4, 2021
The blob store cache is used to cache a variable-length
prefix of Lucene files in the .snapshot-blob-cache system
index. This is useful to speed up Lucene directory opening
during shard recovery and to limit the number of bytes
downloaded from the blob store when a searchable snapshot
shard must be rebuilt.

This commit adds support for compound file segments (.cfs)
when they are partially cached (i.e., Storage.SHARED_CACHE)
so that the files they contain can also be cached in
the blob store cache index.

Co-Authored-By: Yannick Welsch <yannick@welsch.lu>

Backport of #69861 for 7.12
tlrx added a commit that referenced this pull request Mar 4, 2021
The blob store cache is used to cache a variable-length
prefix of Lucene files in the .snapshot-blob-cache system
index. This is useful to speed up Lucene directory opening
during shard recovery and to limit the number of bytes
downloaded from the blob store when a searchable snapshot
shard must be rebuilt.

This commit adds support for compound file segments (.cfs)
when they are partially cached (i.e., Storage.SHARED_CACHE)
so that the files they contain can also be cached in
the blob store cache index.

Co-Authored-By: Yannick Welsch <yannick@welsch.lu>

Backport of #69861 for 7.x
tlrx added a commit that referenced this pull request Mar 5, 2021
…atsTests (#70006)

Since #69861, CFS files read from FrozenIndexInput create
dedicated frozen shared cache files when they are sliced.
This does not play well with some tests that use
randomReadAndSlice to read files: this method can create
overlapping slice/clone read operations, which makes it
difficult to assert anything about CFS files with the partial cache.

This commit prevents the tests from generating a .cfs file name
when the partial cache is randomly picked. As a follow-up
we should rework those tests to make them more realistic
with the new behavior.

Closes #70000
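A rough sketch of the workaround described in that commit, with hypothetical names: when the randomized test picks the partial (shared) cache, the generated file name avoids the .cfs extension so that overlapping slice/clone reads from randomReadAndSlice cannot interfere with the new per-slice frozen cache files.

import java.util.List;
import java.util.Random;

final class RandomTestFileNameSketch {

    static String randomTestFileName(Random random, boolean partialCache) {
        List<String> extensions = partialCache
            ? List.of(".si", ".fdt", ".fdx", ".tim")          // no ".cfs" with the partial cache
            : List.of(".cfs", ".si", ".fdt", ".fdx", ".tim");
        String extension = extensions.get(random.nextInt(extensions.size()));
        return "_" + Long.toString(random.nextInt(1000), Character.MAX_RADIX) + extension;
    }
}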
tlrx added a commit to tlrx/elasticsearch that referenced this pull request Mar 5, 2021
…atsTests (elastic#70006)

Since elastic#69861, CFS files read from FrozenIndexInput create
dedicated frozen shared cache files when they are sliced.
This does not play well with some tests that use
randomReadAndSlice to read files: this method can create
overlapping slice/clone read operations, which makes it
difficult to assert anything about CFS files with the partial cache.

This commit prevents the tests from generating a .cfs file name
when the partial cache is randomly picked. As a follow-up
we should rework those tests to make them more realistic
with the new behavior.

Closes elastic#70000
tlrx added a commit to tlrx/elasticsearch that referenced this pull request Mar 5, 2021
…atsTests (elastic#70006)

Since elastic#69861, CFS files read from FrozenIndexInput create
dedicated frozen shared cache files when they are sliced.
This does not play well with some tests that use
randomReadAndSlice to read files: this method can create
overlapping slice/clone read operations, which makes it
difficult to assert anything about CFS files with the partial cache.

This commit prevents the tests from generating a .cfs file name
when the partial cache is randomly picked. As a follow-up
we should rework those tests to make them more realistic
with the new behavior.

Closes elastic#70000
tlrx added a commit that referenced this pull request Mar 5, 2021
…atsTests (#70006) (#70019)

Since #69861, CFS files read from FrozenIndexInput create
dedicated frozen shared cache files when they are sliced.
This does not play well with some tests that use
randomReadAndSlice to read files: this method can create
overlapping slice/clone read operations, which makes it
difficult to assert anything about CFS files with the partial cache.

This commit prevents the tests from generating a .cfs file name
when the partial cache is randomly picked. As a follow-up
we should rework those tests to make them more realistic
with the new behavior.

Closes #70000
tlrx added a commit that referenced this pull request Mar 5, 2021
…atsTests (#70006) (#70018)

Since #69861, CFS files read from FrozenIndexInput create
dedicated frozen shared cache files when they are sliced.
This does not play well with some tests that use
randomReadAndSlice to read files: this method can create
overlapping slice/clone read operations, which makes it
difficult to assert anything about CFS files with the partial cache.

This commit prevents the tests from generating a .cfs file name
when the partial cache is randomly picked. As a follow-up
we should rework those tests to make them more realistic
with the new behavior.

Closes #70000
Labels: :Distributed Coordination/Snapshot/Restore, >enhancement, Team:Distributed (Obsolete), v7.12.0, v7.13.0, v8.0.0-alpha1
5 participants