[Draft] Improve usage of blob store cache during searchable snapshots shard recovery #69283
private CompoundReaderUtils() {}

public static Map<String, Map<String, Tuple<Long, Long>>> extractCompoundFiles(Directory directory) throws IOException {
I'm really sorry but I did not find any better way to extract the list of files that composed the CFS. Reading only the .cfe is possible but that won't give the inner offsets (only the lengths) and I think it is better to check the right boundaries.
@@ -205,33 +250,36 @@ public void testBlobStoreCache() throws Exception {

assertAcked(client().admin().indices().prepareDelete(restoredIndex));

logger.info("--> mount snapshot [{}] as an index for the second time", snapshot);
final String restoredAgainIndex = mountSnapshot(
cacheEnabled = randomBoolean();
The second mount of the index can now be fully randomized between full cache, partial cache, and no cache.
|| mayReadMoreThanHeader == false) {
assertThat(Strings.toString(indexInputStats), indexInputStats.getBlobStoreBytesRequested().getCount(), equalTo(0L));
}
assertThat(Strings.toString(indexInputStats), indexInputStats.getBlobStoreBytesRequested().getCount(), equalTo(0L));
We can now blindly assume in this test that no bytes were requested when mounting the second time.
Today searchable snapshots IndexInput implementations use the blob store cache to cache the first 4096 bytes of every Lucene file. After some experiments we think that we could adjust the length of the cached data depending on the Lucene file that is read, caching up to 64KB for Lucene metadata files (i.e. files that are fully read when a Directory is opened) and only 1KB for other files.

The files that are cached up to 64KB have the following extensions:

    "cfe", // compound file's entry table
    "dvm", // doc values metadata file
    "fdm", // stored fields metadata file
    "fnm", // field names metadata file
    "kdm", // Lucene 8.6 point format metadata file
    "nvm", // norms metadata file
    "tmd", // Lucene 8.6 terms metadata file
    "tvm", // terms vectors metadata file
    "vem"  // Lucene 9.0 indexed vectors metadata

The 64KB limit can be configured on a per index basis through a new index setting.

This change is extracted from #69283 and does not address the caching of CFS files.
Backport of #69431 (same commit message as above).
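The per-file-type rule described in the commit message can be sketched roughly as follows. This is only an illustration of the idea, not the code from the pull request: the class name `HeaderRangeSketch`, the method `headerLength`, and the two constants are hypothetical, and the 64KB cap is hard-coded here even though the change makes it configurable per index.

```java
import java.util.Set;

// Hypothetical sketch; names and structure are assumptions, not the PR's code.
final class HeaderRangeSketch {
    // Extensions listed in the commit message as Lucene metadata files.
    static final Set<String> METADATA_EXTENSIONS =
        Set.of("cfe", "dvm", "fdm", "fnm", "kdm", "nvm", "tmd", "tvm", "vem");

    static final long DEFAULT_MAX_BYTES = 1024L;        // 1KB for other files
    static final long METADATA_MAX_BYTES = 64L * 1024L; // 64KB for metadata files

    /** Returns how many leading bytes of the file would be cached under this rule. */
    static long headerLength(String fileName, long fileLength) {
        int dot = fileName.lastIndexOf('.');
        String extension = dot < 0 ? "" : fileName.substring(dot + 1);
        long max = METADATA_EXTENSIONS.contains(extension) ? METADATA_MAX_BYTES : DEFAULT_MAX_BYTES;
        // Never cache more bytes than the file actually contains.
        return Math.min(fileLength, max);
    }
}
```

For example, under these assumptions a 100,000-byte `_0.cfe` file would have its first 65,536 bytes cached, while a `_0.fdt` file of the same size would have only its first 1,024 bytes cached.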
The blob store cache was introduced in #60522 to speed up searchable snapshots shard recovery by caching (in a system index) the first `4096` bytes, sometimes `8192`, of every Lucene file that composes a shard.

Recent experiments using large snapshots suggest that we could adjust the current caching strategy by caching less data (i.e. `1024` bytes) by default for most files and more data (up to `64KB`) for Lucene metadata files.

This draft pull request addresses this point by introducing a `BlobStoreCacheService#computeHeaderByteRange()` that computes the range of bytes to put in the blob store cache depending on the Lucene file type.

We also noticed that compound files could represent a non-negligible amount of the total size of a shard (~30% in our tests) and that it may be worth avoiding random seeks and reads by also caching the files that compose `.cfs` files.

This pull request addresses this point by caching headers and footers of `.cfs` inner files in the blob store cache. The size of the data to cache for the header is computed using `computeHeaderByteRange()`. The footer is `16` bytes long.

Finally, we found that concurrent prewarming and directory opening could prevent some file parts from being effectively cached in the blob store cache the first time an index is mounted, forcing some bytes to be downloaded again the next time that index is mounted.

This pull request addresses this point by detecting when the blob store cache index should be preferred over the disk-based cache. The blob store cache is always preferred while the recovery is not yet finalized, and completely bypassed once the recovery is done.

I'm opening this PR as a draft to show the complexity introduced by this change. It's possible that we decide to move forward with only a subset of the changes.
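To make the header and footer caching of `.cfs` inner files more concrete, here is a minimal sketch of how the byte ranges could be derived from an inner file's offset and length within the compound file. Everything in it is an assumption for illustration: the class `CfsRangesSketch` and the method `rangesToCache` are hypothetical, the header length is taken as a parameter (in the pull request it would come from `computeHeaderByteRange()`), and merging the header and footer into one range when they touch or overlap is a design choice of this sketch rather than something the description states.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; not the PR's implementation.
final class CfsRangesSketch {
    static final long FOOTER_LENGTH = 16L; // footer length stated in the PR description

    /**
     * Returns the [start, end) byte ranges, relative to the start of the .cfs file,
     * that would be cached for one inner file located at innerOffset with innerLength bytes.
     */
    static List<long[]> rangesToCache(long innerOffset, long innerLength, long headerLength) {
        List<long[]> ranges = new ArrayList<>();
        if (innerLength <= 0) {
            return ranges; // nothing to cache for an empty inner file
        }
        long end = innerOffset + innerLength;
        long header = Math.min(Math.max(headerLength, 0L), innerLength);
        long footerStart = end - FOOTER_LENGTH;
        if (innerOffset + header >= footerStart) {
            // Header and footer touch or overlap: cache the whole inner file as one range.
            ranges.add(new long[] { innerOffset, end });
        } else {
            if (header > 0) {
                ranges.add(new long[] { innerOffset, innerOffset + header });
            }
            ranges.add(new long[] { footerStart, end });
        }
        return ranges;
    }
}
```

With an inner file at offset 100 and length 5000, and a 1024-byte header, this yields the ranges [100, 1124) and [5084, 5100); a 1030-byte inner file at offset 0 collapses to the single range [0, 1030).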