-
Notifications
You must be signed in to change notification settings - Fork 24.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adjust the length of blob cache docs for Lucene metadata files #69431
Adjust the length of blob cache docs for Lucene metadata files #69431
Conversation
Pinging @elastic/es-distributed (Team:Distributed) |
builder.put(SnapshotsService.SNAPSHOT_CACHE_REGION_SIZE_SETTING.getKey(), blobCacheMaxLength); | ||
builder.put(SnapshotsService.SHARED_CACHE_RANGE_SIZE_SETTING.getKey(), blobCacheMaxLength); | ||
builder.put(FrozenCacheService.FROZEN_CACHE_RECOVERY_RANGE_SIZE_SETTING.getKey(), blobCacheMaxLength); | ||
cacheSettings = builder.build(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm trying to randomize these settings.
builder.startObject().field("text", randomRealisticUnicodeOfCodepointLengthBetween(5, 50)).field("num", i).endObject(); | ||
indexRequestBuilders.add(client().prepareIndex(indexName).setSource(builder)); | ||
} | ||
indexRandom(true, true, true, indexRequestBuilders); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It now uses dummyDocs
in order to generate deletions
if (METADATA_FILES_EXTENSIONS.contains(fileExtension)) { | ||
final long maxAllowedLengthInBytes = maxMetadataLength.getBytes(); | ||
if (fileLength > maxAllowedLengthInBytes) { | ||
logger.warn( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe too verbose?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's use a Cache
to log this once per filetype (and expire in an hour)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for opening this PR. I've done a first pass and left some comments.
if (indexInputStats.getTotalSize() <= BlobStoreCacheService.DEFAULT_CACHED_BLOB_SIZE * 2 | ||
|| mayReadMoreThanHeader == false) { | ||
final boolean mayReadMoreThanHeader = indexInputStats.getFileExt().equals("cfs") | ||
|| indexInputStats.getFileExt().equals("cfe"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cfs files I understand, but why can't we fully cache cfe files?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a leftover from debugging sessions, I removed them in 1d9cda0
@@ -41,7 +45,7 @@ | |||
|
|||
private static final Logger logger = LogManager.getLogger(BlobStoreCacheService.class); | |||
|
|||
public static final int DEFAULT_CACHED_BLOB_SIZE = ByteSizeUnit.KB.toIntBytes(4); | |||
public static final int DEFAULT_CACHED_BLOB_SIZE = ByteSizeUnit.KB.toIntBytes(1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How will this change affect documents in the blob cache that have been created with a previous ES version?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the delayed response, I had to deal with other duties.
Looking at the current code in 7.x, reducing the size of cached blob might throw an exception in Cached/FrozenIndexInput at this line:
final BytesRefIterator cachedBytesIterator = cachedBlob.bytes().slice(toIntBytes(position), length).iterator();
in the case position
is larger than one or two buffered reads.
I suggest
- to keep the 4KB/8KB limit for non metadata files as long as there is at least a node in version < 7.13 in the cluster
- to write some BWC test that ensure indices are correctly assigned during rolling upgrades
if (METADATA_FILES_EXTENSIONS.contains(fileExtension)) { | ||
final long maxAllowedLengthInBytes = maxMetadataLength.getBytes(); | ||
if (fileLength > maxAllowedLengthInBytes) { | ||
logger.warn( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's use a Cache
to log this once per filetype (and expire in an hour)
...rchable-snapshots/src/main/java/org/elasticsearch/blobstore/cache/BlobStoreCacheService.java
Outdated
Show resolved
Hide resolved
...apshots/src/main/java/org/elasticsearch/index/store/cache/CachedBlobContainerIndexInput.java
Outdated
Show resolved
Hide resolved
...apshots/src/main/java/org/elasticsearch/index/store/cache/CachedBlobContainerIndexInput.java
Outdated
Show resolved
Hide resolved
...apshots/src/main/java/org/elasticsearch/index/store/cache/CachedBlobContainerIndexInput.java
Outdated
Show resolved
Hide resolved
...apshots/src/main/java/org/elasticsearch/index/store/cache/CachedBlobContainerIndexInput.java
Outdated
Show resolved
Hide resolved
...snapshots/src/main/java/org/elasticsearch/xpack/searchablesnapshots/SearchableSnapshots.java
Outdated
Show resolved
Hide resolved
|
||
// Frozen (shared cache) cache should be large enough to not cause direct reads | ||
builder.put(SnapshotsService.SNAPSHOT_CACHE_SIZE_SETTING.getKey(), ByteSizeValue.ofMb(128)); | ||
// Align ranges to match the blob cache max length |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why have this alignment?
Thanks @ywelsch for your review. I updated the code according to your comments. Concerning #69431 (comment) and the behavior with previous versions, I followed my suggestion in #69431 (comment) and I added a rolling upgrade test named This integration test is composed of 4 tests:
This is ready for another review. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thanks Tanguy!
Thanks Yannick! I'm going to backport this down to 7.12 and adjusts the BWC version OLD_CACHED_BLOB_SIZE_VERSION in follow up PRs. |
…ic#69431) Today searchable snapshots IndexInput implementations use the blob store cache to cache the first 4096 bytes of every Lucene files. After some experiments we think that we could adjust the length of the cached data depending of the Lucene file that is read, caching up to 64KB for Lucene metadata files (ie files that are fully read when a Directory is opened) and only 1KB for other files. The files that are cached up to 64KB are the following extensions: "cfe", // compound file's entry table "dvm", // doc values metadata file "fdm", // stored fields metadata file "fnm", // field names metadata file "kdm", // Lucene 8.6 point format metadata file "nvm", // norms metadata file "tmd", // Lucene 8.6 terms metadata file "tvm", // terms vectors metadata file "vem" // Lucene 9.0 indexed vectors metadata The 64KB limit can be configured on a per index basis through a new index setting. This change is extracted from elastic#69283 and does not address the caching of CFS files. Backport of elastic#69431
…ic#69431) Today searchable snapshots IndexInput implementations use the blob store cache to cache the first 4096 bytes of every Lucene files. After some experiments we think that we could adjust the length of the cached data depending of the Lucene file that is read, caching up to 64KB for Lucene metadata files (ie files that are fully read when a Directory is opened) and only 1KB for other files. The files that are cached up to 64KB are the following extensions: "cfe", // compound file's entry table "dvm", // doc values metadata file "fdm", // stored fields metadata file "fnm", // field names metadata file "kdm", // Lucene 8.6 point format metadata file "nvm", // norms metadata file "tmd", // Lucene 8.6 terms metadata file "tvm", // terms vectors metadata file "vem" // Lucene 9.0 indexed vectors metadata The 64KB limit can be configured on a per index basis through a new index setting. This change is extracted from elastic#69283 and does not address the caching of CFS files. Backport of elastic#69431
Today searchable snapshots IndexInput implementations use the blob store cache to cache the first 4096 bytes of every Lucene files. After some experiments we think that we could adjust the length of the cached data depending of the Lucene file that is read, caching up to 64KB for Lucene metadata files (ie files that are fully read when a Directory is opened) and only 1KB for other files. The files that are cached up to 64KB are the following extensions: "cfe", // compound file's entry table "dvm", // doc values metadata file "fdm", // stored fields metadata file "fnm", // field names metadata file "kdm", // Lucene 8.6 point format metadata file "nvm", // norms metadata file "tmd", // Lucene 8.6 terms metadata file "tvm", // terms vectors metadata file "vem" // Lucene 9.0 indexed vectors metadata The 64KB limit can be configured on a per index basis through a new index setting. This change is extracted from #69283 and does not address the caching of CFS files. Backport of #69431
Today searchable snapshots IndexInput implementations use the blob store cache to cache the first 4096 bytes of every Lucene files. After some experiments we think that we could adjust the length of the cached data depending of the Lucene file that is read, caching up to 64KB for Lucene metadata files (ie files that are fully read when a Directory is opened) and only 1KB for other files. The files that are cached up to 64KB are the following extensions: "cfe", // compound file's entry table "dvm", // doc values metadata file "fdm", // stored fields metadata file "fnm", // field names metadata file "kdm", // Lucene 8.6 point format metadata file "nvm", // norms metadata file "tmd", // Lucene 8.6 terms metadata file "tvm", // terms vectors metadata file "vem" // Lucene 9.0 indexed vectors metadata The 64KB limit can be configured on a per index basis through a new index setting. This change is extracted from #69283 and does not address the caching of CFS files. Backport of #69431
Today searchable snapshots
IndexInput
implementations use the blob store cache to cache the first4096
bytes of every Lucene files. After some experiments we think that we could adjust the length of the cached data depending of the Lucene file that is read, caching up to 64KB for Lucene metadata files (ie files that are fully read when a Directory is opened) and only 1KB for other files.The files that are cached up to 64KB are the files with the following extensions:
The 64KB limit can be configured on a per index basis through a new index setting.
This change is extracted from #69283 and does not address the caching of CFS files.