Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adjust the length of blob cache docs for Lucene metadata files #69431

Merged
merged 20 commits into from
Mar 1, 2021

Conversation

tlrx
Copy link
Member

@tlrx tlrx commented Feb 23, 2021

Today searchable snapshots IndexInput implementations use the blob store cache to cache the first 4096 bytes of every Lucene files. After some experiments we think that we could adjust the length of the cached data depending of the Lucene file that is read, caching up to 64KB for Lucene metadata files (ie files that are fully read when a Directory is opened) and only 1KB for other files.

The files that are cached up to 64KB are the files with the following extensions:

    "cfe", // compound file's entry table
    "dvm", // doc values metadata file
    "fdm", // stored fields metadata file
    "fnm", // field names metadata file
    "kdm", // Lucene 8.6 point format metadata file
    "nvm", // norms metadata file
    "tmd", // Lucene 8.6 terms metadata file
    "tvm", // terms vectors metadata file
    "vem"  // Lucene 9.0 indexed vectors metadata

The 64KB limit can be configured on a per index basis through a new index setting.

This change is extracted from #69283 and does not address the caching of CFS files.

@tlrx tlrx added >enhancement :Distributed Coordination/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs v8.0.0 v7.13.0 labels Feb 23, 2021
@elasticmachine elasticmachine added the Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. label Feb 23, 2021
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

builder.put(SnapshotsService.SNAPSHOT_CACHE_REGION_SIZE_SETTING.getKey(), blobCacheMaxLength);
builder.put(SnapshotsService.SHARED_CACHE_RANGE_SIZE_SETTING.getKey(), blobCacheMaxLength);
builder.put(FrozenCacheService.FROZEN_CACHE_RECOVERY_RANGE_SIZE_SETTING.getKey(), blobCacheMaxLength);
cacheSettings = builder.build();
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm trying to randomize these settings.

builder.startObject().field("text", randomRealisticUnicodeOfCodepointLengthBetween(5, 50)).field("num", i).endObject();
indexRequestBuilders.add(client().prepareIndex(indexName).setSource(builder));
}
indexRandom(true, true, true, indexRequestBuilders);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It now uses dummyDocs in order to generate deletions

if (METADATA_FILES_EXTENSIONS.contains(fileExtension)) {
final long maxAllowedLengthInBytes = maxMetadataLength.getBytes();
if (fileLength > maxAllowedLengthInBytes) {
logger.warn(
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe too verbose?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's use a Cache to log this once per filetype (and expire in an hour)

@tlrx tlrx requested a review from ywelsch February 23, 2021 12:39
Copy link
Contributor

@ywelsch ywelsch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for opening this PR. I've done a first pass and left some comments.

if (indexInputStats.getTotalSize() <= BlobStoreCacheService.DEFAULT_CACHED_BLOB_SIZE * 2
|| mayReadMoreThanHeader == false) {
final boolean mayReadMoreThanHeader = indexInputStats.getFileExt().equals("cfs")
|| indexInputStats.getFileExt().equals("cfe");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cfs files I understand, but why can't we fully cache cfe files?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a leftover from debugging sessions, I removed them in 1d9cda0

@@ -41,7 +45,7 @@

private static final Logger logger = LogManager.getLogger(BlobStoreCacheService.class);

public static final int DEFAULT_CACHED_BLOB_SIZE = ByteSizeUnit.KB.toIntBytes(4);
public static final int DEFAULT_CACHED_BLOB_SIZE = ByteSizeUnit.KB.toIntBytes(1);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How will this change affect documents in the blob cache that have been created with a previous ES version?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the delayed response, I had to deal with other duties.

Looking at the current code in 7.x, reducing the size of cached blob might throw an exception in Cached/FrozenIndexInput at this line:

final BytesRefIterator cachedBytesIterator = cachedBlob.bytes().slice(toIntBytes(position), length).iterator();

in the case position is larger than one or two buffered reads.

I suggest

  • to keep the 4KB/8KB limit for non metadata files as long as there is at least a node in version < 7.13 in the cluster
  • to write some BWC test that ensure indices are correctly assigned during rolling upgrades

if (METADATA_FILES_EXTENSIONS.contains(fileExtension)) {
final long maxAllowedLengthInBytes = maxMetadataLength.getBytes();
if (fileLength > maxAllowedLengthInBytes) {
logger.warn(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's use a Cache to log this once per filetype (and expire in an hour)


// Frozen (shared cache) cache should be large enough to not cause direct reads
builder.put(SnapshotsService.SNAPSHOT_CACHE_SIZE_SETTING.getKey(), ByteSizeValue.ofMb(128));
// Align ranges to match the blob cache max length
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why have this alignment?

@tlrx
Copy link
Member Author

tlrx commented Mar 1, 2021

Thanks @ywelsch for your review. I updated the code according to your comments.

Concerning #69431 (comment) and the behavior with previous versions, I followed my suggestion in #69431 (comment) and I added a rolling upgrade test named SearchableSnapshotsRollingUpgradeIT.

This integration test is composed of 4 tests:

  • two are tests that mount a snapshot as an index in the previous version and verifies that the index is recovered and searchable on mixed and upgraded version.
  • the two other tests are a bit tricky and try to test the behavior of the blob cache when cached blob documents are created from an old version or a new version. Those tests only runs in mixed version cluster. I validated that it throws the exception I suspected in Adjust the length of blob cache docs for Lucene metadata files #69431 (comment).

This is ready for another review.

@tlrx tlrx requested a review from ywelsch March 1, 2021 08:16
Copy link
Contributor

@ywelsch ywelsch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks Tanguy!

@tlrx tlrx merged commit 5c2b15a into elastic:master Mar 1, 2021
@tlrx tlrx deleted the adjust-cached-blob-size-for-metadata-files branch March 1, 2021 10:28
@tlrx
Copy link
Member Author

tlrx commented Mar 1, 2021

Thanks Yannick!

I'm going to backport this down to 7.12 and adjusts the BWC version OLD_CACHED_BLOB_SIZE_VERSION in follow up PRs.

tlrx added a commit to tlrx/elasticsearch that referenced this pull request Mar 1, 2021
…ic#69431)

Today searchable snapshots IndexInput implementations use the
blob store cache to cache the first 4096 bytes of every Lucene files.
After some experiments we think that we could adjust the length of
the cached data depending of the Lucene file that is read, caching
up to 64KB for Lucene metadata files (ie files that are fully read
when a Directory is opened) and only 1KB for other files.

The files that are cached up to 64KB are the following extensions:

        "cfe", // compound file's entry table
        "dvm", // doc values metadata file
        "fdm", // stored fields metadata file
        "fnm", // field names metadata file
        "kdm", // Lucene 8.6 point format metadata file
        "nvm", // norms metadata file
        "tmd", // Lucene 8.6 terms metadata file
        "tvm", // terms vectors metadata file
        "vem"  // Lucene 9.0 indexed vectors metadata

The 64KB limit can be configured on a per index basis through a new
index setting. This change is extracted from elastic#69283 and does not
address the caching of CFS files.

Backport of elastic#69431
tlrx added a commit to tlrx/elasticsearch that referenced this pull request Mar 2, 2021
…ic#69431)

Today searchable snapshots IndexInput implementations use the
blob store cache to cache the first 4096 bytes of every Lucene files.
After some experiments we think that we could adjust the length of
the cached data depending of the Lucene file that is read, caching
up to 64KB for Lucene metadata files (ie files that are fully read
when a Directory is opened) and only 1KB for other files.

The files that are cached up to 64KB are the following extensions:

        "cfe", // compound file's entry table
        "dvm", // doc values metadata file
        "fdm", // stored fields metadata file
        "fnm", // field names metadata file
        "kdm", // Lucene 8.6 point format metadata file
        "nvm", // norms metadata file
        "tmd", // Lucene 8.6 terms metadata file
        "tvm", // terms vectors metadata file
        "vem"  // Lucene 9.0 indexed vectors metadata

The 64KB limit can be configured on a per index basis through a new
index setting. This change is extracted from elastic#69283 and does not
address the caching of CFS files.

Backport of elastic#69431
tlrx added a commit that referenced this pull request Mar 3, 2021
Today searchable snapshots IndexInput implementations use the
blob store cache to cache the first 4096 bytes of every Lucene files.
After some experiments we think that we could adjust the length of
the cached data depending of the Lucene file that is read, caching
up to 64KB for Lucene metadata files (ie files that are fully read
when a Directory is opened) and only 1KB for other files.

The files that are cached up to 64KB are the following extensions:

    "cfe", // compound file's entry table
    "dvm", // doc values metadata file
    "fdm", // stored fields metadata file
    "fnm", // field names metadata file
    "kdm", // Lucene 8.6 point format metadata file
    "nvm", // norms metadata file
    "tmd", // Lucene 8.6 terms metadata file
    "tvm", // terms vectors metadata file
    "vem"  // Lucene 9.0 indexed vectors metadata

The 64KB limit can be configured on a per index basis through a new
index setting. This change is extracted from #69283 and does not
address the caching of CFS files.

Backport of #69431
tlrx added a commit that referenced this pull request Mar 3, 2021
Today searchable snapshots IndexInput implementations use the
blob store cache to cache the first 4096 bytes of every Lucene files.
After some experiments we think that we could adjust the length of
the cached data depending of the Lucene file that is read, caching
up to 64KB for Lucene metadata files (ie files that are fully read
when a Directory is opened) and only 1KB for other files.

The files that are cached up to 64KB are the following extensions:

    "cfe", // compound file's entry table
    "dvm", // doc values metadata file
    "fdm", // stored fields metadata file
    "fnm", // field names metadata file
    "kdm", // Lucene 8.6 point format metadata file
    "nvm", // norms metadata file
    "tmd", // Lucene 8.6 terms metadata file
    "tvm", // terms vectors metadata file
    "vem"  // Lucene 9.0 indexed vectors metadata

The 64KB limit can be configured on a per index basis through a new
index setting. This change is extracted from #69283 and does not
address the caching of CFS files.

Backport of #69431
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Distributed Coordination/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs >enhancement Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. v7.12.1 v7.13.0 v8.0.0-alpha1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants