Skip to content

Snapshots of Unchanged Shards Could be Optimized  #45736

@original-brownbear

Description

@original-brownbear

Currently, a snapshot of a shard that has not changed at all relative to an existing snapshot of the shard (i.e. does not require uploading any files for that shard) still triggers the following operations:

  • Write snap-${uuid}.dat blob to the unchanged shard's folder in the repository
  • Write new index-N blob to the unchanged shard's folder in the repository

In practice the effect of this is significant for use cases like rolling indices per day/hour/etc. A cluster that contains a small and bounded number of indices/shards that are actively written to and a large and growing number of shards that are constant in time will over time see ever more expensive and slower snapshots even though the amount of data added by each snapshot is not increasing.
This could be avoided by referencing the content of snap-${uuid}.dat in each shard differently. Instead of creating a blob per snapshot+shard tuple, a certain state of a shard could be described by what is currently a snap-${uuid}.dat and then itself be referenced from the root level index-N in the repository.

Metadata

Metadata

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions