Description
Currently, a snapshot of a shard that has not changed at all relative to an existing snapshot of the shard (i.e. does not require uploading any files for that shard) still triggers the following operations:
- Write a snap-${uuid}.dat blob to the unchanged shard's folder in the repository
- Write a new index-N blob to the unchanged shard's folder in the repository
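To make the two writes above concrete, here is a minimal model of what happens in one shard's folder under the current scheme. It is a sketch under stated assumptions, not Elasticsearch's implementation: `ShardFolder`, `snapshot_shard_current` and the data-blob naming are hypothetical, and `index-N` is modelled as a simple incrementing generation. The point it illustrates is that an unchanged shard (no new files) still produces two metadata blobs per snapshot.

```python
from dataclasses import dataclass, field


@dataclass
class ShardFolder:
    """Hypothetical model of one shard's folder in the repository."""
    path: str
    generation: int = -1  # N of the latest index-N blob; -1 means none written yet
    blobs: set[str] = field(default_factory=set)


def snapshot_shard_current(folder: ShardFolder, snapshot_uuid: str,
                           new_files: list[str]) -> list[str]:
    """Return the blobs written for one shard under the current scheme.

    Even when new_files is empty (the shard is unchanged), a
    snap-${uuid}.dat blob and a new index-N blob are still written.
    """
    # Data files that actually changed (hypothetical blob naming).
    written = [f"{folder.path}/{name}" for name in new_files]
    # Per-snapshot shard metadata, written unconditionally.
    written.append(f"{folder.path}/snap-{snapshot_uuid}.dat")
    # New shard-level index-N, also written unconditionally.
    folder.generation += 1
    written.append(f"{folder.path}/index-{folder.generation}")
    folder.blobs.update(written)
    return written


if __name__ == "__main__":
    folder = ShardFolder("indices/logs-day-1/0")  # hypothetical path
    print(snapshot_shard_current(folder, "a1", ["seg1"]))
    print(snapshot_shard_current(folder, "b2", []))  # unchanged shard
```

The second call uploads no data, yet it still adds a `snap-b2.dat` and an `index-1` blob to the shard's folder.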
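In practice, this has a significant effect on use cases such as indices rolled over per day or hour. Because both blobs are written for every shard in the snapshot, the number of writes per snapshot grows with the total number of shards rather than with the number of shards that actually changed. A cluster with a small, bounded number of indices/shards that are actively written to and a large, growing number of shards that no longer change will therefore see snapshots become ever slower and more expensive over time, even though the amount of data added by each snapshot is not increasing.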
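The growth can be estimated with simple arithmetic. The sketch below assumes a hypothetical setup: one new index with S shards is created per day, only that day's index receives writes, and every index is included in a daily snapshot. The snapshot on day d then touches S changed shards but writes 2·S·d shard-level metadata blobs, and the cumulative total after D days is the sum of 2·S·d for d = 1..D, i.e. S·D·(D + 1). The function names are illustrative only.

```python
def metadata_blobs_per_snapshot(day: int, shards_per_index: int) -> int:
    """Shard-level metadata blobs written by the snapshot taken on `day` (1-based).

    Assumes one new index per day with `shards_per_index` shards, all older
    indices unchanged, and every index included in the snapshot.
    """
    total_shards = day * shards_per_index
    # Current scheme: snap-${uuid}.dat + index-N for every shard, changed or not.
    return 2 * total_shards


def cumulative_metadata_blobs(days: int, shards_per_index: int) -> int:
    """Total shard-level metadata blobs after `days` daily snapshots.

    Closed form of sum(2 * S * d for d in 1..D) = S * D * (D + 1).
    """
    return shards_per_index * days * (days + 1)


if __name__ == "__main__":
    shards_per_index = 5  # hypothetical shard count per daily index
    for day in (1, 30, 365):
        print(f"day={day} changed_shards={shards_per_index} "
              f"metadata_blobs={metadata_blobs_per_snapshot(day, shards_per_index)}")
```

With 5 shards per daily index, the number of changed shards stays at 5 while the metadata blobs written per snapshot grow from 10 on day 1 to 3650 on day 365.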
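This could be avoided by changing how the content of snap-${uuid}.dat is referenced for each shard. Instead of creating one blob per snapshot/shard pair, a given state of a shard could be described by what is currently a snap-${uuid}.dat blob, and that blob could in turn be referenced from the root-level index-N in the repository. A snapshot of an unchanged shard would then only need to reference an existing shard state instead of writing new blobs to the shard's folder.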
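A minimal sketch of the proposed referencing scheme follows, assuming a hypothetical in-memory model: `RootIndex` stands in for the root-level index-N, a "shard state" blob plays the role of today's snap-${uuid}.dat, and `take_snapshot` and the `state-*.dat` naming are illustrative rather than a real API. Data-file uploads, snapshot deletion and cleanup of unreferenced shard states are deliberately left out. Only shards that changed (or have never been snapshotted) get a new state blob; unchanged shards are recorded by reference.

```python
from dataclasses import dataclass, field


@dataclass
class RootIndex:
    """Hypothetical model of the root-level index-N under the proposed scheme."""
    generation: int = 0
    # snapshot name -> {shard key -> shard-state blob it references}
    snapshots: dict[str, dict[str, str]] = field(default_factory=dict)
    # shard key -> most recent shard-state blob (what snap-${uuid}.dat holds today)
    latest_state: dict[str, str] = field(default_factory=dict)


def take_snapshot(root: RootIndex, snapshot: str, changed: set[str],
                  all_shards: list[str]) -> list[str]:
    """Return the blobs written for one snapshot under the proposed scheme."""
    written: list[str] = []
    refs: dict[str, str] = {}
    for shard in all_shards:
        if shard in changed or shard not in root.latest_state:
            # New shard state: written once, under a hypothetical name.
            blob = f"{shard}/state-{snapshot}.dat"
            written.append(blob)
            root.latest_state[shard] = blob
        # Unchanged shards just reuse the existing state by reference.
        refs[shard] = root.latest_state[shard]
    root.snapshots[snapshot] = refs
    # The root-level index-N is rewritten once per snapshot in either scheme.
    root.generation += 1
    written.append(f"index-{root.generation}")
    return written


if __name__ == "__main__":
    root = RootIndex()
    shards = ["a", "b", "c"]
    print(take_snapshot(root, "s1", changed=set(), all_shards=shards))
    print(take_snapshot(root, "s2", changed={"a"}, all_shards=shards))
    print(root.snapshots["s2"])
```

In the second snapshot only shard `a` produces a new blob; `b` and `c` point at the states written by `s1`, so the number of shard-level writes tracks the changed shards rather than all shards. A real design would also need reference tracking so that shard states are only deleted once no snapshot refers to them.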