Allow searchable snapshot cache service to periodically fsync cache files #64696
Conversation
Pinging @elastic/es-distributed (:Distributed/Snapshot/Restore)
@@ -335,7 +335,7 @@ public void onEviction(CacheFile evictedCacheFile) {
public static void assertNumberOfFSyncs(final Path path, final Matcher<Long> matcher) {
    final FSyncTrackingFileSystemProvider provider = (FSyncTrackingFileSystemProvider) path.getFileSystem().provider();
    final AtomicLong fsyncCounter = provider.files.get(path);
-   assertThat("File [" + path + "] was never fsynced", notNullValue());
+   assertThat("File [" + path + "] was never fsynced", fsyncCounter, notNullValue());
🤦
I did an initial read of this and have a few comments that I think need clarification before fully reviewing this.
this.cache = CacheBuilder.<CacheKey, CacheFile>builder()
    .setMaximumWeight(cacheSize.getBytes())
    .weigher((key, entry) -> entry.getLength())
    // NORELEASE This does not immediately free space on disk, as cache file are only deleted when all index inputs
    // are done with reading/writing the cache file
    .removalListener(notification -> IOUtils.closeWhileHandlingException(() -> notification.getValue().startEviction()))
    .build();

if (DiscoveryNode.isDataNode(settings)) {
I wonder if it would be better to not instantiate the CacheService at all on non-data nodes? Having it support non-data nodes without the cacheSyncTask seems counterintuitive (unless there is a good reason). That would allow removing the asserts/ifs on cacheSyncTask != null.
++ I think not instantiating it would be nice. One mild worry, though: it might be a little cumbersome to get right due to transport actions? (I haven't checked this here in detail, but I remember trying a similar thing elsewhere and it turned out to be tricky.)
assert localNode.isDataNode();

final boolean shouldSynchronize = hasSearchableSnapshotShards(clusterState, localNode.getId());
cacheSyncTask.allowReschedule.set(shouldSynchronize);
I think the reason we need this is because cacheSyncTask.cancel does not promise to not reschedule until rescheduleIfNecessary is called again? I wonder if that is a bug we should fix in AbstractAsyncTask? Can certainly be done in a follow-up (or separate PR).
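For reference, the guard under discussion can be sketched like this (a sketch, not the PR's exact code; only allowReschedule and synchronizeCache() appear in the quoted snippets, the rest of the names, including the enclosing class's logger, are assumptions):

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Sketch: the task consults an AtomicBoolean in mustReschedule(), so the
// cluster state applier can switch rescheduling off even if a previously
// scheduled run is still in flight.
class CacheSynchronizationTask extends AbstractAsyncTask {

    final AtomicBoolean allowReschedule = new AtomicBoolean(false);

    CacheSynchronizationTask(ThreadPool threadPool, TimeValue interval) {
        super(logger, threadPool, interval, true); // autoReschedule = true
    }

    @Override
    protected boolean mustReschedule() {
        return allowReschedule.get(); // flipped from the cluster state applier
    }

    @Override
    protected void runInternal() {
        synchronizeCache();
    }
}
```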
final RoutingNode routingNode = clusterState.getRoutingNodes().node(nodeId);
if (routingNode != null) {
    for (ShardRouting shardRouting : routingNode) {
        if (shardRouting.active()) {
I think this could mean that we will not start fsync'ing during initialization/recovery of the first shard(s) on a starting node?
final long startTimeNanos = threadPool.relativeTimeInNanos();
for (ShardRouting shardRouting : routingNode) {
    if (shardRouting.active()) {
I think we also want to fsync initializing shards?
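If initializing shards should be included, the check could be broadened along these lines (a sketch; ShardRouting.active() covers started and relocating shards, initializing() the recovering ones):

```java
for (ShardRouting shardRouting : routingNode) {
    // active() covers STARTED and RELOCATING; initializing() additionally
    // catches shards that are still recovering onto this node.
    if (shardRouting.active() || shardRouting.initializing()) {
        // ... take this shard's cache files into account for fsync
    }
}
```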
);

boolean syncDirectory = false;
for (Tuple<CacheKey, CacheFile> entry : cache.entries()) {
This seems like an n² algorithm. Since we iterate the cache entries, could we not just fsync the cache files individually as long as they are in the cache?
++, I would also think there's no need to be tricky here. We already have all the tricky logic for handling a cache file's life-cycle in CacheFile => I think I like the idea of just iterating over all CacheFiles here, combined with Henning's other point of having the CacheFile register itself for fsync in some form. That seems like it wouldn't add new complexities around synchronization and life-cycle beyond what we already have in CacheFile?
/**
 * An LRU sequencing of the entries in the cache. This sequence is not protected from mutations
 * to the cache (except for {@link Iterator#remove()}. The result of iteration under any other mutation is
 * undefined.
I think this is true. In fact, it looks like the lru-chain would be unsafely published with no happens-before to the reader. I think the iteration of the entries done in this PR is thus unsafe; at least it might skip some entries. I think the same applies to iterating over keys(), which we do in a few places. Together with other comments in this PR, this makes me think that perhaps it would be easier to let CacheFiles that need fsync register with the CacheService (or a registry in between)?
final DiscoveryNode localNode = clusterState.getNodes().getLocalNode();
assert localNode.isDataNode();

final boolean shouldSynchronize = hasSearchableSnapshotShards(clusterState, localNode.getId());
I wonder if we need to couple this to cluster state updates? Instead we could trigger either on the presence of any cache files or, as proposed in another comment, on an explicit fsync-needed registration?
I read over this quickly now after our meeting and I'm +1 on Henning's suggestions as commented on inline :)
Thanks Henning and Armin for your valuable feedback. @henningandersen I implemented your idea of having the CacheFile register itself with the CacheService when it needs to be fsynced. I'd be happy if you can have another look. Let me know if it corresponds to what you had in mind. Thanks!
Thanks @tlrx, this direction looks good to me. I did a quick initial read and thought I would relay my initial comments now. I will spend a bit more time on this later today or tomorrow.
 *
 * @return an LRU-ordered {@link Iterable} over the entries in the cache
 */
public Iterable<Tuple<K,V>> entries() {
This looks unused now?
Just a drive by comment @tlrx this is still unused and can probably go away, just in case you missed it :)
I'll take a proper look at this PR in general now
> Just a drive by comment @tlrx this is still unused and can probably go away, just in case you missed it :)

Thanks! I'll remove it.

> I'll take a proper look at this PR in general now

I was about to ping you once Henning approved the direction :)
I removed the unused entries() method in 6b238d6
    Setting.Property.Dynamic
);

private static final Supplier<Set<CacheFile>> SUPPLIER_OF_CACHE_FILES_TO_SYNC = ConcurrentCollections::newConcurrentSet;
Looks like this might as well be a method (or inlined)?
I removed this when moving to a Queue implementation.
 * @param cacheFile the instance that needs to be fsync
 */
void onCacheFileUpdate(CacheFile cacheFile) {
    final boolean added = cacheFilesToSyncRef.get().add(cacheFile);
There is a race condition here, since we first get the set and then add to it. The synchronizeCache method might read the set and check that it is empty before the add is done here, risking that we miss an fsync. The same is true for the non-empty case, in that it could start iterating over the set before the add is executed. Maybe we can use a queue instead? Since CacheFile ensures it is only registered once, we might not need the set semantics except when removing a cache file (where we could just iterate the queue, which seems ok; we could even just leave it in the queue).
You're perfectly right; I'm sorry I did not catch it myself, as it is obvious. I pushed 684e01e to use a ConcurrentLinkedQueue, which should give us the right semantics.
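The queue-based registration can be sketched as follows (assumed names; the single-registration guarantee lives in CacheFile, see the sketch further down):

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

// Sketch: add() is a single atomic step on the queue, so there is no
// get-then-add window in which synchronizeCache() could observe an empty
// collection and miss a freshly registered file.
private final Queue<CacheFile> cacheFilesToSync = new ConcurrentLinkedQueue<>();

void onCacheFileUpdate(CacheFile cacheFile) {
    cacheFilesToSync.add(cacheFile);
}
```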
        success = true;
        return completedRanges;
    } finally {
        if (success == false) {
-           needsFsync.set(true);
+           markAsNeedsFSync();
        }
    }
}
If the compareAndSet above does not succeed, should we then perhaps fail when running tests (using an assert)?
I think we can, but the existing tests assume that fsync can be executed at any time, even when an fsync is not needed. Those tests should be adapted as well, maybe in a follow-up?
👍
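Putting these pieces together, the CacheFile side might look roughly like this (a sketch: needsFsync, markAsNeedsFSync() and the tracker's completed-ranges accessor are inferred from the snippets quoted in this review, and the listener uses the Runnable form the thread settles on later):

```java
private final AtomicBoolean needsFsync = new AtomicBoolean();

// Claims the "dirty" flag at most once; the listener re-registers this file
// with the CacheService queue.
private void markAsNeedsFSync() {
    if (needsFsync.compareAndSet(false, true)) {
        fsyncListener.run();
    }
}

// Returns the ranges persisted by this fsync, or an empty set if the file was
// clean or has been evicted. On failure the flag is restored in the finally
// block, so the next synchronization round retries this file.
public SortedSet<Tuple<Long, Long>> fsync() throws IOException {
    if (needsFsync.compareAndSet(true, false)) {
        boolean success = false;
        try {
            final SortedSet<Tuple<Long, Long>> completedRanges = tracker.getCompletedRanges();
            IOUtils.fsync(file, false, false); // fsync the file itself, not a directory
            success = true;
            return completedRanges;
        } finally {
            if (success == false) {
                markAsNeedsFSync();
            }
        }
    }
    return Collections.emptySortedSet();
}
```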
Thanks @henningandersen. I've updated the code again.
Looks really nice already thanks Tanguy! I mainly have a question on error handling and some smaller points :)
this.tracker = new SparseFileTracker(file.toString(), length);
this.description = Objects.requireNonNull(description);
this.file = Objects.requireNonNull(file);
this.needsFsyncListener = fsyncListener != null ? fsyncListener : cacheFile -> {};
Looks like we're only ever passing null here in tests; maybe it would be cleaner to just pass the no-op consumer in tests than to have this conditional?
Done in af47aaa
        count += 1L;
    }
} catch (Exception e) {
    logger.warn(() -> new ParameterizedMessage("failed to fsync cache file [{}]", cacheFilePath.getFileName()), e);
I wonder if we should be this heroic? I guess I could see us running into an IOException here when exceeding the FD limit, but beyond that there seem to be very few valid reasons to keep going with a cache file after we fail to fsync it. Should we have a notion of forcefully dropping such a file from the cache, since we can't trust it any longer?
I agree, my intention was to not add the cache file to the Lucene index if the fsync failed.

> Should we have a notion of forcefully dropping such a file from the cache since we can't trust it any longer?

I think we need such a mechanism, because we should also discard a cache file when we fail to write a range to it. I can try to tackle this in a follow-up, but I expect the tests to be quite complex.
    IOUtils.fsync(cacheDir, true, false);
    logger.trace("cache directory [{}] synchronized", cacheDir);
} catch (Exception e) {
    logger.warn(() -> new ParameterizedMessage("failed to synchronize cache directory [{}]", cacheDir), e);
Same as the other comment, maybe even more pronounced here: what does it mean if we fail to fsync the directory? Doesn't it at least mean that we failed to fsync the file as well? (it's certainly not guaranteed to be safely persisted if it was just created on many Linux FS)
> Doesn't it at least mean that we failed to fsync the file as well? (it's certainly not guaranteed to be safely persisted if it was just created on many Linux FS)

That's my understanding as well, meaning that the cache file should not be added to the Lucene index.
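For context, the directory half of the sync round looks roughly like this (a sketch; the Set<Path> collection and the IOUtils.fsync(dir, true, false) call appear in snippets quoted in this review, while the surrounding structure is assumed):

```java
// While fsyncing individual files, remember each distinct parent directory ...
final Set<Path> cacheDirs = new HashSet<>();
cacheDirs.add(cacheFilePath.toAbsolutePath().getParent());

// ... then fsync every touched directory once at the end of the round, so that
// newly created cache files survive a crash on filesystems that require it.
for (Path cacheDir : cacheDirs) {
    try {
        IOUtils.fsync(cacheDir, true, false);
        logger.trace("cache directory [{}] synchronized", cacheDir);
    } catch (Exception e) {
        logger.warn(() -> new ParameterizedMessage("failed to synchronize cache directory [{}]", cacheDir), e);
    }
}
```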
 */
void onCacheFileRemoval(CacheFile cacheFile) {
    IOUtils.closeWhileHandlingException(cacheFile::startEviction);
    cacheFilesToSync.remove(cacheFile);
Should we first remove the file from the queue and then evict to have less of a race here with concurrent fsync runs?
I saw it the other way: startEviction() should prevent more ranges from being written, and therefore prevent the cache file from registering itself back into the queue.
I removed the remove() call in eabe562
 */
void onCacheFileRemoval(CacheFile cacheFile) {
    IOUtils.closeWhileHandlingException(cacheFile::startEviction);
    cacheFilesToSync.remove(cacheFile);
Also, calling remove (O(n)) on a ConcurrentLinkedQueue feels like it may bring trouble if we're dealing with a large number of queued files? Do we even need to do this, when we could simply skip evicted files while polling the fsync queue?
Agree - Henning also raised this point. The CacheFile#fsync() method should return an empty set of ranges if the cache file is evicted and deleted from disk; we can rely on this to skip removed files.
I removed the remove() call in eabe562
Thanks for the extra iteration, this is looking good. I added a number of smaller comments.
final Set<Path> cacheDirs = new HashSet<>();
final long startTimeNanos = threadPool.relativeTimeInNanos();
final int maxCacheFilesToSync = this.maxCacheFilesToSyncAtOnce;
for (long i = 0L; i < maxCacheFilesToSync; i++) {
It would be nice to ensure we never loop further than the set of cache files already in the queue at the beginning of this method. Can we cap the number of iterations by the size of the queue before we start any fsync'ing? It is O(n) on the linked-queue implementation, but I think that is fine.
If we just continue looping, we risk an oscillating effect of writing another block of data and then fsync'ing it multiple times, resulting in much more fsync'ing than desired.
This is a good suggestion, I pushed 437c8ea.
Are you sure this is ok? The size() call isn't just O(n), it also has no accuracy guarantees. Should we maybe use org.elasticsearch.common.util.concurrent.SizeBlockingQueue here to make this a little safer?
I'd prefer not to use a BlockingQueue at all. An alternative could be to maintain an atomic cacheFilesToSync counter that is incremented when a cache file registers itself for fsync (in onCacheFileUpdate) and decremented every time an fsync is executed. The current size would be captured before we start any fsync'ing.
Ah right, this queue isn't blocking :) -> counter sounds good to me
I pushed 395845d
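The counter-based cap could look like this (a sketch; numberOfCacheFilesToSync is an assumed name):

```java
private final AtomicLong numberOfCacheFilesToSync = new AtomicLong();

void onCacheFileUpdate(CacheFile cacheFile) {
    cacheFilesToSync.add(cacheFile);
    numberOfCacheFilesToSync.incrementAndGet();
}

protected void synchronizeCache() {
    // capture the count up front: files re-queued during this round have to
    // wait for the next one, which avoids the oscillating re-fsync effect
    final long max = Math.min(numberOfCacheFilesToSync.get(), maxCacheFilesToSyncAtOnce);
    for (long i = 0L; i < max; i++) {
        final CacheFile cacheFile = cacheFilesToSync.poll();
        if (cacheFile == null) {
            break; // defensive: the counter can briefly run ahead of the queue
        }
        numberOfCacheFilesToSync.decrementAndGet();
        // ... fsync cacheFile, skipping it if it has been evicted ...
    }
}
```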
        count += 1L;
    }
} catch (Exception e) {
    logger.warn(() -> new ParameterizedMessage("failed to fsync cache file [{}]", cacheFilePath.getFileName()), e);
I think we only expect IOException here. Perhaps we can assert that to ensure tests will fail?
Similar assert a few lines up.
Makes sense, I added c20bf75
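The suggested assertion might look like this (sketch): production behaviour is unchanged, but test runs, which enable assertions, fail fast on anything other than an IOException:

```java
try {
    final SortedSet<Tuple<Long, Long>> ranges = cacheFile.fsync();
    if (ranges.isEmpty() == false) {
        count += 1L;
    }
} catch (Exception e) {
    // only I/O failures are expected; anything else fails tests via the assert
    assert e instanceof IOException : e;
    logger.warn(() -> new ParameterizedMessage("failed to fsync cache file [{}]", cacheFilePath.getFileName()), e);
}
```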
@@ -117,10 +124,11 @@ protected void closeInternal() {
    @Nullable
    private volatile FileChannelReference channelRef;

-   public CacheFile(String description, long length, Path file) {
+   public CacheFile(String description, long length, Path file, Consumer<CacheFile> fsyncListener) {
nit: I find it nicer to just pass in a Runnable here, to make it clear that a CacheFile can only request an fsync of itself.
I pushed 74f728f to use a Runnable
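With the Runnable, the constructor from the diff above becomes something like this (sketch; field names follow the snippets quoted earlier):

```java
public CacheFile(String description, long length, Path file, Runnable fsyncListener) {
    this.tracker = new SparseFileTracker(file.toString(), length);
    this.description = Objects.requireNonNull(description);
    this.file = Objects.requireNonNull(file);
    // a Runnable can only say "fsync me"; it cannot reach other cache files
    this.fsyncListener = Objects.requireNonNull(fsyncListener);
}
```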
    success = true;
    return completedRanges;
} finally {
    if (success == false) {
-       needsFsync.set(true);
+       markAsNeedsFSync();
This will add it back on the queue, so we continually try to fsync a bad file. I saw the comments Armin made in other places about this, we can tackle this in the same follow-up.
Let's do that 👍
@@ -52,40 +68,78 @@
    Setting.Property.NodeScope
);

public static final TimeValue MIN_SNAPSHOT_CACHE_SYNC_INTERVAL = TimeValue.timeValueSeconds(10L);
I wonder if we should lower this to 1 second.
I can imagine us using this for a poor-mans rate limiter - set the interval to 1 second and number of files to sync to 10 on a spinning disk setup in case fsync'ing causes issues.
Makes sense, I pushed 365812d
private final FileSystem delegateInstance;
private final Path rootDir;
These fields are unused. Seems like you intended the rootDir to have significance?
This should have been shared with some other tests; it is now factored out in 5091a5d. Thanks for spotting this.
}

logger.trace("--> evicting random cache files");
for (CacheKey evictedCacheKey : randomSubsetOf(Sets.union(previous.keySet(), updates.keySet()))) {
Would be good to verify that once we evicted, we do not fsync these CacheFiles anymore.
Right - I added f69dde9
if (randomBoolean()) {
    cacheSettings.put(
        CacheService.SNAPSHOT_CACHE_SYNC_INTERVAL_SETTING.getKey(),
        TimeValue.timeValueSeconds(randomLongBetween(MIN_SNAPSHOT_CACHE_SYNC_INTERVAL.getSeconds(), Long.MAX_VALUE))
I think we should bias towards smaller values, perhaps using scaledRandomInt?
I pushed 8fe920b to randomize between 1 and 120 seconds.
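A sketch of the biased randomization, assuming the test framework's scaledRandomIntBetween (which favours values near the lower bound in normal runs):

```java
if (randomBoolean()) {
    cacheSettings.put(
        CacheService.SNAPSHOT_CACHE_SYNC_INTERVAL_SETTING.getKey(),
        TimeValue.timeValueSeconds(scaledRandomIntBetween(1, 120)) // biased towards small intervals
    );
}
```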
I went through this in more detail, looks good. I have a number of minor comments only.
LGTM, thanks for the extra iterations @tlrx .
    Setting.Property.NodeScope,
    Setting.Property.Dynamic
);

public static final Setting<TimeValue> SNAPSHOT_CACHE_SYNC_SHUTDOWN_TIMEOUT = Setting.timeSetting(
    SETTINGS_PREFIX + "sync.shutdown_timeout",
    TimeValue.timeValueSeconds(60L), // default
Perhaps just 10 seconds default, since we only need to wait for one fsync and if it takes more than 10s to do one, we really want to continue shutting down the node anyway? The other doStop timeouts that I found (did not search thoroughly though) are in the 10-30s range.
Agreed.
    cache.invalidateAll();
} catch (InterruptedException e) {
    Thread.currentThread().interrupt();
    logger.warn("interrupted while waiting for cache sync lock", e);
I think we need to also do cacheSyncTask.close() and cache.invalidateAll() in this case? Possibly better to surround the tryLock with a separate try/catch for InterruptedException.
Oh right, I'll surround the tryLock
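The restructured doStop could be sketched like this (assumed names: cacheSyncLock, shutdownTimeout), with the tryLock isolated so that the close and invalidation always run:

```java
@Override
protected void doStop() {
    boolean acquired = false;
    try {
        acquired = cacheSyncLock.tryLock(shutdownTimeout.millis(), TimeUnit.MILLISECONDS);
        if (acquired == false) {
            logger.warn("waited [{}] but cache sync lock is still held, shutting down anyway", shutdownTimeout);
        }
    } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        logger.warn("interrupted while waiting for cache sync lock", e);
    }
    try {
        // always executed, even when the lock was not acquired or we were interrupted
        cacheSyncTask.close();
        cache.invalidateAll();
    } finally {
        if (acquired) {
            cacheSyncLock.unlock();
        }
    }
}
```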
protected void synchronizeCache() {
    cacheSyncLock.lock();
    try {
        if (lifecycleState() != Lifecycle.State.STARTED) {
I would prefer to keep this inside the loop to ensure we break out as soon as possible when shutting down.
Ok
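With the check inside the loop, the sync round bails out between files during shutdown (a sketch, assumed names):

```java
protected void synchronizeCache() {
    cacheSyncLock.lock();
    try {
        for (long i = 0L; i < max; i++) {
            if (lifecycleState() != Lifecycle.State.STARTED) {
                logger.debug("stopping cache synchronization (service is closing)");
                return; // break out as soon as possible when shutting down
            }
            // ... poll and fsync the next cache file ...
        }
    } finally {
        cacheSyncLock.unlock();
    }
}
```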
Thanks Henning and Armin!
…iles (elastic#64696) This commit changes the searchable snapshot's CacheService so that it now periodically fsyncs cache files using the method introduced in elastic#64201. The synchronization is executed every 10 seconds by default (this interval can be changed using a new xpack.searchable.snapshot.cache.sync.interval setting).
The searchable snapshots cache implemented in 7.10 is not persisted across node restarts, forcing data nodes to download files from the snapshot repository again once the node is restarted. This commit introduces a new Lucene index that is used to store information about cache files. The information about cache files is periodically updated and committed in this index as part of the cache synchronization task added in #64696. When the data node starts, the Lucene index is used to load the cache file information into memory; this information is then used to repopulate the searchable snapshots cache with the cache files that exist on disk. Since data nodes can have one or more data paths, this change introduces a Lucene index per data path. Information about cache files is updated in the Lucene index located on the same data path as the cache files.
…files #64696 (#66216) This commit changes the searchable snapshot's CacheService so that it now periodically fsyncs cache files using the method introduced in #64201. The synchronization is executed every 10 seconds by default (this interval can be changed using a new xpack.searchable.snapshot.cache.sync.interval setting). Backport of #64696 for 7.11
The searchable snapshots cache implemented in 7.10 is not persisted across node restarts, forcing data nodes to download files from the snapshot repository again once the node is restarted. This commit introduces a new Lucene index that is used to store information about cache files. The information about cache files is periodically updated and committed in this index as part of the cache synchronization task added in #64696. When the data node starts, the Lucene index is used to load the cache file information into memory; this information is then used to repopulate the searchable snapshots cache with the cache files that exist on disk. Since data nodes can have one or more data paths, this change introduces a Lucene index per data path. Information about cache files is updated in the Lucene index located on the same data path as the cache files. Backport of #65725 for 7.11
This pull request changes the searchable snapshot's CacheService so that it now periodically fsyncs cache files using the method introduced in #64201. The synchronization is executed every 60 seconds by default (this interval can be changed using a new xpack.searchable.snapshot.cache.sync_interval setting). It is only executed on data nodes that have at least one searchable snapshot shard assigned. On a data node, cache file fsyncs are serialized and executed on a per-shard basis, where the order of shards is undefined.
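For reference, the interval setting described above could be declared along these lines (a sketch; the constant name and the 1-second floor are inferred from the review discussion, not confirmed against the final code):

```java
public static final Setting<TimeValue> SNAPSHOT_CACHE_SYNC_INTERVAL_SETTING = Setting.timeSetting(
    "xpack.searchable.snapshot.cache.sync_interval",
    TimeValue.timeValueSeconds(60L),  // default, as described above
    TimeValue.timeValueSeconds(1L),   // minimum, lowered from 10s during review
    Setting.Property.NodeScope,
    Setting.Property.Dynamic          // can be updated on a running node
);
```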