-
Notifications
You must be signed in to change notification settings - Fork 24.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add CacheFile#fsync() method to ensure cached data are written on disk #64201
Conversation
Pinging @elastic/es-distributed (:Distributed/Snapshot/Restore) |
try { | ||
// check again if the file is evicted before doing expensive I/O | ||
ensureOpen(); | ||
IOUtils.fsync(file, false); // TODO don't forget to fsync parent directory |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This method is intended to be called periodically by a task that will fsync all cache files of a given shard before fsyncing the parent directory.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just two questions, looks good overall
try { | ||
// check again if the file is evicted before doing expensive I/O | ||
ensureOpen(); | ||
IOUtils.fsync(file, false); // TODO don't forget to fsync parent directory |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder, can't we just call force
on the file channel that we already have open if we have one and avoid fsync if we don't have one by fsyncing before closing the channel (would have to enhance acquireFileChannelReference
to return null
in case there's no eviction listeners on the current instance I guess)? No need to get a new file descriptor here maybe is there?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I also thought about borrowing the current file channel if we have one but I found that it added more complexity (mostly in incref/decref the file channel another time) to save a file descriptor that would be quickly released.
I can maybe give it another try by implementing something that use the file channel if there's one or use IOUtils.fsync() otherwise. Deferring the fsync at closing time does not seem the right thing to do as we don't want to fsync everytime a chunk is written but instead control when cache files are fsynced.
@@ -305,6 +312,7 @@ protected void doRun() throws Exception { | |||
reference.decRef(); | |||
} | |||
gap.onCompletion(); | |||
fsynced.set(false); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might be better to set this false
as early as possible (i.e. directly after the write to the file) to reduce contention statistically speaking? (since fsync
and write
will always be blocking each other anyway even if we do use multiple FileChannel
instances I think).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like that fsynced
not only indicate that something was written to the file but that ranges are also updated - although it is not a requirement. I'm not sure if it makes a big difference but I can move this earlier.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah that's a good point :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Either way (regarding the file channels) LGTM, that comment does only matter if the fsync interval we will use is small I guess :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a few comments, looking good otherwise.
final List<Tuple<Long, Long>> completedRanges = tracker.getCompletedRanges(); | ||
assert completedRanges != null; | ||
|
||
if (fsynced.compareAndSet(false, true)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there is an assumption of only ever going here in one thread, since otherwise we risk a concurrent thread seeing the completedRanges
. I think that is fair, but perhaps for safety we should either lock the entire method or assert something to capture this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is OK, but your comment and your suggestion about returning an empty set of ranges if the cache file got evicted made me change a bit the semantic of this new fsync method.
I pushed d55278e which makes the method return a non-empty set of completed ranges only if the cache file was actually fsynced on disk; otherwise it returns an empty collection. This should address the case of concurrent threads and also simplify the logic to know if new ranges of data were effectively synchronized on disk or not.
try { | ||
// check again if the file is evicted before doing expensive I/O | ||
ensureOpen(); | ||
IOUtils.fsync(file, false); // TODO don't forget to fsync parent directory |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This also fsync's the metadata of the file. I think that is unnecessary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed - it is an extra I/O that we don't need. I pushed 6eea900
public void testFSync() throws Exception { | ||
try (FSyncTrackingFileSystemProvider fileSystem = setupFSyncCountingFileSystem()) { | ||
final CacheFile cacheFile = new CacheFile("test", randomLongBetween(100, 1000), fileSystem.resolve("test")); | ||
assertFalse(cacheFile.isFSynced()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think counting the file as not fsync'ed when just created could lead to fsync'ing the empty file - which I think is unnecessary? We care mostly about the data, not the files.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right - I pushed 2f8b0e7
if (randomBoolean()) { | ||
final long position = i * 10L; // simplify the test by completing small non-contiguous ranges | ||
final Tuple<Long, Long> range = Tuple.tuple(position, position + 1L); | ||
cacheFile.populateAndRead( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would be good to also verify multiple populateAndRead
calls within one fsync?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, I randomized this in the test now.
* @throws AlreadyClosedException if the cache file is evicted | ||
* @throws java.nio.file.NoSuchFileException if the cache file does not exist | ||
*/ | ||
public List<Tuple<Long, Long>> fsync() throws IOException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this return a SortedSet
instead to signal the expected order verified by the test, as well as uniqueness?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a good suggestion, thanks. I pushed 7b3590e
} | ||
|
||
cacheFile.startEviction(); | ||
expectThrows(AlreadyClosedException.class, cacheFile::fsync); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can definitely be handled in a follow-up instead when you start using it. But it might be more appropriate to just return the empty set of ranges in this case, since we essentially having nothing cached for it, rather than have to exception handle in the fsync'er thread?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree. In case of an evicted file the method will now return an empty collection. See #64201 (comment)
if (fsynced.compareAndSet(false, true)) { | ||
boolean success = false; | ||
try { | ||
// check again if the file is evicted before doing expensive I/O |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if this is necessary, since there is no waiting above?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed it.
@henningandersen Thanks for your review. I updated the code and changed a bit the semantic of the fsync method. Can you please have another look and tell me what you think? Thanks! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks for the extra iteration.
* considered as fsynced (the initialization value is {@code true}) when it is created; and writing new data to the cache file | ||
* will toggle the flag to {@code false}. | ||
**/ | ||
private final AtomicBoolean fsynced = new AtomicBoolean(true); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if calling this needsFsync
(and reverting the boolean) would be slightly more intuitive?
With the current name, a reader would be inclined to think the boolean is only true when the file is fsync'ed, but this is not true (it is also still true while writing and completing the gap).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, it makes sense. I pushed d8fdaaa2aab25e78033906f2fa8c956a6e340441
.
Thanks Armin and Henning! I'm merging this only on master for now and will backport this change and the futures one about the persistent cache in one go. |
…iles (#64696) This committ changes the searchable snapshot's CacheService so that it now periodically fsync cache files using the method introduced in #64201. The synchronization is executed every 10 seconds by default (this interval can be changed using a new xpack.searchable.snapshot.cache.sync.interval setting).
elastic#64201) This commit adds a new CacheFile.fsync() method to fsync the cache file on disk. This method uses the utility method IOUtils.fsync(Path, boolean) that executes a FileChannel.force() call under the hood.
…iles (elastic#64696) This committ changes the searchable snapshot's CacheService so that it now periodically fsync cache files using the method introduced in elastic#64201. The synchronization is executed every 10 seconds by default (this interval can be changed using a new xpack.searchable.snapshot.cache.sync.interval setting).
…files #64696 (#66216) This committ changes the searchable snapshot's CacheService so that it now periodically fsync cache files using the method introduced in #64201. The synchronization is executed every 10 seconds by default (this interval can be changed using a new xpack.searchable.snapshot.cache.sync.interval setting). Backport of #64696 for 7.11
This pull request adds a new
CacheFile.fsync()
method to fsync the cache file on disk. This method uses the utility methodIOUtils.fsync(Path, boolean)
that executes aFileChannel.force()
call under the hood.