Skip to content

Skip any erroring entry in FilesystemStore::list #3799

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Jul 28, 2025

Conversation

tnull
Copy link
Contributor

@tnull tnull commented May 24, 2025

Closes #3795.

Previously, we would bubble up any error that we'd encouter during retrieving the metadata for all listed entries. Here we relax this somewhat to allow for minor inconsistencies between reading the directory entries and checking whether they are valid key entries.

@ldk-reviews-bot
Copy link

ldk-reviews-bot commented May 24, 2025

👋 Thanks for assigning @TheBlueMatt as a reviewer!
I'll wait for their review and will help manage the review process.
Once they submit their review, I'll check if a second reviewer would be helpful.

@ldk-reviews-bot
Copy link

🔔 1st Reminder

Hey @TheBlueMatt! This PR has been waiting for your review.
Please take a look when you have a chance. If you're unable to review, please let us know so we can find another reviewer.

@ldk-reviews-bot
Copy link

🔔 2nd Reminder

Hey @TheBlueMatt! This PR has been waiting for your review.
Please take a look when you have a chance. If you're unable to review, please let us know so we can find another reviewer.

Copy link
Collaborator

@TheBlueMatt TheBlueMatt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do we want to do about MigratableKVStore::list_all_keys?

@TheBlueMatt
Copy link
Collaborator

I also feel like we need to specify the expected semantics at the API-level here. This change can allow for an inconsistent list view - eg if someone wrote key A then deleted key B the list API can return a value without A and without B, which is weird in some cases.

@tnull
Copy link
Contributor Author

tnull commented Jun 4, 2025

What do we want to do about MigratableKVStore::list_all_keys?

I don't think we should change anything about that, as for migration we'd want be as strict as possible? For one, users really shouldn't run migration in-parallel to anything else (which is basically already documented on migrate_kv_store_data), and if we hit some inconsistencies the user probably needs to take a look rather than the migration just silently continuing with some items missing.

I also feel like we need to specify the expected semantics at the API-level here. This change can allow for an inconsistent list view - eg if someone wrote key A then deleted key B the list API can return a value without A and without B, which is weird in some cases.

Are we sure this would be the case? I'm not sure if it is honestly. Before we change the docs, we should probably first try to understand better what exactly happened in the user-reported instance.

@TheBlueMatt
Copy link
Collaborator

Are we sure this would be the case? I'm not sure if it is honestly. Before we change the docs, we should probably first try to understand better what exactly happened in the user-reported instance.

I think so. We'd first list the directory entries that doesn't have A or B, then the other thread can remove A and write B and the loop will go through, detect that A is missing and ignore it, but never look for B. Absent a full restart of the whole loop and giving up after a while I'm not sure how we'd fix it, though, honestly.

@tnull
Copy link
Contributor Author

tnull commented Jun 5, 2025

I think so. We'd first list the directory entries that doesn't have A or B, then the other thread can remove A and write B and the loop will go through, detect that A is missing and ignore it, but never look for B. Absent a full restart of the whole loop and giving up after a while I'm not sure how we'd fix it, though, honestly.

Issues like this reminds me that we still need to provide a better default KVStore implementation and discourage FilesystemStore for production use, IMO.

@ldk-reviews-bot
Copy link

✅ Added second reviewer: @valentinewallace

@ldk-reviews-bot
Copy link

🔔 3rd Reminder

Hey @valentinewallace! This PR has been waiting for your review.
Please take a look when you have a chance. If you're unable to review, please let us know so we can find another reviewer.

@tnull
Copy link
Contributor Author

tnull commented Jun 11, 2025

I also feel like we need to specify the expected semantics at the API-level here. This change can allow for an inconsistent list view - eg if someone wrote key A then deleted key B the list API can return a value without A and without B, which is weird in some cases.

Coming back to this: I don't think we'd want to change any KVStore specifications just because of an implementation detail of FilesystemStore. Where would you expect the specifics to be documented then? The trait implementation on FilesystemStore, even though this might not be super discoverable?

@tnull tnull requested a review from TheBlueMatt June 12, 2025 08:10
@TheBlueMatt
Copy link
Collaborator

Issues like this reminds me that we still need to provide a better default KVStore implementation and discourage FilesystemStore for production use, IMO.

The same issue applies in any KVStore once we're looking at multi-level issues.

Copy link
Collaborator

@TheBlueMatt TheBlueMatt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, we should be able to fix this by calling https://doc.rust-lang.org/std/fs/struct.DirEntry.html#method.file_type instead.

@ldk-reviews-bot
Copy link

🔔 4th Reminder

Hey @valentinewallace! This PR has been waiting for your review.
Please take a look when you have a chance. If you're unable to review, please let us know so we can find another reviewer.

@ldk-reviews-bot
Copy link

🔔 1st Reminder

Hey @valentinewallace! This PR has been waiting for your review.
Please take a look when you have a chance. If you're unable to review, please let us know so we can find another reviewer.

@ldk-reviews-bot
Copy link

🔔 5th Reminder

Hey @valentinewallace! This PR has been waiting for your review.
Please take a look when you have a chance. If you're unable to review, please let us know so we can find another reviewer.

@ldk-reviews-bot
Copy link

🔔 2nd Reminder

Hey @valentinewallace! This PR has been waiting for your review.
Please take a look when you have a chance. If you're unable to review, please let us know so we can find another reviewer.

@tnull
Copy link
Contributor Author

tnull commented Jun 16, 2025

Actually, we should be able to fix this by calling https://doc.rust-lang.org/std/fs/struct.DirEntry.html#method.file_type instead.

How come? IIUC, DirEntry::file_type is very often just a shortcut for calling DirEntry::metadata::file_type:

https://github.com/rust-lang/rust/blob/68ac5abb067806a88464ddbfbd3c7eec877b488d/library/std/src/sys/fs/unix.rs#L945-L969

So why would this reliably solve the issue at hand?

@TheBlueMatt
Copy link
Collaborator

Currently the code calls DirEntry::path() and then calls metadata on that Path, doing a syscall to fetch the metadata. We actually can call DirEntry::metadata() which avoids the syscall and will tell us what the file type was when the list operation happened, avoiding the race. It looks like file_type may also reduce the race further on some platforms, but, indeed, not on common ones.

@tnull
Copy link
Contributor Author

tnull commented Jun 16, 2025

Currently the code calls DirEntry::path() and then calls metadata on that Path, doing a syscall to fetch the metadata. We actually can call DirEntry::metadata() which avoids the syscall and will tell us what the file type was when the list operation happened, avoiding the race. It looks like file_type may also reduce the race further on some platforms, but, indeed, not on common ones.

Not sure if I follow here. Now pushed a fixup using DirEntry::metadata, but AFAICT, this still calls back into statx/fstatat64 (see https://github.com/rust-lang/rust/blob/68ac5abb067806a88464ddbfbd3c7eec877b488d/library/std/src/sys/fs/unix.rs#L918), so how is this not a syscall, and how is it not race-y?

@TheBlueMatt
Copy link
Collaborator

Oh, hmm, my memory was fuzzy on what readdir included. Nevermind, but we should use file_type directly. While it'll still be race-y on "solaris", "illumos", "haiku", "vxworks", "aix", "nto", "vita", and "some filesystems" (per readdir(3)), I don't even recognize a few of those and am pretty confident we'll never have users on most of them :).

Sadly, per readdir(3), it looks like we still need to handle the race better somehow, as "Currently, only some filesystems (among them: Btrfs, ext2, ext3, and ext4) have full support for returning the file type in d_type. All applications must properly handle a return of DT_UNKNOWN." Really we should have been naming the directories different from files, is it still too late to do that?

@TheBlueMatt
Copy link
Collaborator

Also, actually, list's docs are ambiguous to begin with. It says only "Returns a list of keys that are stored under the given secondary_namespace in primary_namespace." Why is it skipping directories to begin with?

@tnull
Copy link
Contributor Author

tnull commented Jun 16, 2025

Why is it skipping directories to begin with?

Because “directories” are a concept that is only known to the specific implementation that is FilesystemStore, which we decidedly didn’t include/mention in the API when we went with ‘namespaces’.

@TheBlueMatt
Copy link
Collaborator

Right, but generally they indicate the presence of a sub-namespace. Which presumably we should return per the docs on list, afaict?

@ldk-reviews-bot
Copy link

🔔 3rd Reminder

Hey @valentinewallace! This PR has been waiting for your review.
Please take a look when you have a chance. If you're unable to review, please let us know so we can find another reviewer.

@tnull
Copy link
Contributor Author

tnull commented Jun 18, 2025

Right, but generally they indicate the presence of a sub-namespace. Which presumably we should return per the docs on list, afaict?

No, we discussed this several times in the past, and IIRC we can't just require that a) due to how our upgrade path from the original KVStorePersister looked like, but also b) since we so far don't require it and we don't know how exactly our users use these namespaces. For example, in VSS we don't have the notion of namespaces really, but simply build a single key that is primary_namespace#secondary_namespace#obfuscated_key and we list by key_prefix. If we'd suddenly require to return all sub-/namespaces also, we'd need to always list all keys and manually extract (hopefully also deobfuscate, once we do obfuscate) them, which would be very costly.

So, TLDR: yes, would have been great, but we didn't/couldn't do it this way, which is also why we had to introduce a separate list_all_keys method in MigratableKVStore (as otherwise we could have iteratively explored the entire keyspace based on the returned sub-/namespaces).

Really we should have been naming the directories different from files, is it still too late to do that?

See above: directories are a filesystem-only concept. And yes, technically we do name them differently in that we don't return them at all, which we can't easily change without breaking backwards compatibility.^^

Copy link
Collaborator

@TheBlueMatt TheBlueMatt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not the biggest fan of adding a loop there as it might be unexpected for users that we just retry in this particular case.

Not sure how we can fix this issue without either (a) looping until we succeed or (b) changing the KVStore API so that we're allowed to return partial results in case of "tears" here. The current PR fixes the logic for ext4/btrfs, but not every FS, and eg I doubt it fixes it for c=.

@ldk-reviews-bot
Copy link

👋 The first review has been submitted!

Do you think this PR is ready for a second reviewer? If so, click here to assign a second reviewer.

@tnull
Copy link
Contributor Author

tnull commented Jul 16, 2025

Not sure how we can fix this issue without either (a) looping until we succeed or (b) changing the KVStore API so that we're allowed to return partial results in case of "tears" here.

It really shows the that using the filesystem simply has some shortcomings and doesn't provide basic ACID guarantees, especially across different platforms. In my eyes neither a) nor b) is an option:

a) if we'd add a loop, we def. can't loop forever. So we'll have to add proper retry logic on our end, aborting operation after N retries. But, any magic number N here could lead to behavior that the user doesn't expect, especially if the API contract is that we bubble up IO errors. We'd need to make N configurable and document it, which is in turn an impossible API since no user will really understand what this parameter is for.

b) I don't think this is a good option either. KVStore has nothing to do with FilesystemStore, i.e., you can't just relax the API guarantees at the interface level after the fact here, just to address the shortcomings of a single implementation. FWIW, we're about to add a bunch of additional requirements for the async variant of KVStore, and given what we learnt about this issue here, I doubt a wrapped FilesystemStore will really hold up over time.

IMO we fixed what was immediately addressable in this PR, but the real fix will be to add fully-supported SQLite/Postgres backends to rust-lightning, and advise users to migrate to a proper database backend.

The current PR fixes the logic for ext4/btrfs, but not every FS, and eg I doubt it fixes it for c=.

They already implement retry logic around the FilesystemStore on their end, they just need to stop panicking on any error. (cc @domZippilli)

@domZippilli
Copy link
Contributor

Thanks for the ping.

the real fix will be to add fully-supported SQLite/Postgres backends to rust-lightning

If I get the time and mandate, something like this would be fun to write. Now that you all use rustfmt, I bet it wouldn't even get 500 comments.

@TheBlueMatt
Copy link
Collaborator

TheBlueMatt commented Jul 16, 2025

It really shows the that using the filesystem simply has some shortcomings and doesn't provide basic ACID guarantees, especially across different platforms.

This isn't true. We screwed up the directory structure and are hitting this issue because of it. If we put top-level keys in a directory instead we'd be just fine here. Plenty of large applications use filesystems as K-V stores and they generally are really good at it!

a) if we'd add a loop, we def. can't loop forever. So we'll have to add proper retry logic on our end, aborting operation after N retries. But, any magic number N here could lead to behavior that the user doesn't expect, especially if the API contract is that we bubble up IO errors.

Given we've hit this only recently in prod even with a large node that does high-latency persistence, I imagine a few loops will ~always work in practice unless there's something really bad going on. Indeed, we might want a limit, but that limit can be fairly high and I don't think we need to worry about that too much.

We'd need to make N configurable and document it, which is in turn an impossible API since no user will really understand what this parameter is for.

Or not?

b) I don't think this is a good option either. KVStore has nothing to do with FilesystemStore, i.e., you can't just relax the API guarantees at the interface level after the fact here, just to address the shortcomings of a single implementation.

I tend to agree. Hence why we should add a loop :)

FWIW, we're about to add a bunch of additional requirements for the async variant of KVStore, and given what we learnt about this issue here, I doubt a wrapped FilesystemStore will really hold up over time.

This is news to me? What requirements are we adding and why would they be problematic here?

They already implement retry logic around the FilesystemStore on their end, they just need to stop panicking on any error. (cc @domZippilli)

Which probably means we really need to take that logic upstream and add a loop?

@tnull tnull force-pushed the 2025-05-inconsistent-list branch from 1039663 to fba1095 Compare July 28, 2025 10:52
@tnull
Copy link
Contributor Author

tnull commented Jul 28, 2025

As mentioned elsewhere, I still think retry logic shouldn't necessarily be the concern of FilesystemStore, but I'll now concede as it's also not a super strongly held opinion. So now went ahead and added a loop / retry logic in FilesystemStore::list.

@tnull tnull requested a review from TheBlueMatt July 28, 2025 10:55
@tnull tnull force-pushed the 2025-05-inconsistent-list branch 4 times, most recently from 5d9b3b1 to 989cc6f Compare July 28, 2025 10:58
Copy link

codecov bot commented Jul 28, 2025

Codecov Report

❌ Patch coverage is 41.17647% with 20 lines in your changes missing coverage. Please review.
✅ Project coverage is 88.93%. Comparing base (39e8d7d) to head (6681a69).
⚠️ Report is 15 commits behind head on main.

Files with missing lines Patch % Lines
lightning-persister/src/fs_store.rs 41.17% 6 Missing and 14 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3799      +/-   ##
==========================================
- Coverage   88.93%   88.93%   -0.01%     
==========================================
  Files         174      174              
  Lines      123876   123875       -1     
  Branches   123876   123875       -1     
==========================================
- Hits       110173   110170       -3     
- Misses      11250    11251       +1     
- Partials     2453     2454       +1     
Flag Coverage Δ
fuzzing 22.18% <ø> (+<0.01%) ⬆️
tests 88.76% <41.17%> (-0.01%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

TheBlueMatt
TheBlueMatt previously approved these changes Jul 28, 2025
@ldk-reviews-bot
Copy link

👋 The first review has been submitted!

Do you think this PR is ready for a second reviewer? If so, click here to assign a second reviewer.

tnull added 2 commits July 28, 2025 14:25
Previously, the internal `dir_entry_is_key` method would take a path
argument, which would risk a race when the respective entry was modified
in the time between the original `fs::read_dir` call and the
`Path::metadata` call.

Here, we instead have it take a `DirEntry` argument, which--at least on
some platforms--should allow us to avoid this race as
`DirEntry::metadata` would return the original (cached) metadata.
Previously, we might hit a certain race condition between `fs::read_dir`
and actually accessing the directory entry in `FilesystemStore::list`.

Here, we introduce a `loop` retrying the listing (by default 10 times)
when we suddenly hit a `NotFound` error, to hopefully reach a consistent
view on the second/third time around.
@tnull
Copy link
Contributor Author

tnull commented Jul 28, 2025

Amended with the following changes:

diff --git a/lightning-persister/src/fs_store.rs b/lightning-persister/src/fs_store.rs
index 735b68165..5580382b3 100644
--- a/lightning-persister/src/fs_store.rs
+++ b/lightning-persister/src/fs_store.rs
@@ -36,5 +36,5 @@ const GC_LOCK_INTERVAL: usize = 25;
 // The number of times we retry listing keys in `FilesystemStore::list` before we give up reaching
 // a consistent view and error out.
-const LIST_DIR_CONSISTENCY_RETRIES: usize = 3;
+const LIST_DIR_CONSISTENCY_RETRIES: usize = 10;

 /// A [`KVStoreSync`] implementation that writes to and reads from the file system.
@@ -328,5 +328,5 @@ impl KVStoreSync for FilesystemStore {
                                match res {
                                        Ok(true) => {
-                                               let key = git_key_from_dir_entry_path(&p, &prefixed_dest)?;
+                                               let key = get_key_from_dir_entry_path(&p, &prefixed_dest)?;
                                                keys.push(key);
                                        },
@@ -405,5 +405,5 @@ fn dir_entry_is_key(dir_entry: &fs::DirEntry) -> Result<bool, lightning::io::Err
 }

-fn git_key_from_dir_entry_path(p: &Path, base_path: &Path) -> Result<String, lightning::io::Error> {
+fn get_key_from_dir_entry_path(p: &Path, base_path: &Path) -> Result<String, lightning::io::Error> {
        match p.strip_prefix(&base_path) {
                Ok(stripped_path) => {
@@ -469,5 +469,5 @@ impl MigratableKVStore for FilesystemStore {
                                let primary_namespace = String::new();
                                let secondary_namespace = String::new();
-                               let key = git_key_from_dir_entry_path(&primary_path, prefixed_dest)?;
+                               let key = get_key_from_dir_entry_path(&primary_path, prefixed_dest)?;
                                keys.push((primary_namespace, secondary_namespace, key));
                                continue 'primary_loop;
@@ -481,7 +481,7 @@ impl MigratableKVStore for FilesystemStore {
                                if dir_entry_is_key(&secondary_entry)? {
                                        let primary_namespace =
-                                               git_key_from_dir_entry_path(&primary_path, prefixed_dest)?;
+                                               get_key_from_dir_entry_path(&primary_path, prefixed_dest)?;
                                        let secondary_namespace = String::new();
-                                       let key = git_key_from_dir_entry_path(&secondary_path, &primary_path)?;
+                                       let key = get_key_from_dir_entry_path(&secondary_path, &primary_path)?;
                                        keys.push((primary_namespace, secondary_namespace, key));
                                        continue 'secondary_loop;
@@ -495,8 +495,8 @@ impl MigratableKVStore for FilesystemStore {
                                        if dir_entry_is_key(&tertiary_entry)? {
                                                let primary_namespace =
-                                                       git_key_from_dir_entry_path(&primary_path, prefixed_dest)?;
+                                                       get_key_from_dir_entry_path(&primary_path, prefixed_dest)?;
                                                let secondary_namespace =
-                                                       git_key_from_dir_entry_path(&secondary_path, &primary_path)?;
-                                               let key = git_key_from_dir_entry_path(&tertiary_path, &secondary_path)?;
+                                                       get_key_from_dir_entry_path(&secondary_path, &primary_path)?;
+                                               let key = get_key_from_dir_entry_path(&tertiary_path, &secondary_path)?;
                                                keys.push((primary_namespace, secondary_namespace, key));
                                        } else {

@tnull tnull requested a review from TheBlueMatt July 28, 2025 12:29
continue 'skip_entry;
},
Err(e) => {
if e.kind() == lightning::io::ErrorKind::NotFound && retries > 0 {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is right because the dir_entry_is_key metadata() call gets its error mapped to an ErrorKind::Other.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, good catch, I had forgotten we started to do that at some point. Now added a commit reverting this, as remapping to ErrorKind::Other is nonsense, IMO, if we already have stronger typed io::Error.

@ldk-reviews-bot
Copy link

👋 The first review has been submitted!

Do you think this PR is ready for a second reviewer? If so, click here to assign a second reviewer.

If we already have an `io::Error`, we shouldn't remap to `Other`, as it
would have us lose type information.
@tnull tnull requested a review from TheBlueMatt July 28, 2025 17:08
Copy link
Collaborator

@TheBlueMatt TheBlueMatt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gonna go ahead and land this since its pretty straightforward, whether it suffices or not I dunno.

@ldk-reviews-bot
Copy link

👋 The first review has been submitted!

Do you think this PR is ready for a second reviewer? If so, click here to assign a second reviewer.

@TheBlueMatt TheBlueMatt merged commit bb48bbc into lightningdevkit:main Jul 28, 2025
25 checks passed
@github-project-automation github-project-automation bot moved this from Goal: Merge to Done in Weekly Goals Jul 28, 2025
@tnull tnull mentioned this pull request Jul 29, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

FilesystemStore::list is not safe to call concurrently
4 participants