fix bug:lfs_alloc will alloc one block repeatedly in multiple split #620
Sometimes a small, single-line code change hides a complicated story. This is one of those times.

If you look at this diff, you may note that this is a case of lfs_dir_fetchmatch not correctly handling a tag that invalidates a callback used to search for some condition, in this case a search for a parent, which is invalidated by a later dir tag overwriting the previous dir pair.

But how can this happen? Dir-pair tags are only overwritten during relocations (when a block goes bad or exceeds the block_cycles config option for dynamic wear-leveling). Other dir operations create new directory entries. And the only lfs_dir_fetchmatch condition that relies on overwrites (as opposed to proper deletes) is when we need to find a directory's parent, an operation that only occurs during a _different_ relocation. And a false _positive_ can only happen if we don't have a parent, which is really unlikely when we search for directory parents!

This bug and its minimal test case were found by Matthew Renzelmann.

In an unfortunate series of events, first a file creation causes a directory split to occur. This creates a new, orphaned metadata-pair containing our new file. However, the revision count on this metadata-pair indicates the pair is due for relocation as a part of wear-leveling.

Normally this is fine: even though this metadata-pair has no parent, lfs_dir_find should return ENOENT and continue without error. However, here we get hit by our fetchmatch bug. A previous, unrelated relocation overwrites a pair which just happens to contain the block allocated for a new metadata-pair. When we search for a parent, lfs_dir_fetchmatch incorrectly finds this old, outdated metadata pair and incorrectly tells our orphan it has found its parent. As you can imagine, the orphan's disappointment must be immense.

So an unfortunately timed dir split triggers a relocation, which incorrectly finds a previously written parent that has been outdated by another relocation.
As a solution we can outdate our found tag if it is overwritten by an exact match during lfs_dir_fetchmatch. As a part of this I started adding a new set of tests, tests/test_relocations, for aggressive relocation tests. This is already being appended to by another PR. I suspect relocations are relatively under-tested and are becoming more important due to recent improvements in wear-leveling.
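To make the "outdate the found tag" idea concrete, here is a minimal sketch of the search rule described above. It is a toy model, not littlefs code: the `toy_tag` struct, the flat log array, and `toy_find_parent` are hypothetical names invented for illustration, and the real lfs_dir_fetchmatch works on on-disk tags through a callback rather than on an in-memory array.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified dir-pair tag: entry `id` points at block pair `pair`. */
struct toy_tag {
    int id;
    unsigned pair[2];
};

/*
 * Scan the log oldest-to-newest and return the id of the entry whose
 * most recent tag points at `want`, or -1 if there is none.
 * A later tag for the same id that points somewhere else outdates
 * the earlier match instead of leaving it as a stale hit.
 */
static int toy_find_parent(const struct toy_tag *log, size_t n,
                           const unsigned want[2]) {
    int found = -1;
    for (size_t i = 0; i < n; i++) {
        bool match = log[i].pair[0] == want[0] && log[i].pair[1] == want[1];
        if (match) {
            found = log[i].id;
        } else if (log[i].id == found) {
            /* Same entry was overwritten (e.g. by a relocation). */
            found = -1;
        }
    }
    return found;
}
```

Without the `else if` branch, a log in which entry 1 first points at {4, 5} and is later rewritten to point at {6, 7} would still report entry 1 as the parent of {4, 5}, which is the kind of false positive described above.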
BUG CASE: Assume there are 6 blocks in littlefs, with blocks 0, 1, 2, 3 already allocated. Block 0 has a tail pair of {2, 3}. Now we try to write more into block 0.
When writing to block 0, we split (FIRST SPLIT) and thus allocate blocks 4 and 5. Up to now, everything is as expected.
Then we try to commit in block 4, during which a split (SECOND SPLIT) is triggered again (in our case some files are large and some are small, so one split may not be enough). Still as expected.
The BUG happens when we try to alloc a new block pair for the second split:
As the lookahead buffer reaches the end, a new lookahead buffer is generated from flash content, and blocks 4 and 5 appear unused in the new lookahead buffer because they are not programmed yet. HOWEVER, blocks 4 and 5 were already taken by the first split! The result is that blocks 4 and 5 are allocated again (this is where things go wrong).
Commit ce2c01f introduced this bug. In that commit, a call to lfs_alloc_ack was inserted in lfs_dir_split, which causes each split to reset lfs->free.ack to the block count.
In summary, this problem exists after 2.1.3.
Solution: don't call lfs_alloc_ack in lfs_dir_split.
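The sequence above can be reproduced with a small toy model of the allocator. This is only a sketch under simplifying assumptions: every `toy_*` name is hypothetical, the lookahead covers the whole 6-block device, scanning always starts at block 0, and "flash content" is a plain boolean array. It is meant to show why an extra ack between two splits lets a rescan hand out blocks that are allocated but not yet programmed.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_BLOCK_COUNT 6   /* geometry from the bug report */
#define TOY_NOSPC (-1)      /* stand-in for an out-of-space error */

/* Toy allocator state. Hypothetical; not the real lfs_t layout. */
struct toy_fs {
    bool on_disk[TOY_BLOCK_COUNT]; /* blocks reachable by traversing flash */
    bool used[TOY_BLOCK_COUNT];    /* lookahead bitmap, one entry per block */
    int next;                      /* next lookahead index to examine */
    int ack;                       /* blocks left to examine before giving up */
};

/* Rebuild the lookahead purely from what is committed to flash. */
static void toy_rescan(struct toy_fs *fs) {
    for (int i = 0; i < TOY_BLOCK_COUNT; i++) {
        fs->used[i] = fs->on_disk[i];
    }
    fs->next = 0;
}

/* Plays the role of lfs_alloc_ack: permit a full scan of the device again. */
static void toy_ack(struct toy_fs *fs) {
    fs->ack = TOY_BLOCK_COUNT;
}

/* Return a free block, or TOY_NOSPC once every block was examined since the last ack. */
static int toy_alloc(struct toy_fs *fs) {
    for (;;) {
        while (fs->next < TOY_BLOCK_COUNT) {
            int b = fs->next++;
            fs->ack--;
            if (!fs->used[b]) {
                return b;
            }
        }
        if (fs->ack <= 0) {
            return TOY_NOSPC;
        }
        toy_rescan(fs); /* in-flight blocks are invisible to this rescan */
    }
}

/* Run the two-split scenario; returns true if any block was handed out twice. */
static bool toy_double_alloc(bool ack_in_split) {
    struct toy_fs fs = {0};
    for (int b = 0; b < 4; b++) {
        fs.on_disk[b] = true;          /* blocks 0-3 are already committed */
    }
    toy_rescan(&fs);
    toy_ack(&fs);                      /* start of the operation */

    int handed_out[TOY_BLOCK_COUNT] = {0};
    bool dup = false;
    for (int split = 0; split < 2; split++) {
        if (ack_in_split) {
            toy_ack(&fs);              /* the call this PR removes */
        }
        for (int k = 0; k < 2; k++) {  /* each split needs a block pair */
            int b = toy_alloc(&fs);
            if (b == TOY_NOSPC) {
                return dup;
            }
            if (handed_out[b]++ > 0) {
                dup = true;
            }
        }
        /* The new pair is not programmed yet, so on_disk stays unchanged. */
    }
    return dup;
}
```

In this model, `toy_double_alloc(true)` hands out blocks 4 and 5 twice, while `toy_double_alloc(false)` stops with an out-of-space result on the second split instead of silently reusing them.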
Hi @XinStellaris, thanks for the PR and tracking down this bug. I'm really curious if this passes CI or if a different solution is needed. If it passes CI I'm happy to take it in. Unfortunately, due to some GitHub flakiness I had to touch your commit to trigger CI. Sorry if that causes any problems. I'm also planning to improve how we test different block device geometries. We should have better testing of small block devices, but it's difficult in the current test framework since all tests need to share a common block device.
Looks like CI is happy with it, so this should be good to go. Thanks for the PR!
Thanks for triggering CI for me. I did forget to trigger it.
I am glad to hear that my bug fix has passed.
It should have triggered automatically, but GitHub disabled it for a time due to site-wide crypto-mining issues. CI should now run automatically again on all new PRs, as long as the account is a certain amount of time old.