Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix bug:lfs_alloc will alloc one block repeatedly in multiple split #620

Merged
merged 1 commit into from
Mar 21, 2022

Conversation

XinStellaris
Copy link
Contributor

BUG CASE:Assume there are 6 blocks in littlefs, block 0,1,2,3 already allocated. 0 has a tail pair of {2, 3}. Now we try to write more into 0.
When writing to block 0, we will split(FIRST SPLIT), thus allocate block 4 and 5. Up to now , everything is as expected.
Then we will try to commit in block 4, during which split(SECOND SPLIT) is triggered again(In our case, some files are large, some are small, one split may not be enough). Still as expected now.
BUG happens when we try to alloc a new block pair for the second split:
As lookahead buffer reaches the end , a new lookahead buffer will be generated from flash content, and block 4, 5 are unused blocks in the new lookahead buffer because they are not programed yet. HOWEVER, block 4,5 should be occupied in the first split!!!!! The result is block 4,5 are allocated again(This is where things are getting wrong).

commit ce2c01f results in this bug. In the commit, a lfs_alloc_ack is inserted in lfs_dir_split, which will cause split to reset lfs->free.ack to block count.
In summary, this problem exists after 2.1.3.

Solution: don't call lfs_alloc_ack in lfs_dir_split.

xufan-xm referenced this pull request Dec 10, 2021
Sometimes small, single line code change hides behind it a complicated
story. This is one of those times.

If you look at this diff, you may note that this is a case of
lfs_dir_fetchmatch not correctly handling a tag that invalidates a
callback used to search for some condition, in this case a search for a
parent, which is invalidated by a later dir tag overwritting the
previous dir pair.

But how can this happen? Dir-pair-tags are only overwritten during
relocations (when a block goes bad or exceeds the block_cycles config
option for dynamic wear-leveling). Other dir operations create new
directory entries. And the only lfs_dir_fetchmatch condition that relies
on overwrites (as opposed to proper deletes) is when we need to find a
directory's parent, an operation that only occurs during a _different_
relocation. And a false _positive_, can only happen if we don't have a
parent. Which is really unlikely when we search for directory parents!

This bug and minimal test case was found by Matthew Renzelmann. In a
unfortunate series of events, first a file creation causes a directory
split to occur. This creates a new, orphaned metadata-pair containing
our new file. However, the revision count on this metadata-pair
indicates the pair is due for relocation as a part of wear-leveling.
Normally, this is fine, even though this metadata-pair has no parent,
the lfs_dir_find should return ENOENT and continue without error.
However, here we get hit by our fetchmatch bug. A previous, unrelated
relocation overwrites a pair which just happens to contain the block
allocated for a new metadata-pair. When we search for a parent,
lfs_dir_fetchmatch incorrectly finds this old, outdated metadata pair
and incorrectly tells our orphan it's found its parent.

As you can imagine the orphan's dissapointment must be immense.

So an unfortunately timed dir split triggers a relocation which
incorrectly finds a previously written parent that has been outdated
by another relocation.

As a solution we can outdate our found tag if it is overwritten by
an exact match during lfs_dir_fetchmatch.

As a part of this I started adding a new set of tests: tests/test_relocations,
for aggressive relocations tests. This is already by appended to by
another PR. I suspect relocations is relatively under-tested and is
becoming more important due to recent improvements in wear-leveling.
@geky geky added needs ci ci is probably broken needs investigation no idea what is wrong labels Mar 20, 2022
BUG CASE:Assume there are 6 blocks in littlefs, block 0,1,2,3 already allocated. 0 has a tail pair of {2, 3}. Now we try to write more into 0.
When writing to block 0, we will split(FIRST SPLIT), thus allocate block 4 and 5. Up to now , everything is as expected.
Then we will try to commit in block 4, during which split(SECOND SPLIT) is triggered again(In our case, some files are large, some are small, one split may not be enough).  Still as expected now.
BUG happens when we try to alloc a new block pair for the second split:
As lookahead buffer reaches the end , a new lookahead buffer will be generated from flash content, and block 4, 5 are unused blocks in the new lookahead buffer because they are not programed yet. HOWEVER, block 4,5 should be occupied in the first split!!!!!  The result is block 4,5 are allocated again(This is where things are getting wrong).

commit ce2c01f results in this bug. In the commit, a lfs_alloc_ack is inserted in lfs_dir_split, which will cause split to reset lfs->free.ack to block count.
In summary, this problem exists after 2.1.3.

Solution: don't call lfs_alloc_ack in lfs_dir_split.
@geky
Copy link
Member

geky commented Mar 21, 2022

Hi @XinStellaris, thanks for the PR and tracking down this bug. I'm really curious if this passes CI or if a different solution is needed. If it passes CI I'm happy to take it in.

Unfortunately due to some GitHub flakiness I had to touch your commit to trigger CI. Sorry if that causes any problems.


I'm also planning to improve how we test different block device geometries, we should have better testing of small block devices but it's difficult in the current test framework since all tests need to share a common block device.

@geky geky added next patch and removed needs investigation no idea what is wrong labels Mar 21, 2022
@geky
Copy link
Member

geky commented Mar 21, 2022

Looks like CI is happy with it, so this should be good to go. Thanks for the PR!

@geky geky removed the needs ci ci is probably broken label Mar 21, 2022
@geky geky changed the base branch from master to devel March 21, 2022 04:08
@geky geky merged commit 2d6f4ea into littlefs-project:devel Mar 21, 2022
@XinStellaris
Copy link
Contributor Author

XinStellaris commented Mar 21, 2022 via email

@geky
Copy link
Member

geky commented Mar 21, 2022

It should have triggered automatically, but GitHub disabled it for a time due to site-wide crypto-mining issues. CI should run automatically on all new PRs now again as long as the account is a certain amount of time old.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants