Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

update littlefs module to v2.5 #12

Merged
merged 64 commits into from
Sep 5, 2022
Merged

update littlefs module to v2.5 #12

merged 64 commits into from
Sep 5, 2022

Conversation

fabiobaltieri
Copy link
Member

Hi, this is a merge of the v2.5.0 tag from https://github.com/littlefs-project/littlefs into the local module, updating from v2.4.

tniessen and others added 30 commits June 14, 2021 12:12
When the on-disk block size doesn't match the config block size, it is
possible to get file corruption. For instance, if the num blocks was
0x200 and we re-mount with 0x100 files could be corrupt.

If we re-mount with a larger number of blocks things should be safer,
but could be handled with a resize option or perhaps a mount flag to
ignore this parameter.
With the previous commit, fail if the superblock block_size doesn't
match the config block_size.
This fixes warnings produced by tools such as memcheck without
requiring the user to set an erase value.
This makes the bit breakdown clearer.
Now with -s/--sort and -S/--reverse-sort for sorting the functions by
size.

You may wonder why add reverse-sort, since its utility doesn't seem
worth the cost to implement (these are just helper scripts after all),
the reason is that reverse-sort is quite useful on the command-line,
where scrollback may be truncated, and you only care about the larger
entries.

Outside of the command-line, normal sort is prefered.

Fortunately the difference is just the sign in the sort key.

Note this conflicts with the short --summary flag, so that has been
removed.
This is to avoid unexpected script behavior even though data.py should
always return 0 bytes for littlefs. Maybe a check for this should be
added to CI?
scripts/coverage.py was missed originally because it's not ran as often
as the others. Since it requires run-time info, it's usually only used
in CI.
Note this detects loops (recursion), and renders this as infinity.
Currently littlefs does have a single recursive function and you can see
how this infects the full call graph. Eventually this should be removed.
- size  -> code_size
- size  -> data_size
- frame -> stack_frame
- limit -> stack_limit
- hits  -> coverage_hits
- count -> coverage_count
Note this does include internal structs, so this should probably
be limited to informative purposes.
This required a patch to the --diff flag for the scripts to ignore
a missing file. This enables the useful one liner for making comparisons
with potentially missing previous versions:

    ./scripts/code.py lfs.o -d lfs.o.code.csv -o lfs.o.code.csv

    function (0 added, 0 removed)            old     new    diff
    TOTAL                                  25476   25476      +0

One downside, these previous files are easy to delete as a part of make
clean, which limits their usefulness for comparing configuration
changes...
- Added -L/--depth argument to show dependencies for scripts/stack.py,
  this replaces calls.py
- Additional internal restructuring to avoid repeated code
- Removed incorrect diff percentage when there is no actual size
- Consistent percentage rendering in test.py
A full summary of static measurements (code size, stack usage, etc) can now
be found with:

    make summary

This is done through the combination of a new ./scripts/summary.py
script and the ability of existing scripts to merge into existing csv
files, allowing multiple results to be merged either in a pipeline, or
in parallel with a single ./script/summary.py call.

The ./scripts/summary.py script can also be used to quickly compare
different builds or configurations. This is a proper implementation
of a similar but hacky shell script that has already been very useful
for making optimization decisions:

    $ ./scripts/structs.py new.csv -d old.csv --summary
    name (2 added, 0 removed)               code             stack            structs
    TOTAL                                  28648 (-2.7%)      2448               1012

Also some other small tweaks to scripts:

- Removed state saving diff rules. This isn't the most useful way to
  handle comparing changes.

- Added short flags for --summary (-Y) and --files (-F), since these
  are quite often used.
- Changed `make summary` to show a one line summary
- Added `make lfs.csv` rule, which is useful for finding more info with
  other scripts
- Fixed small issue in ./scripts/summary.py
- Added *.ci (callgraph) and *.csv (script output) to CI
- Added to GitHub statuses (61 results)

- Reworked generated release table to include these (16 results, only thumb)

These also required a surprisingly large number of other changes:

- Bumbed CI Ubuntu version 18.04 -> 20.04, 22.04 is already on the
  horizon but not usable in GitHub yet

- Manualy upgrade to GCC v10, this is required for the -fcallgraph-info
  flag that scripts/stack.py uses.

- Increased paginated status queries to 100 per-page. If we have more
  statuses than this the status diffs may get much more complicated...

- Forced whitespace in generated release table to always be nbsp. GitHub
  tables get scrunched rather ugly without this, prefering margins to
  readable tables.

- Added limited support for "∞" results, since this is returned by
  ./scripts/stack.py for recursive functions.

As a side-note, this increases the number of statuses reported
per-commit from 6 to 61, so hopefully that doesn't cause any problems...
Using errors=replace in python utf-8 decoding makes these scripts more
resilient to underlying errors, rather than just throwing an unhelpfully
generic decode error.
GCC is a bit frustrating here, it really wants to generate every file in
a single command, which _is_ more efficient if our build system could
leverage this. But -fcallgraph-info is a rather novel flag, so we can't
really rely on it for generally compiling and testing littlefs.

The multi-file output gets in the way when we want an explicitly
separate rule for callgraph-info generation. We can't generate the
callgraph-info without generating the objects files.

This becomes a surprsing issue when parallel building (make -j) is used!
Suddenly we might end up with both the .o and .ci rules writing to .o
files, which creates a really difficult to track down issue of corrupted
.o files.

The temporary solution is to use an order-only prerequisite. This still
ends up building the .o files twice, but it's an acceptable tradeoff for
not requiring the -fcallgraph-info for all builds.
This requires parsing an additional section of the dwarfinfo (--dwarf=rawlines)
to get the declaration file info.

---

Interpreting the results of ./scripts/structs.py reporting is a bit more
complicated than other scripts, structs aren't used in a consistent
manner so the cost of a large struct depends on the context in which it
is used.

But that being said, there really isn't much reason to report
internal-only structs. These structs really only exist for type-checking
in internal algorithms, and their cost will end up reflected in other RAM
measurements, either stack, heap, or other.
…ntu snaps

Introduced when updating CI to Ubuntu 20.04, Ubuntu snaps consume
loop devices, which conflict with out assumption that /dev/loop0
will always be unused. Changed to request a dynamic loop device from
losetup, though it would have been nice if Ubuntu snaps allocated
from the last device or something.
Avoids redundant counting of structs shared in multiple .c files, which
is very common. This is different from the other scripts,
code.py/data.py/stack.py, but this difference makes sense as struct
declarations have a very different lifetime.
As noted in Python's subprocess library:

> This will deadlock when using stdout=PIPE and/or stderr=PIPE and the
> child process generates enough output to a pipe such that it blocks
> waiting for the OS pipe buffer to accept more data.

Curiously, this only became a problem when updating to Ubuntu 20.04
in CI (python3.6 -> python3.8).
lfs_dir_commit originally relied heavily on tail-recursion, though at
least one path (through relocations) was not tail-recursive, and could
cause unbounded stack usage in extreme cases of bad blocks. (Keep in
mind even extreme cases of bad blocks should be in scope for littlefs).

In order to remove recursion from this code path, several changed were
raequired:

- The lfs_dir_compact logic had to be somewhat inverted. Instead of
  first compacting and then resolving issues such as relocations and
  orphans, the overarching lfs_dir_commit now contains a state-machine
  which after committing or compacting handles the extra changes to the
  filesystem in a single, non-recursive loop

- Instead of fixing all relocations recursively, >1 relocation requires
  defering to a full deorphan step. This step is unfortunately an
  additional n^2 process. It also required some changes to lfs_deorphan
  in order to ignore intentional orphans created as an intermediary in
  lfs_mkdir. Maybe in the future we should remove these.

- Tail recursion normally found in lfs_fs_deorphan had to be rewritten
  as a loop which restarts any time a new commit causes a relocation.
  This does show that the algorithm may not terminate, but only if every
  block is bad, which will eventually cause littlefs to run out of
  blocks to write to.
This mostly just required separate functions for "lfs_file_rawwrite" and
"lfs_file_flushedwrite", since lfs_file_flush recursively invokes
lfs_file_rawread and lfs_file_rawwrite.

This comes at a code cost, but gives us bounded and measurable RAM usage
on this code path.
lfs_dir_traverse is a bit unpleasant in that it is inherently a
recursive function, but without a strict bound of 4 calls (commit -> filter ->
move -> filter), and efforts to unroll the recursion comes at a
signification code cost.

It turns out the best solution I've found so far is to simple create an
explicit stack with an explicit bound of 4 calls (or more accurately,
3 pushed frames).

---

This actually highlights one of the bigger flaws in littlefs right now,
which is that this function, lfs_dir_traverse, takes O(n^2) disk reads
to traverse.

Note that LFS_FROM_MOVE can only occur once per commit, which is why
this code is O(n^2) and not O(n^4).
colin-foster-in-advantage and others added 18 commits April 10, 2022 12:53
- nit: Moving brace to end of if statement line for consistency
- mount: add more debug info per CR
- Fix compiler error from extra parentheses
- Fix superblock typo
Rename can be VERY time consuming. One of the reasons is the 4 recursion
level depth of lfs_dir_traverse() seen if a compaction happened during the
rename.

lfs_dir_compact()
  size computation
    [1] lfs_dir_traverse(cb=lfs_dir_commit_size)
         - do 'duplicates and tag update'
       [2] lfs_dir_traverse(cb=lfs_dir_traverse_filter, data=tag[1])
           - Reaching a LFS_FROM_MOVE tag (here)
         [3] lfs_dir_traverse(cb=lfs_dir_traverse_filter, data=tag[1]) <= on 'from' dir
             - do 'duplicates and tag update'
           [4] lfs_dir_traverse(cb=lfs_dir_traverse_filter, data=tag[3])
  followed by the compaction itself:
    [1] lfs_dir_traverse(cb=lfs_dir_commit_commit)
         - do 'duplicates and tag update'
       [2] lfs_dir_traverse(cb=lfs_dir_traverse_filter, data=tag[1])
           - Reaching a LFS_FROM_MOVE tag (here)
         [3] lfs_dir_traverse(cb=lfs_dir_traverse_filter, data=tag[1]) <= on 'from' dir
             - do 'duplicates and tag update'
           [4] lfs_dir_traverse(cb=lfs_dir_traverse_filter, data=tag[3])

Yet, analyse shows that levels [3] and [4] don't perform anything
if the callback is lfs_dir_traverse_filter...

A practical example:

- format and mount a 4KB block FS
- create 100 files of 256 Bytes named "/dummy_%d"
- create a 1024 Byte file "/test"
- rename "/test" "/test_rename"
- create a 1024 Byte file "/test"
- rename "/test" "/test_rename"
This triggers a compaction where lfs_dir_traverse was called 148393 times,
generating 25e6+ lfs_bd_read calls (~100 MB+ of data)

With the optimization, lfs_dir_traverse is now called 3248 times
(589e3 lfs_bds_calls (~2.3MB of data)

=> x 43 improvement...
…nt_fail

Fail mount when the block size changes
don't use lfs_file_open() when LFS_NO_MALLOC is set
add the limit, the cursor cannot be set to a negative number
Fix lfs_file_rawseek performance issue
Removed invalid overwrite for return value.
There are two issues, when using the file-based block device emulation
on Windows Platforms:
1. There is no fsync implementation available. This needs to be mapped
   to a Windows-specific FlushFileBuffers system call.
2. The block device file needs to be opened as binary file (O_BINARY)
	   The corresponding flag is not required for Linux.
Fixes to use lfs_filebd on windows platforms
Restructure littlefs to not use recursion, measure stack usage
This is possible thanks to invoxiaamo's optimization of compacting
renames to avoid the O(n^3) nested filters. Not only does this
significantly reduce the runtime cost of that operation, but it
reduces the maximum possible depth of recursion to 3 frames.

Deepest lfs_dir_traverse before:

traverse with commit
'-> traverse with filter
    '-> traverse with move
        '-> traverse with filter

Deepest lfs_dir_traverse after:

traverse with commit
'-> traverse with move
    '-> traverse with filter
…to zephyr

Update the local littlefs copy to v2.5.0. Straight merge of the v2.5.0
tag, had to re-delete the .github files and solve a trivial conflict in
lfs.c.

Signed-off-by: Fabio Baltieri <fabiobaltieri@google.com>
@stephanosio
Copy link
Member

There seems to be a merge conflict.

@fabiobaltieri
Copy link
Member Author

There seems to be a merge conflict.

Are you trying to merge the PR with the GitHub UI? There was one and it should have been sorted in the the merge commit I used at the top of the PR, I think it may have to be pushed manually. Should I give it a try?

@fabiobaltieri fabiobaltieri merged commit ca583fd into zephyrproject-rtos:zephyr Sep 5, 2022
@fabiobaltieri fabiobaltieri deleted the v2.5-uprev branch September 5, 2022 10:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.