Add test/bench runners, benchmarks, additional scripts #752
Merged
Conversation
This tries a different design for testing. The goals are to make the test infrastructure a bit simpler, with clear stages for building and running, and faster, by avoiding rebuilding lfs.c n times.
This moves defines entirely into the runtime of the test_runner, simplifying things and reducing the amount of generated code that needs to be built, at the cost of limiting test defines to uintmax_t types. This is implemented using a set of index-based scopes (created by test.py) that allow different layers to override defines from other layers, accessible through the global `test_define` function. The layers, in order of precedence:
1. command-line overrides
2. per-case defines
3. per-geometry defines
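A minimal sketch of how such a layered lookup might work; the types and names here are hypothetical, for illustration only, not the actual test_runner code:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

// one layer of defines: a sparse override of define index -> value
// (hypothetical types, for illustration only)
typedef struct test_layer {
    const bool *defined;      // which indices this layer overrides
    const uintmax_t *values;  // values for overridden indices
} test_layer_t;

// layers in precedence order: 0. command-line overrides,
// 1. per-case defines, 2. per-geometry defines
static const test_layer_t *test_layers[3];

// look up a define by index, falling through the layers in order
uintmax_t test_define(size_t define) {
    for (int i = 0; i < 3; i++) {
        const test_layer_t *l = test_layers[i];
        if (l && l->defined[define]) {
            return l->values[define];
        }
    }
    return 0;  // undefined, a real runner would likely error here
}
```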
- Indirect index map instead of bitmap+sparse array
- test_define_t and test_type_t
- Added back conditional filtering
- Added suite-level defines and filtering
- Added filtering based on suite, case, perm, type, geometry
- Added --skip, --count, and --every (will be used for parallelism)
- Implemented --list-defines
- Better helptext for flags with arguments
- Other minor tweaks
In the test-runner, defines are parameterized constants (limited to integers) that are generated from the test suite tomls, resulting in many permutations of each test. In order to make this efficient, these defines are implemented as multi-layered lookup tables, using per-layer/per-scope indirect mappings. This lets the test-runner and test suites define their own defines with compile-time indexes independently. It also makes building of the lookup tables very efficient, since they can be incrementally populated as we expand the test permutations.

The four current define layers and when we need to build them:

| layer                   | defines      | predefine_map | define_map |
|-------------------------|--------------|---------------|------------|
| user-provided overrides | per-run      | per-run       | per-suite  |
| per-permutation defines | per-perm     | per-case      | per-perm   |
| per-geometry defines    | per-perm     | compile-time  | -          |
| default defines         | compile-time | compile-time  | -          |
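As a rough illustration of the indirect-mapping idea (hypothetical names, not the actual implementation): each scope carries a small map from its own compile-time define indexes to slots in the runner's global define table, so suites and the runner can number their defines independently.

```c
#include <stddef.h>
#include <stdint.h>

// hypothetical: a scope (suite/case) numbers its defines with its own
// compile-time indexes; an indirect map translates them into slots in
// the runner's global define table
typedef struct test_define_map {
    const size_t *map;  // scope-local index -> global index
    size_t count;
} test_define_map_t;

// global table of currently-active define values (built per-run/per-perm)
extern uintmax_t test_defines[];

// resolve a scope-local define through the scope's indirect map
static inline uintmax_t test_scope_define(
        const test_define_map_t *scope, size_t local) {
    return (local < scope->count)
            ? test_defines[scope->map[local]]
            : 0;  // out of range, real code would likely assert
}
```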
- Added --disk/--trace/--output options for information-heavy debugging
- Renamed --skip/--count/--every to --start/--stop/--step. This matches common terms for ranges, and frees --skip for being used to skip test cases in the future.
- Better handling of SIGTERM: now all tests are killed, reported as failures, and testing is halted regardless of -k. This is a compromise; you throw away the rest of the tests, which is normally what -k is for, but it prevents annoying-to-terminate processes when debugging, which is a very interactive process.
- Expanded test defines to allow for lists of configurations. These are useful for changing multi-dimensional test configurations without leading to extremely large and less useful configuration combinations.
- Made warnings more visible during test parsing
- Added lfs_testbd.h to implicit test includes
- Fixed issue with not closing files in ./scripts/explode_asserts.py
- Added `make test_runner` and `make test_list` build rules for convenience
- Added internal tests, which can run tests inside other source files, allowing access to "private" functions and data. Note this required a special bit of handling for defining and later undefining test configurations to not pollute the namespace of the source file, since it can end up with test cases from different suites/configuration namespaces.
- Removed unnecessary/unused permutation argument to generated test functions.
- Some cleanup to progress output of test.py.
Previously test defines were implemented using layers of index-mapped uintmax_t arrays. This worked well for lookup, but limited defines to constants computed at compile-time. Since test defines themselves are actually calculated at _run-time_ (yeah, they have deviated quite a bit from the original, compile-time evaluated defines, which makes the name make less sense), this meant defines couldn't depend on other defines. This was limiting, since a lot of test defines relied on defines generated from the geometry being tested.

This new implementation uses callbacks for the per-case defines. This means they can easily contain full C statements, which can depend on other test defines. This does mean you can create infinitely-recursive defines, but the test-runner will just break at run-time, so don't do that.

One concern is that there might be a performance hit for evaluating all defines through callbacks, but if there is, it is well below the noise floor:
- constants: 43.55s
- callbacks: 42.05s
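A hedged sketch of what a callback-based define might look like (the names and indexes are illustrative, not the actual generated code); the point is that one define can be an expression over other defines, evaluated at run-time:

```c
#include <stddef.h>
#include <stdint.h>

// hypothetical callback type: a define is evaluated on demand at run-time
typedef uintmax_t (*test_define_cb_t)(void *data);

// assume this resolves another define through the runner's lookup tables
extern uintmax_t test_define(size_t define);

// hypothetical indexes for the defines used below
enum { BLOCK_SIZE_i, BLOCK_COUNT_i, DISK_SIZE_i };

// a per-case define that depends on other defines, something a
// compile-time constant table could not express
static uintmax_t disk_size_cb(void *data) {
    (void)data;
    return test_define(BLOCK_SIZE_i) * test_define(BLOCK_COUNT_i);
}
```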
- Added --exec for wrapping the test-runner with external commands, such as QEMU or Valgrind.
- Added --valgrind, which just aliases --exec=valgrind with a few extra flags useful during testing.
- Dropped the "valgrind" type for tests. These aren't separate tests that run in the test-runner, and I don't see a need for disabling Valgrind for any tests. This can be added back later if needed.
- Readded support for dropping directly into gdb after a test failure, either at the assert failure, the entry point of the test case, or the entry point of the test runner, with --gdb, --gdb-case, or --gdb-main.
- Added --isolate for running each test permutation in its own process; this is required for associating Valgrind errors with the right test case.
- Fixed an issue where explicit test identifiers conflicted with per-stage test identifiers generated as a part of --by-suite and --by-case.
This mostly required names for each test case, declarations of previously-implicit variables since the new test framework is more conservative with what it declares (the small extra effort to add declarations is well worth the simplicity and improved readability), and tweaks to work with not-really-constant defines.

Also renamed test_ -> test, replacing the old ./scripts/test.py; unfortunately git seems to have had a hard time with this.
This simplifies the interaction between code generation and the test-runner. In theory it also reduces compilation dependencies, but internal tests make this difficult.
A small mistake in test.py's control flow meant the failing test job would successfully kill all other test jobs, but then humorously start up a new process to continue testing.
GCC is a bit annoying here: it can't generate .cgi files without generating the related .o files, though I suppose the alternative risks duplicating a large amount of compilation work (littlefs is really a small project).

Previously we rebuilt the .o files any time we needed .cgi files (callgraph info used for stack.py). This changes it so we always build .cgi files as a side effect of compilation. This is similar to the .d file generation, though it may be annoying if the system cc doesn't support --callgraph-info.
This also adds coverage support to the new test framework, which, due to its reduced scope, no longer needs aggregation and can be much simpler. Really all we need to do is pass --coverage to GCC, which builds its .gcda files during testing in a multi-process-safe manner.

The addition of branch coverage leverages information that was available in both lcov and gcov. This was made easier with the addition of --json-format to gcov in GCC 9.0; however, the lax backwards compatibility for gcov's intermediary options is a bit concerning. Hopefully --json-format sticks around for a while.
These scripts can't easily share the common logic, but separating field details from the print/merge/csv logic should make the common part of these scripts much easier to create/modify going forward. This also tweaked the behavior of summary.py slightly.
On one hand this isn't very different than the source annotation in gcov, on the other hand I find it a bit more readable after a bit of experimentation.
Also renamed GCI -> CI; this holds .ci files, though there is a risk of confusion with continuous integration. Also added unused-but-generated .ci files to the clean rule.
- Renamed explode_asserts.py -> pretty_asserts.py, this name is hopefully a bit more descriptive
- Small cleanup of the parser rules
- Added recognition of memcmp/strcmp => 0 statements and generation of the relevant memory-inspecting assert messages

I attempted to fix the incorrect column numbers for the generated asserts, but unfortunately this didn't go anywhere, and I don't think it's actually possible. There is no column control analogous to the #line directive. I thought you might be able to intermix #line directives to put arguments at the right column like so:

    assert(a == b);
    __PRETTY_ASSERT_INT_EQ(
    #line 1
    a,
    #line 1
    b);

But this doesn't work, as preprocessor directives are not allowed in macro arguments in standard C. Unfortunately this is probably not possible to fix without better support in the language.
Yes this is more expensive, since small programs need to rewrite the whole block in order to conform to the block device API. However, it reduces code duplication and keeps all of the test-related block device emulation in lfs_testbd. Some people have used lfs_filebd/lfs_rambd as a starting point for new block devices and I think it should be clear that erase does not need to have side effects. Though to be fair this also just means we should have more examples of block devices...
On one hand this seems like the wrong place for these tests, on the other hand, it's good to know that the block device is behaving as expected when debugging the filesystem. Maybe this should be moved to an external program for users to test their block devices in the future?
The main changes here from the previous test framework design are:
1. Powerloss testing remains in-process, speeding up testing.
2. The state of a test, including all powerlosses, is encoded in the test id + a leb16-encoded powerloss string. This means exhaustive testing can be run in CI, but then easily reproduced locally with full debugger support.

For example:

    ./scripts/test.py test_dirs#reentrant_many_dir#10#1248g1g2 --gdb

will run the suite test_dirs, case reentrant_many_dir, permutation #10, with powerlosses at 1, 2, 4, 8, 16, and 32 cycles, dropping into gdb if an assert fails.

The changes to the block-device are a work-in-progress for a lazily-allocated/copy-on-write block device that I'm hoping will keep exhaustive testing relatively low-cost.
With more features being added to test.py, the one-line status is starting to get quite long and surpass the ~80-column readability heuristic. To make this worse, it clobbers the terminal output when the terminal is not wide enough. The simple solution is to disable line-wrapping, potentially printing some garbage if line-wrapping-disable is not supported, but also printing a final status update to fix any garbage and avoid a race condition where the script would show a non-final status. Also added --color which disables any of this attempting-to-be-clever stuff.
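For reference, disabling and re-enabling terminal line-wrapping is typically done with the DECAWM escape sequences; a minimal C sketch of the idea (the actual logic lives in test.py, and the status text below is made up):

```c
#include <stdio.h>

int main(void) {
    // DECAWM off: long status lines get truncated instead of wrapping
    // and clobbering previous output
    printf("\x1b[?7l");

    printf("\rrunning tests: 42/1000 passed ...");
    fflush(stdout);

    // DECAWM back on, plus a final status line so no garbage is left
    // behind if the escape sequence wasn't supported
    printf("\x1b[?7h\n");
    printf("done: 1000/1000 passed\n");
    return 0;
}
```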
Before this was available implicitly by supporting both rambd and filebd as backends, but now that testbd is a bit more complicated and no longer maps directly to a block-device, this needs to be explicitly supported.
These have no real purpose other than slowing down the simulation for inspection/fun. Note this did reveal an issue in pretty_asserts.py which was clobbering feature macros. Added explicit, and maybe a bit hacky, #undef _FEATURE_H to avoid this.
As expected this takes a significant amount of time (~10 minutes for all 1-deep powerlosses, >10 hours for all 2-deep powerlosses), but this may be reducible in the future by optimizing tests for powerloss testing. Currently test_files does a lot of work that doesn't really have testing value.
… fifos

This mostly involved futzing around with some of the less intuitive parts of Unix's named-pipe behavior. This is a bit important since the tests can quickly generate several gigabytes of trace output.
Based on a handful of local hacky variations, this sort of trace rendering is surprisingly useful for getting an understanding of how different filesystem operations interact with the underlying block-device.

At some point it would probably be good to reimplement this in a compiled language. Parsing and tracking the trace output quickly becomes a bottleneck with the amount of trace output the tests generate.

Note also that since tracebd.py runs on trace output, it can also be used to debug logged block-device operations post-run.
geky added the "needs minor version" (new functionality only allowed in minor versions) and "tooling" labels on Dec 2, 2022
After only 4 days, 20 hours, with 144,437,889 powerlosses, the exhaustive powerloss testing with all 2-deep powerlosses finished successfully:
- Moved to Ubuntu 22.04. This notably means we no longer have to bend over backwards to install GCC 10!
- Changed the shell in gha to include the verbose/undefined flags, making debugging gha a bit less painful.
- Adopted the new test.py/test_runners framework, which means no more heavy recompilation for different configurations. This reduces the test job runtime from >1 hour to ~15 minutes, while increasing the number of geometries we are testing.
- Added exhaustive powerloss testing. Because of time constraints this is at most 1pls for general tests, 2pls for a subset of useful tests.
- Limited coverage measurements to `make test`. Originally I tried to maximize coverage numbers by including coverage from every possible source, including the more elaborate CI jobs which provide an extra level of fuzzing. But this missed the purpose of coverage measurements, which is to find areas where test cases can be improved. We don't want to improve coverage by just shoving more fuzz tests into CI, we want to improve coverage by adding specific, intentioned test cases that, if they fail, highlight the reason for the failure. With this perspective, maximizing coverage measurement in CI is counter-productive. This change makes it so the reported coverage is always less than actual CI coverage, but acts as a more useful metric. This also simplifies coverage collection, so that's an extra plus.
- Added benchmarks to CI. Note this doesn't suffer from inconsistent CPU performance because our benchmarks are based on purely simulated read/prog/erase measurements.
- Updated the generated markdown table to include line+branch coverage info and benchmark results.
For long running processes (testing with >1pls) these logs can grow into multiple gigabytes, humorously we never access more than the last n lines as requested by --context. Piping the stdout with --stdout does not use additional RAM.
The littlefs CI is actually in a nice state that generates a lot of information about PRs (code/stack/struct changes, line/branch coverage changes, benchmark changes), but GitHub's UI has changed over time to make CI statuses harder to find, for some reason. This bot comment should hopefully make this information easy to find without creating too much noise in the discussion. If not, this can always be changed later.
changeprefix.py only works on prefixes, which is a bit of a problem for flags in the workflow scripts, requiring extra handling to not hide the prefix from changeprefix.py
Two flags introduced here, -fcallgraph-info=su for stack analysis and -ftrack-macro-expansion=0 for cleaner prettyasserts.py warnings, are unfortunately not supported in Clang. The override vars in the Makefile meant it wasn't actually possible to remove these flags for Clang testing, so this commit changes those vars to normal, non-overriding vars. This means `make CFLAGS=-Werror` and `CFLAGS=-Werror make` behave _very_ differently, but this is just an unfortunate quirk of make that needs to be worked around.
- Renamed struct_.py -> structs.py again.
- Removed lfs.csv, instead preferring script-specific csv files.
- Added *-diff make rules for quick comparison against a previous result; results are now implicitly written on each run. For example, `make code` creates lfs.code.csv and prints the summary, which can be followed by `make code-diff` to compare changes against the saved lfs.code.csv without overwriting.
- Added nargs=? support for -s and -S, which now use a per-result _sort attribute to decide the sort if fields are unspecified.
Mostly for benchmarking, this makes it easy to view and compare runner results similarly to other csv results.
The linear powerloss heuristic provides very good powerloss coverage without a significant runtime hit, so there's really no reason to run the tests without -Plinear. Previous behavior can be accomplished with an explicit -Pnone.
- lfs_emubd_getreaded -> lfs_emubd_readed
- lfs_emubd_getproged -> lfs_emubd_proged
- lfs_emubd_geterased -> lfs_emubd_erased
- lfs_emubd_getwear -> lfs_emubd_wear
- lfs_emubd_getpowercycles -> lfs_emubd_powercycles
When you add a function to every benchmark suite, you know it should probably be provided by the benchmark runner itself. That being said, randomness in tests/benchmarks is a bit tricky because it needs to be strictly controlled and reproducible.

No global state is used, allowing tests/benches to maintain multiple randomness streams, which can be useful for checking results during a run.

There's an argument for having global prng state, in that the prng could be preserved across power-loss, but I have yet to see a use for this, and it would add a significant requirement to any future test/bench runner.
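A minimal sketch of a seedable, state-in-a-struct PRNG with no globals, so a test/bench can keep several independent, reproducible streams; this is a generic xorshift example, not the runner's actual prng:

```c
#include <stdint.h>

// each stream owns its state, so a test/bench can keep several
// independent, reproducible streams at once
typedef struct prng {
    uint64_t state;
} prng_t;

static inline void prng_seed(prng_t *p, uint64_t seed) {
    // avoid the all-zero state, which xorshift can't escape
    p->state = seed ? seed : 1;
}

static inline uint64_t prng_next(prng_t *p) {
    // xorshift64: small, fast, and deterministic
    uint64_t x = p->state;
    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    p->state = x;
    return x;
}
```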
geky force-pushed the test-and-bench-runners branch from 78ebd80 to d677a96 on December 7, 2022
…ground

The difference between ggplot's gray and GitHub's gray was a bit jarring. This also adds --foreground and --font-color for this sort of additional color control, without needing to add a new flag for every color scheme out there.
geky force-pushed the test-and-bench-runners branch 2 times, most recently from 076f871 to 17c9665 on December 16, 2022
Driven primarily by a want to compare measurements of different runtime complexities (it's difficult to fit O(n) and O(log n) on the same plot), this adds the ability to nest subplots in the same .svg, which try to align with each other as much as possible. This turned out to be surprisingly complicated.

As a part of this, adopted matplotlib's relatively recent constrained_layout, which behaves much more consistently. Also dropped --legend-left; no one should really be using that.
As well as --legend* and --*ticklabels. Mostly for close feature parity, making it easier to move plots between plot.py and plotmpl.py.
geky force-pushed the test-and-bench-runners branch from 17c9665 to 1f37eb5 on December 16, 2022
- Added support for negative numbers in the leb16 encoding, with an optional 'w' prefix.
- Changed the prettyasserts.py rule to .a.c => .c, allowing other .a.c files in the future.
- Updated .gitignore with missing generated files (tags, .csv).
- Removed suite-namespacing of test symbols; these are no longer needed.
- Changed test define overrides to have higher priority than explicit defines encoded in test ids. So:

      ./runners/bench_runner bench_dir_open:0f1g12gg2b8c8dgg4e0 -DREAD_SIZE=16

  behaves as expected. Otherwise it's not easy to experiment with known failing test cases.
- Fixed an issue where the -b flag ignored explicit test/bench ids.
This allows debugging strategies such as binary searching for the point of "failure", which may be more complex than simply failing an assert.
geky removed the "needs minor version" (new functionality only allowed in minor versions) label on Apr 26, 2023
This PR brings in a number of changes to how littlefs is tested and measured.
Originally, the motivation was to add a method for benchmarking the filesystem, to lay the groundwork for future performance improvements, but the scope ended up growing to include a number of fixes/improvements to general littlefs testing.
Reworked test framework, no. 3
The test framework gets a rework again, taking what worked well in the current test framework and throwing out what doesn't.
The main goals behind this rework were to 1. simplify the framework, even if it means more boilerplate, as this should make it easier to extend with new features, and 2. run the tests as fast as possible.
Previously I've disregarded test performance, worried that a focus on test performance risks complexity and difficulty in understanding the system being debugged, but my perspective is changing, as faster tests => more tests => more confidence => ~~the dark side~~ a safer filesystem. If you've told me previously to parallelize the tests, etc, this is the part where you can say you told me so.

Tests incrementally compile, and we don't rebuild lfs.c for every suite
Previously the test's build system and runner was all self-contained in test.py. On one hand this meant you only needed test.py to build/run the tests, but on the other hand this design was confusing, limiting, and just all around problematic. One big issue was that, being outside of the build system, tests couldn't be built incrementally and every test suite needed a custom built version of lfs.c. This led to a slow debugging experience as each change to lfs.c needed at least 16 recompilations.
Now the test framework is integrated into the Makefile with separate build steps for applying prettyasserts.py and other scripts, all of which can be built incrementally, significantly reducing the time spent waiting for tests to recompile.
runners/test_runner is now its own standalone application
Previously any extra features/configuration had to be built into the test binaries during compilation. Now there is an explicit test_runner.c which can contain high-level test features that can be engaged at runtime through additional flags.
This makes it easier to add new test features, but also makes it easier to debug the test_runner itself, as it's no longer hidden inside test.py.
The actual tests are provided at link-time using a custom linker section, and are still generated by
./scripts/test.py -c
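As a rough sketch of the link-time registration idea (hypothetical struct and section names, not the actual generated code), each generated test case can drop a descriptor into a dedicated section, and the runner walks that section at startup; the `__start_`/`__stop_` bound symbols are the ones GNU ld generates for sections with C-identifier names:

```c
#include <stddef.h>

// hypothetical descriptor emitted by the test-generation script
struct test_case {
    const char *name;
    void (*run)(void);
};

// linker-provided bounds of the custom section
extern const struct test_case __start_test_cases[];
extern const struct test_case __stop_test_cases[];

// a generated test case registers itself by placing its descriptor
// in the "test_cases" section
__attribute__((section("test_cases"), used))
static const struct test_case test_hello = {
    .name = "test_hello",
    .run = NULL,  // would point at the generated test function
};

// the runner can then iterate over every linked-in test case
static void run_all(void) {
    for (const struct test_case *t = __start_test_cases;
            t != __stop_test_cases;
            t++) {
        if (t->run) {
            t->run();
        }
    }
}
```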
Tests now avoid spawning processes as much as possible
When you find a bug in C, it often leads to undefined behavior, memory corruption, etc., making the current test process no longer sound. But you also often want to keep running tests to see if there are any trends among the test failures. To accomplish this, the previous test framework ran each test in its own process.
Unfortunately, process spawning is not really a cheap operation. And with most tests not failing (hopefully), this ends up wasting a significant amount of time just spawning processes.
Now, with a more powerful test_runner, the test framework tries to run as many tests as possible in a single process, only spawning a new process when a test fails. This is all handled by scripts/test.py, which interacts with runners/test_runner, telling it which tests to run via the low-level `--step` flag.

Powerloss is now simulated with setjmp/longjmp
As a part of reducing process spawning, powerloss is directly simulated in the test_runner using setjmp/longjmp. Previously powerloss was simulated by killing and restarting the process, which is a simple, heavy-handed solution that works. Slowly.
Since there can be thousands of powerlosses in a single test, this needed to be moved into the test_runner, especially since powerloss testing is arguably the most important feature of littlefs's test framework.
As an added plus, the simulated block-device no longer needs to be persisted in the host's filesystem when powerloss testing, and can stay comfortably in the test_runner's RAM. The cost of persisting the block-device could be mitigated by using a RAM-backed tmpfs disk, but this still incurred a cost as all block-device operations would need to go through the OS.
Using setjmp/longjmp can lead to memory leaks when reentrant tests call malloc, but since littlefs uses malloc in only a handful of convenience functions (littlefs's whole goal is minimal RAM, after all), this doesn't seem to have been a problem so far.
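A minimal sketch of the setjmp/longjmp idea (hypothetical names, and the real runner's powerloss handling is more involved): the runner sets a jump point before running the test body, and the simulated block device longjmps out when its power-cycle budget is exhausted, after which the test body is re-entered to check recovery:

```c
#include <setjmp.h>

// hypothetical per-run powerloss state
static jmp_buf powerloss_jmp;
static int power_cycles;

// called by the simulated block device on every prog/erase
static void bd_maybe_powerloss(void) {
    if (power_cycles > 0 && --power_cycles == 0) {
        // "power is lost": unwind back to the runner
        longjmp(powerloss_jmp, 1);
    }
}

// the runner wraps each reentrant test body with a jump point
static void run_with_powerloss(void (*test_body)(void), int cycles) {
    power_cycles = cycles;
    if (setjmp(powerloss_jmp) == 0) {
        test_body();  // ran to completion with no powerloss
    } else {
        // powerloss happened; re-enter the test body so it can verify
        // the filesystem mounts and recovers correctly (a real runner
        // would loop here to inject further powerlosses)
        test_body();
    }
}
```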
Tests now run in parallel
Perhaps the lowest-hanging fruit, tests now run in parallel.
The exact implementation here is a bit naive/suboptimal, giving each process n/m tests to run for n tests and m cores, but this keeps the process/thread management in the high-level test.py python layer, simplifying thread management and avoiding a multi-threaded test_runner.
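Conceptually the split is just a strided range over the flat list of test permutations, along the lines of this sketch (a hypothetical illustration of how --start/--step could divide the work, not the actual runner code):

```c
#include <stddef.h>

// run every "step"-th permutation beginning at "start"; m runner
// processes with start = 0..m-1 and step = m cover all permutations
// exactly once between them
static void run_slice(size_t total, size_t start, size_t step,
        void (*run_perm)(size_t)) {
    for (size_t i = start; i < total; i += step) {
        run_perm(i);
    }
}
```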
The combination of the above improvements allows us to run the tests a lot faster, and/or cram in a lot more tests:
(Most of the new permutations are from moving the different test geometries out of CI and into the test_runner. Note the previous test framework does parallelize builds, which are included.)
Exhaustive powerloss testing
In addition to the heuristic-based powerloss testing, the new test_runner can also exhaustively search all possible powerloss scenarios for a given reentrant test.
To speed this up, the test_runner uses a simulated, copy-on-write block-device (reintroducing emubd), such that all possible code-paths in all possible powerloss scenarios are executed at most once. And, because most of the block-device's state can be shared via copy-on-write operations, each powerloss branch needs at most one additional block of memory in RAM.
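A rough sketch of the copy-on-write block sharing idea (hypothetical types, not lfs_emubd's actual implementation): blocks are reference-counted, forking a powerloss branch just copies the pointer array and bumps refcounts, and a block is only duplicated when written while shared:

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

// a reference-counted block of the simulated disk
typedef struct cow_block {
    uint32_t refs;
    uint8_t data[];  // block_size bytes
} cow_block_t;

// a disk snapshot is just an array of block pointers; forking a
// powerloss branch copies the pointer array and bumps refcounts,
// costing no extra block memory until a shared block is written
typedef struct cow_disk {
    size_t block_size;
    size_t block_count;
    cow_block_t **blocks;
} cow_disk_t;

static void cow_write(cow_disk_t *d, size_t i,
        const void *buf, size_t off, size_t size) {
    cow_block_t *b = d->blocks[i];
    if (b->refs > 1) {
        // block is shared with another branch, copy before writing
        // (error handling omitted in this sketch)
        cow_block_t *copy = malloc(sizeof(cow_block_t) + d->block_size);
        copy->refs = 1;
        memcpy(copy->data, b->data, d->block_size);
        b->refs -= 1;
        d->blocks[i] = b = copy;
    }
    memcpy(&b->data[off], buf, size);
}
```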
The runtime still grows exponentially, and we each have a finite lifetime, so it will be more useful to exhaustively search a bounded number of powerlosses. Here's a run of all possible 5-deep powerlosses in the test_move test suite:
Since it can be a bit annoying to wait 15 minutes to reproduce a test failure, each powerloss scenario is encoded in a leb16 suffix appended to the current test identifier. This, combined with a leb16-encoding of the test's configuration and the test's name, can uniquely identify and reproduce any test run in the test_runner:
So once a failing test scenario is found, the exact state of the failure can be quickly reproduced for debugging:
Unfortunately, the current tests are not the most well designed for exhaustive powerloss testing. Some of them, test_files and test_interspersed specifically, write large files a byte at a time. Under exhaustive powerloss testing, these result in, well, a lot of powerlosses, but outside of the writes with data-structure changes, don't reveal anything interesting. This is something that can probably be improved over time.
Exhaustively testing all powerlosses at a depth of 1 takes 12.79 minutes with 84,484 total powerlosses.
Exhaustively testing all powerlosses at a depth of 2 takes at least 4 days, and is still running... I'll let you know when it finishes...
scripts/bench.py and runners/bench_runner
This PR introduces scripts/bench.py and runners/bench_runner, siblings to scripts/test.py and runners/test_runner, for measuring the performance of littlefs. Instead of reporting pass/fail, the bench_runner reports the total number of bytes read, programmed, and erased during a bench case. This can be useful for comparing different littlefs implementations, as these numbers map directly to hardware-dependent performance in IO-bound applications.
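The measurement itself can be as simple as accumulating byte counts in the simulated block device's hooks; a hedged sketch of the idea (hypothetical struct and helpers, not the actual emubd code):

```c
#include <stdint.h>

// hypothetical counters accumulated by the simulated block device
typedef struct bench_io {
    uint64_t readed;  // total bytes read
    uint64_t proged;  // total bytes programmed
    uint64_t erased;  // total bytes erased
} bench_io_t;

// called from the simulated bd's read/prog/erase implementations
static inline void bench_count_read(bench_io_t *io, uint64_t size)  { io->readed += size; }
static inline void bench_count_prog(bench_io_t *io, uint64_t size)  { io->proged += size; }
static inline void bench_count_erase(bench_io_t *io, uint64_t size) { io->erased += size; }
```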
One feature that makes this useful, added to both the bench_runner and test_runner, is a flexible configuration system evaluated at runtime. This has the downside of limiting configurable bench/test defines to `uintmax_t` integers, but makes it easy to quickly test/compare/reproduce different configurations:

At the moment I've only added a handful of benchmarks, though the number may increase in the future. The goal isn't to maintain a fully cohesive benchmark suite, as much as it is to have a set of tools for analyzing specific performance bottlenecks.
Reworked scripts/summary.py and other scripts to be a bit more flexible
This mainly means scripts/summary.py is no longer hard-wired to work with the compile-time measurements, allowing it to be used with other results such as benchmarks, though this comes with the cost of a large number of flags for controlling the output.
Each measurement script also now comes with a `*-diff` Makefile rule for quick comparisons.

Reworked scripts/cov.py to take advantage of the --json-format introduced in GCC 9
It's a bit concerning that this was a breaking change in gcov's API, albeit on a major version, but the new --json-format is much easier to work with.
It's also worth noting this PR includes a change in ideology around coverage measurement. Instead of collecting coverage from as many sources as possible in CI, coverage is only collected from the central `make test` run. This will result in lower coverage numbers than previously, but these are the coverage numbers we actually care about: test coverage via easy-to-reproduce and easy-to-isolate tests.

This also simplifies coverage collection in CI, which is a plus.
scripts/perf.py and scripts/perfbd.py
perf.py was added as an experiment with Linux's perf tool, which uses an interesting method of sampling performance counters to build an understanding of the performance of a system. Unfortunately this isn't the most useful measurement for littlefs, as we should expect littlefs's performance to be dominated by IO overhead. But it may still be useful for tracking down CPU bottlenecks.
perfbd.py takes the ideas in Linux's perf tool and applies them to the bench_runner. Instead of sampling performance counters, we can sample littlefs's trace output to find low-level block-device operations. Combining this with stack traces provided by the backtrace function, we can propagate IO cost to their callers, building a useful map of the source of IO operations in a given benchmark run:
It's worth noting that these numbers are samples. They are a subset and don't add up to the total IO cost of the benchmark. But they are still useful as a metric for understanding benchmark performance.
You could parse the entire trace output without sampling, but this would be quite slow and not really show you any more info.
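As a hedged illustration of how a runner could attach call-stack context to trace output for later sampling, using glibc's backtrace/backtrace_symbols_fd (not necessarily how the bench_runner actually does it):

```c
#include <execinfo.h>
#include <stdio.h>

// emit a small stack trace alongside a block-device trace event, so a
// post-processing script can attribute the IO cost to its callers
static void trace_bd_read(unsigned block, unsigned off, unsigned size) {
    printf("bd_read(%u, %u, %u)\n", block, off, size);

    void *frames[16];
    int n = backtrace(frames, 16);
    backtrace_symbols_fd(frames, n, fileno(stdout));
}
```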
scripts/plot.py and scripts/plotmpl.py
Added plot.py and plotmpl.py for quick plotting of littlefs measurements in the terminal and with Matplotlib. I think these will mostly be useful for looking for growth rates in benchmark results. And also future documentation.
scripts/tracebd.py, scripts/tailpipe.py, scripts/teepipe.py
These are some extra scripts for interacting with/viewing littlefs's trace output.
tailpipe.py and teepipe.py behave similarly to Unix's tail and tee programs, but work a bit better with Unix pipes, with resumability and fast paging.
The most interesting script is tracebd.py, which parses littlefs's trace output for block-device operations and renders it as ascii art. I've used this sort of block-device operation rendering previously for a quick demo and it can be surprisingly useful for understanding how filesystem operations interact with the block-device.
    $ mkfifo trace
    $ ./scripts/bench.py ./runners/bench_runner bench_file_write -Gnor -DORDER=0 -DSIZE="range(0,24576,64)" -t trace
    ...
Changed lfs.a -> liblfs.a in default build rule
The `lib*` prefix is usually required by the linker, so I suspect this won't break anything. But it's worth mentioning this change in case someone relies on the current build target.

Added a `make help` rule

I think I first saw this here; this self-documenting Makefile rule gives some of the useful Makefile rules a bit more discoverability.
Adopted script changes in CI, added a bot comment on PRs
Thanks to GitHub Actions, we have a lot of info about builds in CI. Unfortunately, statuses on GitHub have been becoming harder to find with each UI change. To help keep this info discoverable I've added an automatically generated comment that @geky-bot should post after a successful CI run. Hopefully this will contribute to PRs without being too annoying.
You can see some example comments on the PR I created on my test fork:
WIP NULL test pr geky/littlefs-test-repo#4
The increased testing did find a couple bugs: eba5553 and 0b11ce0. Their commit messages have more details on the bugs and their fixes. And with the new test identifiers I can tell you the exact state that will trigger the failures:
test_relocations_reentrant_renames:112gg261dk1e3f3:123456789abcdefg1h1i1j1k1l1m1n1o1p1q1r1s1t1u1v1g2h2i2j2k2l2m2n2o2p2q2r2s2t2
- eba5553 - found with linear heuristic powerlosses

test_dirs_many_reentrant:2gg2cb:k4o6
- 0b11ce0 - found with 2-deep exhaustive powerlosses