tool/autopilot: mark incompletely executed tests #5432

rite · 2025-01-22T10:53:32Z

As described in tool/autopilot, "The autopilot utility automates the process of executing multiple run scripts on different test targets". For exactly this purpose it is also used by Gapfruit since quite some time and helped us to simplify the definition of the continuous integration pipeline. This is a proposal to further streamline the execution of automated tests.

Let me start with an example:

$ tool/autopilot -t x86_64-linux-linux -p x86_64-pc -k nova -k hw \
                  -r nova -r lx_fs_import -r smbios_decoder -r init_smp -r framebuffer

The above command produces a test summary like:

--- architecture x86_64 ---
linux linux:                   nova                           -> UNAVAILABLE
pc hw:                         nova                           -> UNAVAILABLE
pc nova:                       nova                           -> OK (0:17)
linux linux:                   lx_fs_import                   -> OK (0:01)
pc hw:                         lx_fs_import                   -> OK (0:00)
pc nova:                       lx_fs_import                   -> OK (0:00)
linux linux:                   smbios_decoder                 -> OK (0:00)
pc hw:                         smbios_decoder                 -> OK (0:06)
pc nova:                       smbios_decoder                 -> OK (0:04)
linux linux:                   init_smp                       -> OK (0:02)
pc hw:                         init_smp                       -> OK (0:01)
pc nova:                       init_smp                       -> OK (0:01)
linux linux:                   framebuffer                    -> OK (0:05)
pc hw:                         framebuffer                    -> OK (0:09)
pc nova:                       framebuffer                    -> OK (0:07)
--- done (everything ok) ---

Let's have a look at the executed run scripts individually:

run/nova: Obviously, executing this run script on base-linux or base-hw doesn't make sense. Correspondingly, it is marked as "UNAVAILABLE".
run/lx_fs_import: Similarly, lx_fs_import is not supposed to work on base-hw and base-nova. However, it is marked as "OK". Inspecting the respective log files reveals that the "`Test requires 'linux'".
run/smbios_decoder: Again for obvious reasons, smbios_decoder is not supposed to work on base-linux, yet it is marked as "OK". Inspecting the respective log files reveals that the "Run script is only supported on hw/pc and nova/pc"
run/init_smp: Naively, I would assume this run script is supposed to work on base-hw and base-nova, but maybe not on base-linux. Inspecting the log files reveals that "Run script does not support Qemu". On real X86-hardware, the tests would be executed on base-nova but not on base-hw, because of the assert_spec nova-statement in the run script. The autopilot summary doesn't give any information in this regard: in any case, the execution of the run script is marked as "OK".
framebuffer: This run script does run_genode_until forever and is not supposed to be executed using the autopilot. Therefore it isn't in the list of automatically executed test cases (tool/autopilot.list).

One of the main goals of this proposal is that the autopilot summary includes the information whether a test has actually been executed or skipped/aborted. In order to archive that, run scripts have to declare their requirements explicitly and in a somewhat formalized/streamlined manner. As a side effect, this also simplifies to search (grep) for run scripts that – for example – aren't executed in autopilot-mode or won't be executed on a certain kernel. Further, this allows to pass all available run scripts to the autopilot and it's not necessary – while still possible – to manually curate a list of tests to be executed. The test pipeline can be defined in a way that newly added run scripts will automatically be executed, unless it is explicitly stated otherwise in the run script itself.

Without changing any run scripts, the test summary with commit d631b42 looks like:

--- architecture x86_64 ---
linux linux:                   nova                           -> UNAVAILABLE
pc hw:                         nova                           -> UNAVAILABLE
pc nova:                       nova                           -> OK (0:16)
linux linux:                   lx_fs_import                   -> OK (0:01)
pc hw:                         lx_fs_import                   -> ABORT (0:00): unsatisfied requirement
pc nova:                       lx_fs_import                   -> ABORT (0:00): unsatisfied requirement
linux linux:                   smbios_decoder                 -> OK (0:00)
pc hw:                         smbios_decoder                 -> OK (0:06)
pc nova:                       smbios_decoder                 -> OK (0:04)
linux linux:                   init_smp                       -> ABORT (0:02): unsatisfied requirement
pc hw:                         init_smp                       -> OK (0:01)
pc nova:                       init_smp                       -> OK (0:01)
linux linux:                   framebuffer                    -> OK (0:05)
pc hw:                         framebuffer                    -> OK (0:07)
pc nova:                       framebuffer                    -> OK (0:06)
--- done (everything ok) ---

Commit 0dd3bf1 introduces a "generalized" assert-procedure that can be used by a run script to express its requirements. The two remaining commits of this branch adapt run/framebuffer and run/smbios_decode to make use of this assert-procedure.

What do you think about the general direction of this proposal (apart from the actual implementation: for example naming the procedure assert feels "a bit too global")?

The text was updated successfully, but these errors were encountered:

...with "ABORT", in case of an unmet 'assert_spec' in the run scenario. Issue genodelabs#5432

Issue genodelabs#5432

...to check if the requirements for the run scenario are satisfied. By placing the assertion just before 'run_genode_until forever', the automated tests still validate the 'init'-configuration and verify that the components build successfully. Issue genodelabs#5432

...to check if the requirements for the run scenario are satisfied. Issue genodelabs#5432

nfeske · 2025-01-30T13:22:06Z

Thank you @rite for the excellent problem statement and the sensible suggestion for improving the tooling. Even though I haven't yet had a closer look at your commits, I very much appreciate the direction. It will certainly improve the quality of data that we at Genode Labs gather from our CI.

I'm looking forward to review your commits in detail once I have wrapped up my current line of work (#5428).

nfeske · 2025-02-04T14:15:09Z

Now - after having looked at your commits - I like it even more. Let's go for it!

naming the procedure assert feels "a bit too global" [?]

The naming is just right. It is intuitive and void of ceremony.

What do you think of how we should go about updating the existing run scripts?

rite · 2025-02-05T12:47:39Z

Thanks for the affirmative feedback.

I also like to rename the procedure get_cmd_switch to cmd_switch along with adding the procedure have_cmd_switch. I think this blends in nicely with the now omnipresent pattern with have_[spec|board|installed|include]. Further, the pretty common assert ![have_cmd_switch --autopilot] "feels better" than assert ![get_cmd_switch --autopilot]. Do you approve?

I suggest that I:

(optionally) Provide a commit that introduces cmd_switch (a copy of get_cmd_switch) and have_cmd_switch.
Provide commits that each updates the run scripts on a "per repo"-basis, including replacing calls of assert_spec with assert [have_spec].
When these changes made it to master, external repos like genode-world, genode-imx, gapfruit-community, etc. may have to be updated accordingly.
Provide a commit that removes the procedures assert_spec and get_cmd_switch from tool/run/run.

I expect that almost everything is trivial, but I miss the infrastructure to test all changes myself. And my focus is on testing with base-linux, base-nova, base-hw.

nfeske · 2025-02-05T13:33:57Z

Do you approve?

I don't merely approve, but very much appreciate it!

Thanks for the generous plan, which is much more than I hoped for. Hats off for your willingness to adjust the run scripts. It goes without saying that we will do our share of testing and (potentially) fixing collateral effects that we'll observe in our CI.

This procedure is useful for run scripts that depend for example on a board support package that is provided only for certain boards, possibly by a third-party repo. Issue genodelabs#5432

Renaming 'get_cmd_switch' and 'get_cmd_arg' to 'have_cmd_arg' and 'cmd_arg' respectively blends in nicely with the now omnipresent pattern with 'have_[spec|board|installed|include]'. This commit deprecates 'get_cmd_switch' and 'get_cmd_arg', which will be removed in a later commit. Issue genodelabs#5432

...to be consistent with the naming in tool/run. Related to genodelabs#5432

...with "ABORT", in case of an unmet 'assert_spec' in the run scenario. Issue genodelabs#5432

Issue genodelabs#5432

rite · 2025-02-11T15:41:12Z

I rebased on current master and pushed a refined implementation of the proposed changes in "tool" to my feature branch, together with a commit that updates the run scripts in the repo "os".

I added some notes/questions as comments directly to the diff.

nfeske · 2025-02-13T14:17:41Z

Splendid!

Issue genodelabs#5432

rite · 2025-02-13T20:26:14Z

Thank you! I pushed three fixup commits regarding your feedback. Some get_cmd_switch- and assert_spec-calls that slipped through initially are now also fixed.

rite added a commit to rite/genode that referenced this issue Jan 22, 2025

tool/autopilot: mark incompletely executed tests

d631b42

...with "ABORT", in case of an unmet 'assert_spec' in the run scenario. Issue genodelabs#5432

rite added a commit to rite/genode that referenced this issue Jan 22, 2025

tool/run: introduce generalized 'assert'-proc

0dd3bf1

Issue genodelabs#5432

rite added a commit to rite/genode that referenced this issue Jan 22, 2025

run/smbios_decoder: use generalized 'assert'-proc

4b071e3

...to check if the requirements for the run scenario are satisfied. Issue genodelabs#5432

rite added a commit to rite/genode that referenced this issue Feb 11, 2025

tool/dts: rename 'get_cmd_[arg|switch]'-proc

d2e697e

...to be consistent with the naming in tool/run. Related to genodelabs#5432

rite added a commit to rite/genode that referenced this issue Feb 11, 2025

tool/autopilot: rename 'get_cmd_[arg|switch]'-proc

b63284c

...to be consistent with the naming in tool/run. Related to genodelabs#5432

rite added a commit to rite/genode that referenced this issue Feb 11, 2025

tool/autopilot: mark incompletely executed tests

105d1cf

...with "ABORT", in case of an unmet 'assert_spec' in the run scenario. Issue genodelabs#5432

rite added a commit to rite/genode that referenced this issue Feb 11, 2025

tool/run: introduce generalized 'assert'-proc

a337ec0

Issue genodelabs#5432

rite added a commit to rite/genode that referenced this issue Feb 11, 2025

os: use 'assert'-proc in run scripts

552d88a

Issue genodelabs#5432

rite added a commit to rite/genode that referenced this issue Feb 11, 2025

fixup! tool/run: introduce generalized 'assert'-proc

817403a

Issue genodelabs#5432

rite added a commit to rite/genode that referenced this issue Feb 11, 2025

os: use 'assert'-proc in run scripts

e19c503

Issue genodelabs#5432

rite added a commit to rite/genode that referenced this issue Feb 13, 2025

fixup! tool/run: introduce generalized 'assert'-proc

da3b393

Issue genodelabs#5432

rite added a commit to rite/genode that referenced this issue Feb 13, 2025

fixup! os: use 'assert'-proc in run scripts

453afc5

Issue genodelabs#5432

rite added a commit to rite/genode that referenced this issue Feb 13, 2025

fixup! os: use 'assert'-proc in run scripts

d800a2f

Issue genodelabs#5432

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

tool/autopilot: mark incompletely executed tests #5432

tool/autopilot: mark incompletely executed tests #5432

rite commented Jan 22, 2025 •

edited

Loading

nfeske commented Jan 30, 2025

nfeske commented Feb 4, 2025

rite commented Feb 5, 2025

nfeske commented Feb 5, 2025

rite commented Feb 11, 2025 •

edited

Loading

nfeske commented Feb 13, 2025

rite commented Feb 13, 2025

tool/autopilot: mark incompletely executed tests #5432

tool/autopilot: mark incompletely executed tests #5432

Comments

rite commented Jan 22, 2025 • edited Loading

nfeske commented Jan 30, 2025

nfeske commented Feb 4, 2025

rite commented Feb 5, 2025

nfeske commented Feb 5, 2025

rite commented Feb 11, 2025 • edited Loading

nfeske commented Feb 13, 2025

rite commented Feb 13, 2025

rite commented Jan 22, 2025 •

edited

Loading

rite commented Feb 11, 2025 •

edited

Loading