Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tool/autopilot: mark incompletely executed tests #5432

Open
rite opened this issue Jan 22, 2025 · 7 comments
Open

tool/autopilot: mark incompletely executed tests #5432

rite opened this issue Jan 22, 2025 · 7 comments

Comments

@rite
Copy link
Contributor

rite commented Jan 22, 2025

As described in tool/autopilot, "The autopilot utility automates the process of executing multiple run scripts on different test targets". For exactly this purpose it is also used by Gapfruit since quite some time and helped us to simplify the definition of the continuous integration pipeline. This is a proposal to further streamline the execution of automated tests.

Let me start with an example:

$ tool/autopilot -t x86_64-linux-linux -p x86_64-pc -k nova -k hw \
                  -r nova -r lx_fs_import -r smbios_decoder -r init_smp -r framebuffer

The above command produces a test summary like:

--- architecture x86_64 ---
linux linux:                   nova                           -> UNAVAILABLE
pc hw:                         nova                           -> UNAVAILABLE
pc nova:                       nova                           -> OK (0:17)
linux linux:                   lx_fs_import                   -> OK (0:01)
pc hw:                         lx_fs_import                   -> OK (0:00)
pc nova:                       lx_fs_import                   -> OK (0:00)
linux linux:                   smbios_decoder                 -> OK (0:00)
pc hw:                         smbios_decoder                 -> OK (0:06)
pc nova:                       smbios_decoder                 -> OK (0:04)
linux linux:                   init_smp                       -> OK (0:02)
pc hw:                         init_smp                       -> OK (0:01)
pc nova:                       init_smp                       -> OK (0:01)
linux linux:                   framebuffer                    -> OK (0:05)
pc hw:                         framebuffer                    -> OK (0:09)
pc nova:                       framebuffer                    -> OK (0:07)
--- done (everything ok) ---

Let's have a look at the executed run scripts individually:

  • run/nova: Obviously, executing this run script on base-linux or base-hw doesn't make sense. Correspondingly, it is marked as "UNAVAILABLE".
  • run/lx_fs_import: Similarly, lx_fs_import is not supposed to work on base-hw and base-nova. However, it is marked as "OK". Inspecting the respective log files reveals that the "`Test requires 'linux'".
  • run/smbios_decoder: Again for obvious reasons, smbios_decoder is not supposed to work on base-linux, yet it is marked as "OK". Inspecting the respective log files reveals that the "Run script is only supported on hw/pc and nova/pc"
  • run/init_smp: Naively, I would assume this run script is supposed to work on base-hw and base-nova, but maybe not on base-linux. Inspecting the log files reveals that "Run script does not support Qemu". On real X86-hardware, the tests would be executed on base-nova but not on base-hw, because of the assert_spec nova-statement in the run script. The autopilot summary doesn't give any information in this regard: in any case, the execution of the run script is marked as "OK".
  • framebuffer: This run script does run_genode_until forever and is not supposed to be executed using the autopilot. Therefore it isn't in the list of automatically executed test cases (tool/autopilot.list).

One of the main goals of this proposal is that the autopilot summary includes the information whether a test has actually been executed or skipped/aborted. In order to archive that, run scripts have to declare their requirements explicitly and in a somewhat formalized/streamlined manner. As a side effect, this also simplifies to search (grep) for run scripts that – for example – aren't executed in autopilot-mode or won't be executed on a certain kernel. Further, this allows to pass all available run scripts to the autopilot and it's not necessary – while still possible – to manually curate a list of tests to be executed. The test pipeline can be defined in a way that newly added run scripts will automatically be executed, unless it is explicitly stated otherwise in the run script itself.

Without changing any run scripts, the test summary with commit d631b42 looks like:

--- architecture x86_64 ---
linux linux:                   nova                           -> UNAVAILABLE
pc hw:                         nova                           -> UNAVAILABLE
pc nova:                       nova                           -> OK (0:16)
linux linux:                   lx_fs_import                   -> OK (0:01)
pc hw:                         lx_fs_import                   -> ABORT (0:00): unsatisfied requirement
pc nova:                       lx_fs_import                   -> ABORT (0:00): unsatisfied requirement
linux linux:                   smbios_decoder                 -> OK (0:00)
pc hw:                         smbios_decoder                 -> OK (0:06)
pc nova:                       smbios_decoder                 -> OK (0:04)
linux linux:                   init_smp                       -> ABORT (0:02): unsatisfied requirement
pc hw:                         init_smp                       -> OK (0:01)
pc nova:                       init_smp                       -> OK (0:01)
linux linux:                   framebuffer                    -> OK (0:05)
pc hw:                         framebuffer                    -> OK (0:07)
pc nova:                       framebuffer                    -> OK (0:06)
--- done (everything ok) ---

Commit 0dd3bf1 introduces a "generalized" assert-procedure that can be used by a run script to express its requirements. The two remaining commits of this branch adapt run/framebuffer and run/smbios_decode to make use of this assert-procedure.

What do you think about the general direction of this proposal (apart from the actual implementation: for example naming the procedure assert feels "a bit too global")?

rite added a commit to rite/genode that referenced this issue Jan 22, 2025
...with "ABORT", in case of an unmet 'assert_spec' in the run scenario.

Issue genodelabs#5432
rite added a commit to rite/genode that referenced this issue Jan 22, 2025
rite added a commit to rite/genode that referenced this issue Jan 22, 2025
...to check if the requirements for the run scenario are satisfied.

By placing the assertion just before 'run_genode_until forever', the
automated tests still validate the 'init'-configuration and verify that
the components build successfully.

Issue genodelabs#5432
rite added a commit to rite/genode that referenced this issue Jan 22, 2025
...to check if the requirements for the run scenario are satisfied.

Issue genodelabs#5432
@nfeske
Copy link
Member

nfeske commented Jan 30, 2025

Thank you @rite for the excellent problem statement and the sensible suggestion for improving the tooling. Even though I haven't yet had a closer look at your commits, I very much appreciate the direction. It will certainly improve the quality of data that we at Genode Labs gather from our CI.

I'm looking forward to review your commits in detail once I have wrapped up my current line of work (#5428).

@nfeske
Copy link
Member

nfeske commented Feb 4, 2025

Now - after having looked at your commits - I like it even more. Let's go for it!

naming the procedure assert feels "a bit too global" [?]

The naming is just right. It is intuitive and void of ceremony.

What do you think of how we should go about updating the existing run scripts?

@rite
Copy link
Contributor Author

rite commented Feb 5, 2025

Thanks for the affirmative feedback.

I also like to rename the procedure get_cmd_switch to cmd_switch along with adding the procedure have_cmd_switch. I think this blends in nicely with the now omnipresent pattern with have_[spec|board|installed|include]. Further, the pretty common assert ![have_cmd_switch --autopilot] "feels better" than assert ![get_cmd_switch --autopilot]. Do you approve?

I suggest that I:

  1. (optionally) Provide a commit that introduces cmd_switch (a copy of get_cmd_switch) and have_cmd_switch.
  2. Provide commits that each updates the run scripts on a "per repo"-basis, including replacing calls of assert_spec with assert [have_spec].
  3. When these changes made it to master, external repos like genode-world, genode-imx, gapfruit-community, etc. may have to be updated accordingly.
  4. Provide a commit that removes the procedures assert_spec and get_cmd_switch from tool/run/run.

I expect that almost everything is trivial, but I miss the infrastructure to test all changes myself. And my focus is on testing with base-linux, base-nova, base-hw.

@nfeske
Copy link
Member

nfeske commented Feb 5, 2025

Do you approve?

I don't merely approve, but very much appreciate it!

Thanks for the generous plan, which is much more than I hoped for. Hats off for your willingness to adjust the run scripts. It goes without saying that we will do our share of testing and (potentially) fixing collateral effects that we'll observe in our CI.

rite added a commit to rite/genode that referenced this issue Feb 11, 2025
This procedure is useful for run scripts that depend for example on a
board support package that is provided only for certain boards, possibly
by a third-party repo.

Issue genodelabs#5432
rite added a commit to rite/genode that referenced this issue Feb 11, 2025
Renaming 'get_cmd_switch' and 'get_cmd_arg' to 'have_cmd_arg' and
'cmd_arg' respectively blends in nicely with the now omnipresent pattern
with 'have_[spec|board|installed|include]'.

This commit deprecates 'get_cmd_switch' and 'get_cmd_arg', which will be
removed in a later commit.

Issue genodelabs#5432
rite added a commit to rite/genode that referenced this issue Feb 11, 2025
...to be consistent with the naming in tool/run.

Related to genodelabs#5432
rite added a commit to rite/genode that referenced this issue Feb 11, 2025
...to be consistent with the naming in tool/run.

Related to genodelabs#5432
rite added a commit to rite/genode that referenced this issue Feb 11, 2025
...with "ABORT", in case of an unmet 'assert_spec' in the run scenario.

Issue genodelabs#5432
rite added a commit to rite/genode that referenced this issue Feb 11, 2025
rite added a commit to rite/genode that referenced this issue Feb 11, 2025
rite added a commit to rite/genode that referenced this issue Feb 11, 2025
rite added a commit to rite/genode that referenced this issue Feb 11, 2025
@rite
Copy link
Contributor Author

rite commented Feb 11, 2025

I rebased on current master and pushed a refined implementation of the proposed changes in "tool" to my feature branch, together with a commit that updates the run scripts in the repo "os".

I added some notes/questions as comments directly to the diff.

@nfeske
Copy link
Member

nfeske commented Feb 13, 2025

Splendid!

rite added a commit to rite/genode that referenced this issue Feb 13, 2025
rite added a commit to rite/genode that referenced this issue Feb 13, 2025
rite added a commit to rite/genode that referenced this issue Feb 13, 2025
@rite
Copy link
Contributor Author

rite commented Feb 13, 2025

Thank you! I pushed three fixup commits regarding your feedback. Some get_cmd_switch- and assert_spec-calls that slipped through initially are now also fixed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants