
Add support for running bisection jobs in the pipeline #711

Open
pawiecz opened this issue Jul 22, 2024 · 7 comments

@pawiecz
Contributor

pawiecz commented Jul 22, 2024

With the initial draft described in kernelci/kernelci-core#2594 (comment) and the custom API endpoints introduced in #691 (as well as the checkouts work in #590), Maestro should be able to run bisection jobs in its pipelines.

This task focuses on integrating these features, as well as any related external efforts.

@broonie
Member

broonie commented Jul 22, 2024

Suggested setup: the service running bisects keeps a bare checkout for each bisect (it can use local clones from a master repo, giving only a tiny disk space overhead). When setting up, it can use 'git rev-list working..broken' to get the list of commits covered by the bisect and pull any existing results from the database to narrow the bisect down further.
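Roughly, the per-bisect setup could look like this (the function name and layout are placeholders, not existing Maestro code; only the local clone and rev-list steps come from the description above):

```python
import subprocess
from pathlib import Path

def setup_bisect_checkout(master_repo: str, workdir: Path, working: str, broken: str):
    """Create a cheap per-bisect clone and list the commits the bisect covers."""
    repo = workdir / "repo.git"
    # --local shares objects with the master repo, so each extra bisect
    # costs only a tiny amount of disk space.
    subprocess.run(
        ["git", "clone", "--local", "--bare", master_repo, str(repo)],
        check=True,
    )
    # Commits reachable from 'broken' but not from 'working', i.e. the
    # range the bisect has to narrow down.
    rev_list = subprocess.run(
        ["git", "-C", str(repo), "rev-list", f"{working}..{broken}"],
        check=True, capture_output=True, text=True,
    ).stdout.split()
    return repo, rev_list
```

Existing results for any of those commits could then be pulled from the database and fed to the bisect as good/bad verdicts before anything new is submitted.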

Then my idea for a bisect step was to check for an existing build and, if there isn't one, ask for a new one. Once binaries are available a test job can be generated - if we keep track of test jobs well enough then we can share jobs between multiple bisects, so if multiple tests in a single test job (e.g. a single kselftest suite) fail then we can bisect them in parallel and share jobs up until the point where the bisects diverge (if they do). The job sharing can be really helpful when multiple tests break.
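One way to make the sharing cheap (described in more detail further down in this thread) is to derive a stable job name from the commit under test and keep a small table of already-submitted jobs; the table and the submit hook below are hypothetical:

```python
# Hypothetical job-sharing table: every bisect that reaches the same commit
# with the same test reuses the job that is already submitted or running.
jobs = {}  # (commit_id, test_suite) -> job_id

def get_or_submit_job(commit_id: str, test_suite: str, submit) -> str:
    key = (commit_id, test_suite)
    if key not in jobs:
        # Stable name derived from the commit under test, so the lookup
        # works across independent bisections.
        jobs[key] = submit(name=f"bisect-{test_suite}-{commit_id[:12]}")
    return jobs[key]
```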

@padovan

padovan commented Jul 23, 2024

@broonie that is interesting, but the job sharing feels a bit too elaborate for a first MVP. Another idea in that direction is that we could try to find a build that is close enough to the current bisection step, e.g. if we have a build 5 commits ahead of our step, we can use that right away instead. I hope git bisect can cope with that as well.
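A rough sketch of that reuse, assuming the rev-list is ordered as git rev-list prints it and that `builds` maps commit IDs to existing builds (both names are placeholders):

```python
def find_nearby_build(step_commit, rev_list, builds, max_distance=5):
    """Return an existing build within max_distance commits of the bisection step,
    or None if a fresh build has to be requested for step_commit."""
    idx = rev_list.index(step_commit)
    for offset in range(max_distance + 1):
        for candidate in (idx - offset, idx + offset):
            if 0 <= candidate < len(rev_list) and rev_list[candidate] in builds:
                return rev_list[candidate], builds[rev_list[candidate]]
    return None
```

git bisect should cope: any commit in the range can be marked good or bad, it just changes how the remaining range gets split.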

@broonie
Member

broonie commented Jul 23, 2024

I would've thought that about the job sharing too - it came up when I was implementing my own bisection, because I realised I was generating stable names for the jobs based on the commit IDs of the commits being tested, so I could just shove them in a simple table and look there to see if the job already existed before resubmitting it.

Nearby commits are also interesting, yeah - if you've got something event based you can just feed in any result that gets delivered for a commit covered by the bisect. The only risk is that it makes the bisect log look less clean; it shouldn't impact the actual result, though. You could potentially do something like check whether there are more than N jobs scheduled that will report results for the test and suppress generating new ones until those come in, though some might be for other bisects going down different branches.
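The suppression check could be as simple as counting scheduled jobs whose commits fall inside this bisect's range (the pending_jobs structure is made up for illustration):

```python
def should_submit_new_job(bisect_commits, pending_jobs, limit):
    """Suppress new submissions while enough relevant jobs are already scheduled.

    pending_jobs maps job_id -> commit_id for jobs scheduled but not reported
    yet; some may belong to other bisects on different branches, so their
    results may or may not end up helping this one.
    """
    relevant = sum(1 for commit in pending_jobs.values() if commit in bisect_commits)
    return relevant < limit
```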

@padovan

padovan commented Jul 23, 2024

@pawiecz it would be good to scope the work and prepare a little roadmap, as it seems we will tackle this in steps.

@pawiecz
Contributor Author

pawiecz commented Jul 24, 2024

Indeed, a few steps:

Basic use case

As described by @broonie in the first paragraph and beginning of the second.

Optimization 1: Find a build that is close enough to the current bisection step

Why? Reuse already available test results instead of submitting new jobs and waiting for data
How?

Step A: Given the rev-list, neighborhood thresholds could be set to decide whether relevant data is already available or whether new jobs need to be submitted.

Step B: Listen to events in Maestro and filter the relevant ones ("any result that gets delivered for a commit covered by the bisect") - see the sketch after this list.

Optimization 2: Rendering shared TestJobs for running multiple bisections in parallel

Why? Because test execution takes significantly less time than DUT setup (deployment, provisioning)
How? Combine test suites/cases into a single TestJob definition: the Action block of the job template already supports that. Reducing them as "the bisects diverge (if they do)" will still have to be implemented.

Optimization 3: Suppress submitting new jobs until results come in if more than N jobs have already been scheduled

Why? To create a job results cache first instead of doubling submissions
How? Implement a queuing mechanism in the bisection service, keeping in mind that "some [TestJobs] might be for other bisects going down different branches"

Note: I reordered the optimizations a bit and I'm not sure whether (1B) should in fact take priority over (2) - the task order might change during development.
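For Step B, a possible shape of the event-side filter - the event fields and the on_result callback are assumptions for illustration, not the actual Maestro API:

```python
def make_event_filter(bisect_commits, test_name, on_result):
    """Return a callback that forwards only the results this bisect can use."""
    def handle_event(event):
        # Keep "any result that gets delivered for a commit covered by the
        # bisect" and for the right test; ignore everything else.
        if event.get("test") == test_name and event.get("commit") in bisect_commits:
            on_result(event["commit"], event["status"])
    return handle_event
```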

@broonie
Member

broonie commented Jul 24, 2024

The main thing I was thinking about with optimisation 2 is that it's really common for one bug to cause multiple tests in the same suite to fail - that would trigger bisects for each test that fails, but if it's one underlying bug they'll all come up with the same answer and can share all their jobs. Even if it ends up as multiple commits, there's a fair to middling chance they'll be somewhere nearby in the bisection (e.g. for -next, in the same tree), so it'll help for a lot of the bisection.

@pawiecz
Contributor Author

pawiecz commented Jul 25, 2024

Oh, I see where I misunderstood your point on (2) and I get the reasons for (3) more clearly.

With the current level of granularity, the retriggered tests would be:

@broonie Do you think combining even more Actions into a single TestJob might potentially cause interference between test cases and therefore not be worth the setup time savings? Probably my take on (2) could have a lower priority (1A > 3 > 1B > 2).
