
Re-triggering failed gradle build should run only failed tests #5010

Open
sachinpkale opened this issue Sep 10, 2024 · 5 comments
Assignees
Labels
enhancement New Enhancement

Comments

@sachinpkale
Member

Is your feature request related to a problem? Please describe

  • Currently, the majority of build failures for the OpenSearch repo are due to flaky tests.
  • Re-triggering the build runs all the tests again (40+ mins), and it is possible that the next run fails due to another flaky test.
  • This impacts contributors' productivity and slows down feature work.
  • Build re-triggers also add extra stress on the build infra, and this is amplified as we get closer to the code freeze date for a release.

Describe the solution you'd like

  • If a build fails due to a test failure, re-triggering the build should run only the failed tests from the previous run.
  • It would be safe to re-run only the failed tests: if the change in the PR genuinely breaks a test, that test will fail again on the re-run.
  • This would reduce the build re-trigger time considerably.
  • The incremental test run should apply only to re-triggers of a failed build. Any new change pushed to the PR should run the entire build.
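The incremental re-run described above needs a way to recover the failed-test list from the previous run. A minimal sketch of that mechanism, assuming Gradle's default JUnit XML report layout (`build/test-results/test/TEST-*.xml`); the function names and the retry command shape are illustrative, not an existing tool:

```python
# Sketch: collect failed tests from Gradle's JUnit XML reports and turn
# them into a --tests filter for a retry invocation.
# Assumption: reports follow Gradle's default TEST-*.xml naming.
import xml.etree.ElementTree as ET
from pathlib import Path


def failed_tests(report_dir: str) -> set[str]:
    """Return fully qualified class names of tests that failed or errored."""
    failed = set()
    for report in Path(report_dir).glob("TEST-*.xml"):
        suite = ET.parse(report).getroot()
        for case in suite.iter("testcase"):
            # A <failure> or <error> child marks an unsuccessful testcase.
            if case.find("failure") is not None or case.find("error") is not None:
                failed.add(case.get("classname"))
    return failed


def retry_command(classes: set[str]) -> str:
    """Build a Gradle invocation that re-runs only the failed test classes."""
    filters = " ".join(f"--tests '{cls}'" for cls in sorted(classes))
    return f"./gradlew test {filters}"
```

Gradle's `--tests` filter is a real command-line option; everything else here is a hypothetical glue layer a CI job would need.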

Describe alternatives you've considered

  • An alternative solution is to bring the number of flaky tests down to zero.
  • Even though we continuously try to reduce flaky tests, each new feature adds new tests, and it is possible that new flaky tests are introduced (flakiness can slip through even multiple runs on a local setup).
  • Tests with random waits (or assertBusy) are more susceptible to flakiness: most of these tests show no symptoms locally and fail only on the GitHub build system (mostly due to overloaded build servers).
  • We will continue our efforts to reduce flaky tests, but getting to zero may take months and does not guarantee that new flaky tests will not be introduced.

Additional context

No response

@sachinpkale sachinpkale added enhancement New Enhancement untriaged Issues that have not yet been triaged labels Sep 10, 2024
@sachinpkale sachinpkale changed the title Re-triggering failed gradle check should run only failed tests Re-triggering failed gradle build should run only failed tests Sep 10, 2024
@rishabh6788
Collaborator

rishabh6788 commented Sep 10, 2024

Thanks for creating this issue @sachinpkale. I was thinking about creating a similar issue in the OpenSearch repo to gather feedback from all the contributors on how to tackle this problem.
While refactoring gradle-check and breaking it into parallel CI runs seems to be the most optimal solution in the long run, we still need some mechanism to reduce the churn caused by flaky test failures.

The solution I was thinking about is as follows:

Approach 1

We catch the failure status in the existing gradle-check yml workflow file and add new steps that retry only the failing tests; if the retry passes, gradle-check shows success. This requires no custom action on the developer's side.
The downsides: it does not help with genuine test failures, and it extends gradle-check execution time.
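A minimal sketch of what Approach 1 could look like in the workflow file. The step names and scripts are hypothetical; `continue-on-error` and `steps.<id>.outcome` are standard GitHub Actions features:

```yaml
# Hypothetical sketch of Approach 1 (not the actual gradle-check workflow).
  - name: Run gradle check
    id: gradle-check
    continue-on-error: true        # record failure without failing the job yet
    run: ./scripts/run-gradle-check.sh   # illustrative script name

  - name: Retry only failed tests
    if: steps.gradle-check.outcome == 'failure'
    run: ./scripts/retry-failed-tests.sh # would re-run tests listed in the JUnit reports
```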

Approach 2

  1. gradle-check fails on the PR.
  2. The developer determines whether it is a genuine failure or a flaky test failure.
  3. If it is a flaky test, the dev adds a comment or label along the lines of "re-run failed tests".
  4. This triggers a workflow (a combination of GHA and Jenkins) which gets the failing-test list from the last failed gradle-check run.
  5. The workflow runs only those tests and publishes the result back on the PR.

There will be other checks as well: for example, if a new commit was pushed that started a new gradle-check run and a comment/label was added afterwards, the re-run would be skipped with a note that a gradle-check is already in progress.
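The guard described above can be sketched as a small predicate. This is purely illustrative; the function name and inputs are assumptions about what the triggering workflow would know (the failed run's commit SHA, the current PR head SHA, and whether a check is already running):

```python
# Hypothetical guard for Approach 2: only honour a "re-run failed tests"
# comment if no newer commit has started a fresh gradle-check run.

def should_rerun_failed_tests(failed_run_sha: str,
                              current_head_sha: str,
                              run_in_progress: bool) -> bool:
    """Return True if a targeted re-run of the failed tests should start."""
    if run_in_progress:
        # A full gradle-check is already running for this PR; skip the re-run.
        return False
    # Only re-run if the PR head is still the commit whose check failed.
    return failed_run_sha == current_head_sha


# Example: a new commit was pushed after the failure, so the re-run is skipped.
print(should_rerun_failed_tests("abc123", "def456", run_in_progress=False))  # False
```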

But before we go ahead and start talking about solutions, I have a question about what happens to the last failed gradle-check CI rule on the PR. For now, a passing gradle-check is a must for the PR to be merged.
In the proposed solution, even if we re-run the failing tests and report on the PR that they passed, the status of gradle-check would still show failed.

Do we then relax this rule? (Not required if we go with Approach 1.)

@reta @getsaurabh02 @prudhvigodithi @gaiksaya @dblock @peterzhuamazon Thoughts?
Both approaches are theoretical for now and need to be tested for feasibility.
I would also like to hear any other ideas for tackling this problem more efficiently.

@shiv0408
Member

+1 for breaking the gradle-check into multiple parallel ci runs

@reta
Contributor

reta commented Sep 10, 2024

Thanks @sachinpkale @rishabh6788. I would also agree with @shiv0408 that breaking the Gradle check into separate tasks / phases (which could be run in parallel and retried individually) looks like a good first step; if I remember correctly, @andrross also brought up this idea some time ago. Unrelated to flaky tests, it would also help with getting better test coverage reports.

If a build fails due to test failure, re-triggering the build should run only the failed tests from the previous run.

This will not work as of today (with the monolithic Gradle check). For example, when there are unit-test failures in any module, the build fails right away (example here [1]):

* What went wrong:
Execution failed for task ':modules:reindex:test'.
> There were failing tests. See the report at: file:///var/jenkins/workspace/gradle-check/search/modules/reindex/build/reports/tests/test/index.html

Retrying such tests would not be useful, since the majority of tests weren't even run. To reliably implement such a feature, we need to make sure that none of the test-related tasks / phases were skipped.

[1] https://build.ci.opensearch.org/job/gradle-check/47638/console
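The fail-fast behaviour described above can be worked around with Gradle's built-in `--continue` flag, which keeps executing as many tasks as possible after a failure, so the test reports end up listing every failed test rather than just the first failing module. A sketch of the invocation a CI job might use (whether this fits gradle-check's constraints is an open question):

```shell
# Keep going past the first failing task so every test task executes and
# the failure list in the JUnit reports is complete.
./gradlew check --continue
```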

@prudhvigodithi
Collaborator

Thanks @rishabh6788 @reta @sachinpkale @shiv0408, here are some issues from the past about optimizing the Gradle check:
opensearch-project/OpenSearch#1975
opensearch-project/OpenSearch#4053
opensearch-project/OpenSearch#12410
opensearch-project/OpenSearch#2496
#4810
#1572

I would vote for breaking the Gradle check into separate tasks / phases, which will improve developer productivity and eventually improve Core contributions.

I'm also ok with other approaches that run incrementally or just re-run the failed Gradle tests, but with Gradle task-graph dependencies we might eventually trigger more tests than just the targeted ones in the retry.

As one quick fix, we can update the gradle-check workflow to run only for the latest head commit and cancel all other running jobs for the same PR. This will reduce some noise on the PR (failing gradle-check comments) and stress on the infra (by cancelling long-running gradle runs for old commits).
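This quick fix maps onto GitHub Actions' built-in `concurrency` setting. A minimal sketch, assuming a pull-request-triggered workflow (the group name is illustrative):

```yaml
# Cancel any in-flight gradle-check run for this PR when a new commit arrives;
# fall back to the ref for non-PR events.
concurrency:
  group: gradle-check-${{ github.event.pull_request.number || github.ref }}
  cancel-in-progress: true
```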

Once we split the gradle check and optimize it, down the line (or in parallel) we can tackle the existing flaky tests: decide whether to mute them in the short term, make fixing them an entry criterion for the upcoming release, and get them fixed. Some more thoughts here: #4810 (comment).

@cwperks You may be interested in this conversation as well.

@rishabh6788
Collaborator

As one quick fix we can update the gradle-check workflow to just run for the latest head commit and cancel all the other running jobs for the same PR, this will reduce some noise on the PR (with failing gradle check comments) and stress on infra (by cancelling the long running old commit gradle runs).

I have created an issue, #5008, to discuss potential solutions for this. @prudhvigodithi @reta, I'd appreciate your feedback and comments.

Projects
Status: Backlog
Development

No branches or pull requests

5 participants