Improve the test run time on PRs #3385
cc @metacosm
In my opinion, this should be worked on as soon as possible, as it would help speed up the rest of development.
@metacosm This deserves a long description so that we can analyze the requirement, and of course we need to check it (if complicated) with the test platform team, as our PR runs are controlled by the Prow bot.
Just want to understand: what is the motive here? Test time optimization? It won't help much, because CI takes almost the same time to spin up and run tests against each version of the cluster. If we are looking for AWS resource optimization, then yes, the number of AWS resources will be reduced.
+1, but we will keep both test types until we make sure that our tests do not hit flakes.
The motivation is to move PRs faster from submitted to merged, as well as to reduce resource usage. Right now, a lot of time and resources are spent running tests that are not needed: integration tests should only be run once a PR is approved and considered ready to be merged by reviewers. There is no point in running all the tests while the review process is not finished; only unit tests / smoke tests should be run during that period. This would have the following benefits:
Another aspect of this is obviously the unbelievably high occurrence of flakes, which should also be addressed. If a test cannot run reliably, then it is useless: if it fails, you don't know whether it's a positive signal of an issue with the code, and if it passes, you're not sure whether it somehow worked when maybe it shouldn't have…
@metacosm So basically we are looking for a flow like that. I need to check with the test platform team whether this is possible with the presubmit test type that we are using; otherwise we can explore other possibilities.
It is highly unlikely that such a workflow will work, but better to confirm with the test platform team. Assuming it does not, we could also look at splitting the minimum tests (unit + integration) into presubmit and having e2e, or maybe even everything, run as postsubmit. Doing so might also allow us to reliably increase the delay in periodic tests. Also, how about removing all but the latest and latest-1 cluster versions from PR tests and keeping the other versions in periodic?
No, it is not possible to have tests that only get triggered after approval and are a merge requirement. I personally also don't think it makes sense, because you are wasting your reviewers' time by having them review something for which CI can tell you that it's not going to work.
Well, I guess we are going with what I recommended then: splitting the tests across presubmit, postsubmit, and periodic. WDYT @alvaroaleman?
Yeah, sounds like a good idea; you just need to make sure that someone looks at them and acts if there are failures, otherwise it's pretty much useless.
@alvaroaleman a review should not be contingent on test results, so I don't see that as a waste of the reviewer's time. The converse is also true: it doesn't make sense to test something that will be changed because of reviews. That's why, IMO, only unit tests should be run before a PR is approved.
Tests have little associated cost; humans have a very high associated cost, so running a couple more tests is always worth it if it saves human time. Happy for you if you have the capacity to do so many reviews; I rarely look at PRs that have failing tests. It would be conceptually very hard to add something like this to Prow, and it opens up a whole set of failure modes (how do we handle test failures on approved PRs? We currently assume they are flakes and will retest forever).
While I agree with you in principle, that's just not true for odo. The integration tests take several hours to complete and are highly unreliable (lots of flakes), so if you wait for tests to pass before performing a review, you will never review anything… :) I'm not saying that this should be added to Prow, just that the odo tests should be set up in such a way that only unit tests (which should cover enough to give a good idea of the soundness of the PR) are run until the PR is approved. If integration tests fail, the onus is on the author to get them fixed, possibly asking for another review afterwards if needed. Finally, failing tests pollute the review with lots of useless messages, making the review process harder than it needs to be.
Nice little discussion there @metacosm @alvaroaleman. I will need to think about this before putting in my points :) However, there is another way of splitting testing that could improve test time while avoiding loss of visibility to an extent. First, we should probably remove all but the latest and latest-1 OpenShift clusters from presubmit, with complete testing relegated to postsubmit and periodic. Once we have done that, we will free up a bunch of the clusters we use, which could then be used to split tests. For example, we can have int-1 and int-2 each run half of the integration tests for both versions. So we end up, per cluster, with one job for unit tests, two for integration, and one for e2e. We can split further, of course, depending on the size of the tests and resource availability. WDYT? I think the cluster provisioning time is a constant that will happen once across the board. That aside, the fact remains that the cleanest way to increase test speed here would be to eliminate provisioning unless you are testing something that needs it, but that would likely need cluster pooling of some sort, which comes with its own associated costs. The biggest time-consuming part is cluster provisioning, which takes at least 40-45 minutes on good days.
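The trade-off described above can be sanity-checked with a quick back-of-envelope model (a sketch; only the 40-45 minute provisioning figure comes from this discussion, the test duration is a made-up illustrative number): provisioning is paid in every job, but the jobs run concurrently, so it acts as a constant floor while the test portion shrinks with the number of parallel shards.

```python
def wall_clock_minutes(provision: float, test: float, shards: int) -> float:
    """Estimated wall-clock time for one cluster version when the test
    suite is split evenly across `shards` concurrent jobs. Provisioning
    happens in each job, but the jobs overlap in time, so it is a
    constant floor rather than an additive cost."""
    return provision + test / shards

if __name__ == "__main__":
    # 45 min provisioning (from the discussion); 120 min of integration
    # tests is a hypothetical figure for illustration only.
    for shards in (1, 2, 4):
        print(f"{shards} shard(s): {wall_clock_minutes(45, 120, shards):.0f} min")
```

With these numbers, two shards cut the wall clock from 165 to 105 minutes, but diminishing returns set in quickly because the 45-minute provisioning floor never shrinks.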
+1, this same idea has been discussed in one of the weekly standup calls for reducing the number of clusters while keeping the same test types. @mohammedzee1000 But a postsubmit job is not the best fit for PR testing, IMO, because it will unnecessarily increase the effort: there will always be a fear of broken master code, and it will increase the cost of constantly monitoring master after each PR we merge through the periodic job.
Yes, we can go with this. It will reduce the test time by up to 50%.
It's a long-term goal; maybe we need to create a separate Jira to track this.
I am creating the task list.
+1. I would even consider just using the latest release.
-1, this makes me a little bit worried; I would rather run the full test suite on each cluster. There should be one additional point:
I would also add: investigate and fix or remove unstable tests (flakes) :)
@amitkrout @kadel I did not mean that. I meant running all the tests on every version that we test, but splitting them into instances, e.g. test1-4.5 runs the first half and test2-4.5 runs the second half.
@mohammedzee1000 you mean we need two 4.5 clusters in CI, right?
Yes, one running the first half and the other running the second half. Don't worry, we will get some clusters back by not testing a version or two. While this may not balance things out, it will still be an overall win. We could, for example, run 4.5 and 4.4, or even just 4.5, in regular PR CI, but split the tests for what we do test, thereby achieving a semblance of parallelism. Remember, cluster provisioning is constant time.
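The half-and-half split could be implemented with simple deterministic sharding, e.g. hashing each test name to pick a shard (a sketch only; the file names, shard count, and function names are hypothetical and not part of odo's actual CI setup):

```python
import hashlib


def shard_for(test_name: str, total_shards: int) -> int:
    """Deterministically assign a test to one of `total_shards` shards.

    Hashing the name (rather than relying on list order) keeps each
    test's assignment stable even as other tests are added or removed.
    """
    digest = hashlib.sha1(test_name.encode("utf-8")).hexdigest()
    return int(digest, 16) % total_shards


def select_tests(all_tests, shard_index: int, total_shards: int):
    """Return the subset of tests that this CI job (shard) should run."""
    return [t for t in all_tests if shard_for(t, total_shards) == shard_index]


if __name__ == "__main__":
    # Hypothetical integration test files; in a real job the shard index
    # and total would come from the job's environment/configuration.
    tests = ["cmd_login_test.go", "cmd_push_test.go", "cmd_url_test.go",
             "cmd_storage_test.go", "cmd_watch_test.go", "cmd_debug_test.go"]
    for shard in range(2):
        print(f"shard {shard}: {select_tests(tests, shard, 2)}")
```

Each test lands in exactly one shard and the two shards together cover the whole suite, so the two jobs can run against their own clusters in parallel without coordination.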
I still think that integration tests should only be run after the PR is approved by default, though it should be possible to manually trigger such a run on a case-by-case basis if needed.
I am lost on what the ask was and where we are heading. @stevekuznetsov The initial ask was that we don't want to run our CI as soon as the PR is triggered; CI should run only if
@stevekuznetsov @amitkrout @mohammedzee1000 Can you please share the Prow plugin documentation?
This was done in #2931 👍
To be honest, I don't think this part is that important right now. /priority low
Executing only the latest OpenShift version in PR CI, with the rest in periodic jobs, is done as part of this issue itself. PRs: openshift/release#10312 and openshift/release#10346
Test cleanup is still in progress and being worked on separately. Tests covered so far:
Issues go stale after 90d of inactivity. Mark the issue as fresh by commenting /remove-lifecycle stale. If this issue is safe to close now please do so with /close. /lifecycle stale
Stale issues rot after 30d of inactivity. Mark the issue as fresh by commenting /remove-lifecycle rotten. If this issue is safe to close now please do so with /close. /lifecycle rotten
Rotten issues close after 30d of inactivity. Reopen the issue by commenting /reopen. /close
@openshift-bot: Closing this issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
We are considering running the e2e tests (not the integration tests) in the periodic jobs and omitting them from the PR tests altogether; this was done in Run periodic job on OpenShift CI #2931.