-
Notifications
You must be signed in to change notification settings - Fork 98
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ci(k8s): disable weekly triggers #9928
base: master
Are you sure you want to change the base?
ci(k8s): disable weekly triggers #9928
Conversation
The K8S CI jobs have been failing for months already. So, disable weekly K8S triggers to stop wasting resources/money. |
@mflendrich please advice if you want to stop those for now. |
@grzywin your opinion is needed here. As I discussed with @zimnx today, these tests have coverage (e.g. AWS) that we don't have elsewhere -- by leaving those tests either red or disabled we are missing important coverage. There are two options:
@grzywin - can you look into this and recommend whether we do (1) or (2)? |
Currently, I am not aware of anyone taking care of that job, even if we unbreak it, we need someone to monitor it frequently |
Is it possible to get failure alerts to #team-operator-ci on Slack? That is the team's workflow that would require zero proactive monitoring on top of what we do already. |
We won't be able to monitor these failures. QA team doesn't version their jobs and libraries, so any merged PR can break those. |
Thanks for bringing this up, sounds concerning to me. Let's see what it takes to unbreak the red job and have the monitoring discussion separately. |
@mflendrich In the past, we already had a discussion about the benefits of having tests in two different places (the SCT framework and the Scylla Operator project itself). The link to the meeting and the Excel file describing what we have in SCT can be found here. The conclusion was that many of the tests in SCT are already covered in Scylla Operator, and we don't want to maintain both. The important ones that are not covered in Scylla Operator are the Upgrade tests (which I trigger manually before each Operator release), and as you already mentioned, SCT also runs tests in AWS, which we don’t have in Scylla Operator. Taking the above into account, and considering the decision that I will be working on tests directly in the Scylla Operator project, I opt for option 2. To ensure AWS testing is not overlooked, after closing this issue:
Either way, I am approving this PR. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
Yes, actively developed project has such side-effects that new changes may break some stuff. JFYI: guys who test scylla-manager do use SCT test coverage, they didn't drop it after ownership switch.
It doesn't tell anything about the value and complexity of redoing elsewhere.
High load? Are these covered in operator tests?
Upgrade tests SCT has are of the following 3 types:
It is not enough to trigger for release only, need to keep a finger on the pulse of it's workability.
Dropping SCT tests the existing scylla-operator test coverage gets cut significantly from the quality perspective point of view. This particular PR is only about making sense for now - do not waste money while it is not used. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, based on the context above, operating under an assumption that these tests provide substantial value that we shouldn't forgo: I think we should stick with the original plan of keeping this coverage as-is, only unbreaking now [1] vs later [2].
@grzywin can you assess the amount of work required to unbreak the job (as we discussed yesterday)?
@mflendrich I will take a look on those failing jobs today and get back to you. @ylebi As a side note, we should keep in mind that I am the only tester working on the Operator right now. As a result, in future it might be difficult to keep the SCT tests stable at all times while simultaneously working on tests in Go/scylla-operator repo. |
We should have this conversation next week - as per my earlier comment to figure out how to spend our resources best. To me it looks like a planning exercise we'll need to conduct (whether building new coverage should come at the cost of sacrificing existing coverage in the SCTs). Let's now focus on the technicalities of what it takes to fix these jobs now. After the weekend I'll start a discussion and we'll decide: which existing tests we want to invest to keep supporting, what we want to drop, and where we'll be adding new coverage. I hope that this approach can bring some clarity and address everyone's concerns here. |
Testing
PR pre-checks (self review)
backport
labelsReminders
sdcm/sct_config.py
)unit-test/
folder)