Run periodic job on OpenShift CI #2931

Closed
amitkrout opened this issue Apr 20, 2020 · 40 comments
Assignees
Labels
area/testing Issues or PRs related to testing, Quality Assurance or Quality Engineering estimated-size/M (10-20) Rough sizing for Epics. About 1 sprint of work for one person

Comments

@amitkrout
Contributor

amitkrout commented Apr 20, 2020

/kind feature

Which functionality do you think we should add?

Currently a high volume of tests runs for each PR, which takes almost 3 hours. The plan is to distribute the tests across different types of runs. For example, a nightly run would execute all the test scripts periodically on the master branch, where the total run time is not something we need to worry about. PR tests would then be validated against the integration tests only.

Another benefit of adding the nightly run on the master branch is that it catches flakes, if there are any, which avoids an unpleasant experience when running PR tests.

@openshift-ci-robot openshift-ci-robot added the kind/feature Categorizes issue as a feature request. For PRs, that means that the PR is the implementation label Apr 20, 2020
@girishramnani girishramnani added area/release-eng Issues or PRs related to the Release Engineering area/testing Issues or PRs related to testing, Quality Assurance or Quality Engineering labels Apr 20, 2020
@kadel
Member

kadel commented Apr 20, 2020

/remove-area release-eng
/remove-kind feature
/kind test
/triage needs-information

What kind of jobs or test will run nightly?

@openshift-ci-robot
Collaborator

@kadel: The label(s) triage/needs-informion cannot be applied, because the repository doesn't have them

In response to this:

/remove-area release-eng
/triage needs-informion

What kind of jobs or test will run nightly?

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci-robot openshift-ci-robot added kind/test triage/needs-information Indicates an issue needs more information in order to work on it. and removed area/release-eng Issues or PRs related to the Release Engineering labels Apr 20, 2020
@openshift-ci-robot
Collaborator

@kadel: Those labels are not set on the issue: area/release-eng

In response to this:

/remove-area release-eng
/remove-kind feature
/kind test
/triage needs-information

What kind of jobs or test will run nightly?


@openshift-ci-robot openshift-ci-robot removed the kind/feature Categorizes issue as a feature request. For PRs, that means that the PR is the implementation label Apr 20, 2020
@openshift-ci-robot openshift-ci-robot added the kind/feature Categorizes issue as a feature request. For PRs, that means that the PR is the implementation label Apr 22, 2020
@openshift-ci-robot
Collaborator

@kadel: Those labels are not set on the issue: area/release-eng

In response to this:

/remove-area release-eng
/remove-kind feature
/kind test
/triage needs-information

What kind of jobs or test will run nightly?

Updated the description


@openshift-ci-robot openshift-ci-robot removed the kind/feature Categorizes issue as a feature request. For PRs, that means that the PR is the implementation label Apr 22, 2020
@amitkrout
Contributor Author

/remove-area release-eng
/remove-kind feature
/kind test
/triage needs-information

What kind of jobs or test will run nightly?

Updated the description

@amitkrout amitkrout added triage/ready and removed triage/needs-information Indicates an issue needs more information in order to work on it. labels Apr 22, 2020
@openshift-ci-robot openshift-ci-robot added the triage/needs-information Indicates an issue needs more information in order to work on it. label Apr 22, 2020
@openshift-ci-robot
Collaborator

@kadel: Those labels are not set on the issue: area/release-eng, kind/feature

In response to this:

/remove-area release-eng
/remove-kind feature
/kind test
/triage needs-information

What kind of jobs or test will run nightly?


@girishramnani girishramnani removed the triage/needs-information Indicates an issue needs more information in order to work on it. label Apr 22, 2020
@mohammedzee1000
Contributor

Template: https://steps.svc.ci.openshift.org/help/ci-operator#periodic

@girishramnani girishramnani added the estimated-size/M (10-20) Rough sizing for Epics. About 1 sprint of work for one person label Apr 22, 2020
@amitkrout
Contributor Author

amitkrout commented May 3, 2020

Tasks that need to be done

@kadel
Member

kadel commented May 4, 2020

How are we going to handle situations where the nightly job fails?
Is there going to be an email notification when that happens? Who is going to be responsible for fixing it?

@amitkrout
Contributor Author

amitkrout commented May 5, 2020

How are we going to handle situations where the nightly job fails? Is there going to be an email notification when that happens?

@kadel We will follow the regular process, as we have been doing: file an issue against master and fix it. BTW, the periodic job runs at a 6-hour interval, i.e. the test job (integration + e2e) is triggered every 6 hours. Ideally the job configuration should offer an email notification option. I need to figure out how to get an email notification only when the job fails; otherwise no notification is needed. Until then we need to consistently monitor the test results at https://prow.svc.ci.openshift.org/?repo=openshift%2Fodo

EDIT: As per my Slack conversation with the platform team, OpenShift CI does not support email notifications, but it does support Slack alerts for now. We can opt for Slack alerts then.
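
The 6-hour schedule described above could be expressed in a Prow periodic job roughly as follows. This is a minimal sketch, not the actual odo job definition: the job name, image, and command are hypothetical placeholders, and in openshift/release such jobs are normally generated from ci-operator configuration rather than written by hand.

```yaml
periodics:
- name: periodic-odo-integration-e2e   # hypothetical job name
  interval: 6h                         # trigger a new run every 6 hours
  decorate: true
  extra_refs:                          # check out odo's master branch for the run
  - org: openshift
    repo: odo
    base_ref: master
  spec:
    containers:
    - image: example.invalid/odo-test-runner:latest   # placeholder image
      command: ["make"]
      args: ["test-integration-and-e2e"]              # hypothetical make target
```

A `cron` expression can be used in place of `interval` when the runs should start at fixed times of day.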

Who is going to be responsible for fixing it?

Once the issue is identified, the package owner will be the first choice for fixing it, or anybody can volunteer to fix it.

@amitkrout amitkrout changed the title Run nightly job on OpenShift CI Run periodic job on OpenShift CI May 5, 2020
@dharmit
Member

dharmit commented Jun 22, 2020

Moved this to In Progress as there's nothing really to review. Amit's working on the final bits about sending notifications to Slack.

@girishramnani
Contributor

Adding a link in the docs and sending notifications to the Slack channel are still left.

@kadel
Member

kadel commented Jul 1, 2020

Adding a link in the docs and sending notifications to the Slack channel are still left.

Where are we on that? Is this getting blocked by something?

@amitkrout
Contributor Author

amitkrout commented Jul 7, 2020

Adding a link in the docs and sending notifications to the Slack channel are still left.

Where are we on that? Is this getting blocked by something?

@kadel Yes, we are blocked by kubernetes/test-infra#18190. Prow does not support notifications to multiple Slack workspaces from a single Prow instance. Currently Prow is available for the CoreOS Slack workspace, so the notification feature will not be supported for the https://openshiftdo.slack.com workspace. Anyway, we can track the feature request in kubernetes/test-infra#18190.

As far as adding the link is concerned, I can open a PR at any moment.

@kadel
Member

kadel commented Jul 8, 2020

@kadel Yes, we are blocked by kubernetes/test-infra#18190. Prow does not support notifications to multiple Slack workspaces from a single Prow instance. Currently Prow is available for the CoreOS Slack workspace, so the notification feature will not be supported for the https://openshiftdo.slack.com workspace. Anyway, we can track the feature request in kubernetes/test-infra#18190.

Can we at least set up notifications for our channel in the CoreOS Slack?
Without notifications the periodic jobs are useless :-(

@amitkrout
Contributor Author

@kadel Yes, we are blocked by kubernetes/test-infra#18190. Prow does not support notifications to multiple Slack workspaces from a single Prow instance. Currently Prow is available for the CoreOS Slack workspace, so the notification feature will not be supported for the https://openshiftdo.slack.com workspace. Anyway, we can track the feature request in kubernetes/test-infra#18190.

Can we at least set up notifications for our channel in the CoreOS Slack?
Without notifications the periodic jobs are useless :-(

Yes, this is doable. I will send a PR.

@amitkrout
Contributor Author

I made the changes in PR openshift/release#10191 to send periodic job notifications to the odo-notifications channel that I configured:

reporter_config:
  slack:
    channel: '#odo-notifications'

However, it did not work as expected. No notification was sent to #odo-notifications, and all periodic jobs fail with the error at https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_release/10191/rehearse-10191-periodic-ci-openshift-odo-master-v4.2-integration-e2e-periodics/1288013745720659968#1:build-log.txt%3A7

I cannot figure out a potential fix from the error message. I have asked on #forum-testplatform for a resolution.
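
For reference, Prow's Slack reporter accepts a few more fields than the channel alone, which is one way to get the failure-only notifications discussed earlier in this thread. The snippet below is a sketch under that assumption and is not a fix for the rehearsal failure above; the choice of states and the message template are illustrative.

```yaml
reporter_config:
  slack:
    channel: '#odo-notifications'
    # Report only unsuccessful runs, so passing jobs stay silent.
    job_states_to_report:
    - failure
    - error
    # Go template rendered against the ProwJob; the wording is an example.
    report_template: 'Job {{.Spec.Job}} ended with state {{.Status.State}}. <{{.Status.URL}}|View logs>'
```

With `job_states_to_report` limited to `failure` and `error`, the channel would only receive a message when a periodic run needs attention.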

@kadel
Member

kadel commented Aug 7, 2020

This is not done.
The tests are still failing; there is not a single run where all periodic jobs succeeded.

@amitkrout
Contributor Author

amitkrout commented Aug 12, 2020

This is not done.
The tests are still failing; there is not a single run where all periodic jobs succeeded.

Yes, it's due to platform issues and known flakes. One major periodic test blocker is being worked on in PR #3737.

@amitkrout
Contributor Author

This is not done.
The tests are still failing; there is not a single run where all periodic jobs succeeded.

The jobs are failing due to platform issues or flakes. However, we do have a few successful job runs.

Screen Shot 2020-08-21 at 1 34 59 AM

@amitkrout
Contributor Author

/close

@openshift-ci-robot
Collaborator

@amitkrout: Closing this issue.

In response to this:

/close


@kadel
Member

kadel commented Aug 25, 2020

Are we tracking those issues somewhere?
There was not a single day when all periodic jobs were successful.

In the current state, periodic jobs are of no help :(


6 participants