Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NE-585: Make ROUTER_BACKEND_CHECK_INTERVAL configurable #712

Merged

Conversation

candita
Copy link
Contributor

@candita candita commented Mar 2, 2022

NE-585 Expose and make configurable ROUTER_BACKEND_CHECK_INTERVAL to customize the length of time between subsequent liveness checks on backends. Details in the enhancement proposal

Refactor some unit tests and add e2e test.

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Mar 2, 2022
@candita candita force-pushed the NE-585-HealthCheckInterval branch from eeb2ed5 to 7111096 Compare March 2, 2022 19:28
@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 2, 2022
@candita
Copy link
Contributor Author

candita commented Mar 2, 2022

failed to fetch Cluster: failed to generate asset "Cluster": failed to create cluster: failed to apply Terraform: exit status 1

/retest

@candita candita force-pushed the NE-585-HealthCheckInterval branch 2 times, most recently from da7698d to 8928044 Compare March 4, 2022 16:56
@candita
Copy link
Contributor Author

candita commented Mar 4, 2022

/retest

1 similar comment
@candita
Copy link
Contributor Author

candita commented Mar 7, 2022

/retest

@candita
Copy link
Contributor Author

candita commented Mar 8, 2022

/test e2e-aws

@candita
Copy link
Contributor Author

candita commented Mar 9, 2022

/retest

@candita
Copy link
Contributor Author

candita commented Mar 10, 2022

The e2e test for componentRoutes could use tweaking if this comes up frequently:
=== RUN TestConfigurableRouteNoConsumingUserNoRBAC
configurable_route_test.go:301: failed to restore cluster ingress resource to original state: Operation cannot be fulfilled on ingresses.config.openshift.io "cluster": the object has been modified; please apply your changes to the latest version and try again
configurable_route_test.go:305: failed to restore cluster ingress resource to original state: Operation cannot be fulfilled on ingresses.config.openshift.io "cluster": the object has been modified; please apply your changes to the latest version and try again
--- FAIL: TestConfigurableRouteNoConsumingUserNoRBAC (1.28s)

/retest

@openshift-ci openshift-ci bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 16, 2022
@candita
Copy link
Contributor Author

candita commented Mar 16, 2022

/retest

@openshift-ci openshift-ci bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 17, 2022
@candita
Copy link
Contributor Author

candita commented Mar 17, 2022

Interesting failure:

Mar 17 03:18:32.343: FAIL: ns/openshift-etcd pod etcd-quorum-guard-6cddbc8997-mrxld and pod etcd-quorum-guard-6cddbc8997-8mkx4 are running on the same node: ci-op-5zwyg8np-6da5c-9tmhw-master-2

/retest

@candita
Copy link
Contributor Author

candita commented Mar 21, 2022

/retest

1 similar comment
@candita
Copy link
Contributor Author

candita commented Mar 21, 2022

/retest

@candita candita force-pushed the NE-585-HealthCheckInterval branch 2 times, most recently from c306d56 to b35f5c4 Compare March 25, 2022 23:14
@candita candita changed the title [WIP] NE-585 Make ROUTER_BACKEND_CHECK_INTERVAL configurable NE-585 Make ROUTER_BACKEND_CHECK_INTERVAL configurable Mar 25, 2022
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Mar 25, 2022
@candita
Copy link
Contributor Author

candita commented Mar 30, 2022

could not run steps: step e2e-aws failed: failed to acquire lease for "aws-quota-slice": resources not found
/retest

@openshift-ci openshift-ci bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 2, 2022
@candita
Copy link
Contributor Author

candita commented Apr 19, 2022

/retest

@candita
Copy link
Contributor Author

candita commented Apr 19, 2022

=== RUN TestHostNetworkEndpointPublishingStrategy
operator_test.go:756: failed to observe expected conditions: timed out waiting for the condition
operator_test.go:758: deleted ingresscontroller host
--- FAIL: TestHostNetworkEndpointPublishingStrategy (300.04s)
/retest

@candita
Copy link
Contributor Author

candita commented Apr 19, 2022

evel=error msg=Error: Error waiting for instance (i-0c49e1870a72f8c20) to become ready: timeout while waiting for state to become 'running' (last state: 'pending', timeout: 10m0s)
level=error
level=error msg= with module.masters.aws_instance.master[2],
level=error msg= on master/main.tf line 129, in resource "aws_instance" "master":
level=error msg= 129: resource "aws_instance" "master" {

/retest

@candita
Copy link
Contributor Author

candita commented Apr 19, 2022

Again:

=== RUN TestHostNetworkEndpointPublishingStrategy
operator_test.go:756: failed to observe expected conditions: timed out waiting for the condition
operator_test.go:758: deleted ingresscontroller host
--- FAIL: TestHostNetworkEndpointPublishingStrategy (300.44s)

/retest

@Miciah
Copy link
Contributor

Miciah commented Apr 19, 2022

Again:

=== RUN TestHostNetworkEndpointPublishingStrategy
operator_test.go:756: failed to observe expected conditions: timed out waiting for the condition
operator_test.go:758: deleted ingresscontroller host
--- FAIL: TestHostNetworkEndpointPublishingStrategy (300.44s)

/retest

Try bumping openshift/api to openshift/api@716dc56 or newer.

Comment on lines 72 to 75
// when set to an unacceptable value, it should not be accepted
if err := waitForDeploymentEnvVar(t, kclient, routerDeployment, deploymentTimeout, ingresscontroller.RouterBackendCheckInterval, (0 * time.Second).String()); err == nil {
t.Fatalf("unexpected router deployment %s set at %v: %v", ingresscontroller.RouterBackendCheckInterval, 2*defaultHealthCheckInterval, err)
}

// when set to an unacceptable value, the env variable should not exist
if err := waitForDeploymentEnvVar(t, kclient, routerDeployment, deploymentTimeout, ingresscontroller.RouterBackendCheckInterval, ""); err != nil {
t.Fatalf("expected router deployment to remove %s: %v", ingresscontroller.RouterBackendCheckInterval, err)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are both waitForDeploymentEnvVar calls necessary? It seems like the second one (make sure the environment variable is absent) would suffice, and the first one (make sure the environment variable isn't set to an unexpected value) is superfluous.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

@candita candita force-pushed the NE-585-HealthCheckInterval branch 2 times, most recently from c2d53f2 to 6c51354 Compare April 20, 2022 21:58
@candita
Copy link
Contributor Author

candita commented Apr 21, 2022

=== RUN TestConfigurableRouteNoConsumingUserNoRBAC
configurable_route_test.go:301: failed to restore cluster ingress resource to original state: Operation cannot be fulfilled on ingresses.config.openshift.io "cluster": the object has been modified; please apply your changes to the latest version and try again
configurable_route_test.go:305: failed to restore cluster ingress resource to original state: Operation cannot be fulfilled on ingresses.config.openshift.io "cluster": the object has been modified; please apply your changes to the latest version and try again
--- FAIL: TestConfigurableRouteNoConsumingUserNoRBAC (0.97s)

/test e2e-aws-operator

@candita candita force-pushed the NE-585-HealthCheckInterval branch 3 times, most recently from bf72dde to 24de4a9 Compare April 21, 2022 21:10
@Miciah
Copy link
Contributor

Miciah commented Apr 21, 2022

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Apr 21, 2022
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Apr 21, 2022

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: candita, Miciah

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ShudiLi
Copy link
Member

ShudiLi commented Apr 22, 2022

/label qe-approved

@openshift-ci openshift-ci bot added the qe-approved Signifies that QE has signed off on this PR label Apr 22, 2022
@openshift-bot
Copy link
Contributor

/retest-required

Please review the full test history for this PR and help us cut down flakes.

2 similar comments
@openshift-bot
Copy link
Contributor

/retest-required

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest-required

Please review the full test history for this PR and help us cut down flakes.

@Miciah
Copy link
Contributor

Miciah commented Apr 22, 2022

 === RUN   TestRouterCompressionParsing
    router_compression_test.go:101: failed to update ingress controller, retrying...
    router_compression_test.go:40: deleted ingresscontroller http-compression-1
    router_compression_test.go:101: failed to update ingress controller, retrying...
    router_compression_test.go:41: deleted ingresscontroller http-compression-2
    router_compression_test.go:101: failed to update ingress controller, retrying...
    router_compression_test.go:42: error testing compression policy that needs quotes: failed to update ingress controller: Put "https://api.ci-op-p2cm6vv3-43abb.origin-ci-int-aws.dev.rhcloud.com:6443/apis/operator.openshift.io/v1/namespaces/openshift-ingress-operator/ingresscontrollers/http-compression-3": read tcp 10.131.191.98:56832->54.241.76.165:6443: read: connection reset by peer
    router_compression_test.go:42: deleted ingresscontroller http-compression-3
    router_compression_test.go:101: failed to update ingress controller, retrying...
    router_compression_test.go:53: deleted ingresscontroller http-compression-4
--- FAIL: TestRouterCompressionParsing (94.68s) 

We might need to add a retry around the update in that test.
/test e2e-aws-operator

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Apr 22, 2022

@candita: all tests passed!

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. docs-approved Signifies that Docs has signed off on this PR lgtm Indicates that a PR is ready to be merged. px-approved Signifies that Product Support has signed off on this PR qe-approved Signifies that QE has signed off on this PR
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants