wait for catalogsource status ready before creating subscription #2601
Signed-off-by: akihikokuroda <akihikokuroda2020@gmail.com>
@@ -95,6 +95,8 @@ var _ = Describe("Subscription", func() {
}

_, teardown = createInternalCatalogSource(ctx.Ctx().KubeClient(), ctx.Ctx().OperatorClient(), "test-catalog", generatedNamespace.GetName(), packages, crds, csvs)
_, err := fetchCatalogSourceOnStatus(ctx.Ctx().OperatorClient(), "test-catalog", generatedNamespace.GetName(), catalogSourceRegistryPodSynced)
Nice catch - I was under the impression that we made a clean sweep of anywhere we instantiate a grpc-based CatalogSource, and then subsequently create a Subscription, but this one feels easy to catch given the setup isn't super readable. It would be nice to avoid having to hardcode the "test-catalog" in two places here, but I won't block the PR for this.
Actually I'm somewhat second guessing this the more I think about it. I haven't played around with this locally, but looking at that test case failure output, it's not immediately clear to me why we need to simply wait for the CatalogSource to be reporting a "ready" state. Were you able to reproduce this test case failure locally?
I didn't reproduce this error locally but I saw this in the catalog-operator log of the CI e2e failure.
2022-01-19T20:00:24.250526731Z stderr F time="2022-01-19T20:00:24Z" level=debug msg="syncing catsrc" id=Zfz5K source=test-catalog
2022-01-19T20:00:24.250530131Z stderr F time="2022-01-19T20:00:24Z" level=debug msg="checking catsrc configmap state" id=Zfz5K source=test-catalog
2022-01-19T20:00:24.251445279Z stderr F time="2022-01-19T20:00:24Z" level=debug msg="check registry server healthy: true" id=Zfz5K source=test-catalog
2022-01-19T20:00:24.25145768Z stderr F time="2022-01-19T20:00:24Z" level=debug msg="registry state good" id=Zfz5K source=test-catalog
2022-01-19T20:00:28.931802007Z stderr F time="2022-01-19T20:00:28Z" level=debug msg="Got source event: grpc.SourceState{Key:registry.CatalogKey{Name:\"test-catalog\", Namespace:\"subscription-e2e-gcqhv\"}, State:3}"
2022-01-19T20:00:28.931816007Z stderr F time="2022-01-19T20:00:28Z" level=info msg="state.Key.Namespace=subscription-e2e-gcqhv state.Key.Name=test-catalog state.State=TRANSIENT_FAILURE"
2022-01-19T20:00:28.931824208Z stderr F time="2022-01-19T20:00:28Z" level=debug msg="syncing catsrc" id=j7VvG source=test-catalog
2022-01-19T20:00:28.931827808Z stderr F time="2022-01-19T20:00:28Z" level=debug msg="checking catsrc configmap state" id=j7VvG source=test-catalog
2022-01-19T20:00:28.939247402Z stderr F time="2022-01-19T20:00:28Z" level=debug msg="check registry server healthy: true" id=j7VvG source=test-catalog
2022-01-19T20:00:28.939260203Z stderr F time="2022-01-19T20:00:28Z" level=debug msg="registry state good" id=j7VvG source=test-catalog
2022-01-19T20:00:28.956912641Z stderr F I0119 20:00:28.955396 1 event.go:282] Event(v1.ObjectReference{Kind:"Namespace", Namespace:"", Name:"subscription-e2e-gcqhv", UID:"dfe83254-ba15-438f-badf-dd3b79c12036", APIVersion:"v1", ResourceVersion:"815", FieldPath:""}): type: 'Warning' reason: 'ResolutionFailed' [error using catalog test-catalog (in namespace subscription-e2e-gcqhv): failed to list bundles: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp 10.96.185.140:50051: connect: connection refused", error using catalog operatorhubio-catalog (in namespace operator-lifecycle-manager): failed to list bundles: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp 10.96.27.145:50051: connect: connection refused"]
2022-01-19T20:00:28.956949443Z stderr F I0119 20:00:28.956796 1 event.go:282] Event(v1.ObjectReference{Kind:"Namespace", Namespace:"", Name:"subscription-e2e-gcqhv", UID:"dfe83254-ba15-438f-badf-dd3b79c12036", APIVersion:"v1", ResourceVersion:"815", FieldPath:""}): type: 'Warning' reason: 'ResolutionFailed' [error using catalog test-catalog (in namespace subscription-e2e-gcqhv): failed to list bundles: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp 10.96.185.140:50051: connect: connection refused", error using catalog operatorhubio-catalog (in namespace operator-lifecycle-manager): failed to list bundles: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp 10.96.27.145:50051: connect: connection refused"]
This shows that the latest gRPC state is TRANSIENT_FAILURE, while the catalogsource sync still reports "check registry server healthy: true" and "registry state good". The subscription is then created, issues the list bundles request, and the request fails.
The catalogsource sync checks whether the registry pod is up and whether the resources for the registry (service, service account, role, rolebinding, etc.) are OK. The gRPC connection state is tracked separately.
Thanks - I think that explanation sounds reasonable to me. In any case, this change is harmless so we can always re-open this issue if we misdiagnosed the root cause.
/approve
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: akihikokuroda, timflannagan The full list of commands accepted by this bot can be found here. The pull request process is described here
Description of the change:
The test doesn't wait for the catalogsource gRPC connection to report a ready status before it creates the subscription for the catalogsource.
Motivation for the change:
Closes #2600
/doc