Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

🐛 server: revert non-standalone VW URL #2667

Merged
merged 1 commit into from
Jan 25, 2023

Conversation

hardys
Copy link

@hardys hardys commented Jan 23, 2023

Summary

Partially reverts logic introduced in #2407 so we don't unconditionally use the external URL - when non-standalone VW mode is used we still need the old behavior, or the loopback client can't access the non-standalone VW server.

This breaks shard bootstrapping (evidently not in our e2e scenarios, but in a real deployment where probes mean the proxy isn't ready until the shard bootstrap is complete)

I suspect further work will be required to make standalone VW mode work when deployed on a real cluster, since the ExternalAddress will go via the proxy, which depends on proxy bootstrapping (when the shard is deployed via a Service)

@openshift-ci openshift-ci bot requested review from ncdc and sttts January 23, 2023 18:58
@nrb
Copy link
Contributor

nrb commented Jan 23, 2023

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jan 23, 2023
@sttts
Copy link
Member

sttts commented Jan 24, 2023

/retest

@sttts
Copy link
Member

sttts commented Jan 24, 2023

/approve

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jan 24, 2023
@hardys
Copy link
Author

hardys commented Jan 24, 2023

/hold

e2e-sharded failure does look related e.g

W0124 10:26:12.676402     712 reflector.go:324] k8s.io/client-go@v0.0.0-20230113194502-8f8fb5fa4eea/tools/cache/reflector.go:167: failed to list *v1alpha1.LogicalCluster: Get "https://10.129.28.8:7445/services/initializingworkspaces/system:apibindings/clusters/%2A/apis/core.kcp.io/v1alpha1/logicalclusters?limit=500&resourceVersion=0": x509: certificate is valid for localhost, 10.129.28.8, not apiserver-loopback-client

#2407 passed so I need to figure out what changed and reproduce locally, currently I'm not clear why this previously worked given that AFAICS the controllers always use the loopback client...

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jan 24, 2023
@openshift-ci openshift-ci bot removed the lgtm Indicates that a PR is ready to be merged. label Jan 24, 2023
Partially reverts logic introduced in kcp-dev#2407 so we don't unconditionally
use the external URL - when non-standalone VW mode is used we still
need the old behavior, or the loopback client can't access the VW
server.

This breaks shard bootstrapping (evidently not in our e2e scenarios,
but in a real deployment where probes mean the proxy isn't ready until
the shard bootstrap is complete)
@hardys
Copy link
Author

hardys commented Jan 24, 2023

Ok the failure in the previous revision was my mistake, I used ShardVirtualWorkspaceURL which is for shard->vw traffic, we still need to use ExternalAddress in the standalone VW case, so the initializingworkspaces requests hit the proxy.

This should now be ready to land as a first step (enables the integrated-VW case to work again when the shard is deployed behind a Service)

However @ncdc FYI I think we'll have to do further work to enable the standalone-vw case on real clusters - currently the shard bootstrap dependency on ExternalAddress presents a chicken/egg problem given that the proxy isn't available until the shard is ready.

@ncdc
Copy link
Member

ncdc commented Jan 24, 2023

@hardys let's chat tomorrow?

@hardys
Copy link
Author

hardys commented Jan 25, 2023

@hardys let's chat tomorrow?

Sure, but we could also consider landing this as an interim measure (to unblock the unstable test environment)

@hardys
Copy link
Author

hardys commented Jan 25, 2023

/hold cancel

This could land now to unblock things, but I'll follow-up with @ncdc to discuss the plan for standalone vw - we could also look at reworking sharded-test-server so e2e better replicates the serialized bootstrapping behavior implied by using services and probes in a "real" deployment.

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jan 25, 2023
@hardys
Copy link
Author

hardys commented Jan 25, 2023

we could also look at reworking sharded-test-server so e2e better replicates the serialized bootstrapping behavior implied by using services and probes in a "real" deployment.

Turns out the proxy startup is already serialized (we wait until the shard is ready before starting the proxy), so the main difference is --external-hostname isn't pointing to the proxy in CI, pushed #2672 so we have a reproducer

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jan 25, 2023
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Jan 25, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ncdc, sttts

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ncdc
Copy link
Member

ncdc commented Jan 25, 2023

Flake #2589
/retest

@openshift-merge-robot openshift-merge-robot merged commit ab2757a into kcp-dev:main Jan 25, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants