MGMT-19120: Use service net to connect to hosted API server #7090

jhernand · 2024-12-13T20:17:35Z

There are several situations where assisted service needs to connect to the API server of a spoke cluster. To do so it uses the kubeconfig generated during the installation, which usually contains the external URL of the API server. That means the cluster where assisted service runs needs to be configured with a proxy that allows that connection. For HyperShift clusters this can be avoided: because the API server runs as a pod in the same cluster, assisted service can instead connect via the service network, using the kube-apiserver.my-cluster.svc host name. Doing that reduces the number of round trips and the potential for proxy configuration issues. To achieve that, this patch changes the spoke client factory so that it checks whether the cluster is a HyperShift cluster and, if so, replaces the API server URL with https://kube-apiserver.my-cluster.svc:6443.
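
For illustration only, the URL replacement described above could look roughly like the sketch below. This is not the code from the patch: the function name `serviceNetworkURL`, the `namespace` parameter, and the idea that `my-cluster` is the namespace of the hosted control plane are assumptions made for the example. The port 6443 is taken from the URL quoted in the description.

```go
package main

import (
	"fmt"
	"net/url"
)

// serviceNetworkURL is a hypothetical helper. It returns a copy of the given
// API server URL with the host replaced by the in-cluster service name
// kube-apiserver.<namespace>.svc and port 6443. Scheme and path are kept.
func serviceNetworkURL(external string, namespace string) (string, error) {
	parsed, err := url.Parse(external)
	if err != nil {
		return "", err
	}
	parsed.Host = fmt.Sprintf("kube-apiserver.%s.svc:6443", namespace)
	return parsed.String(), nil
}

func main() {
	// The external URL here is made up for the example.
	result, err := serviceNetworkURL("https://api.my-cluster.example.com:6443", "my-cluster")
	if err != nil {
		panic(err)
	}
	fmt.Println(result)
}
```

In a real kubeconfig the same substitution would apply to the `server` field of the relevant cluster entry; the TLS settings would also need to be valid for the service host name, which this sketch does not address.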

List all the issues related to this PR

https://issues.redhat.com/browse/MGMT-19120

What environments does this code impact?

Automation (CI, tools, etc)
Cloud
Operator Managed Deployments
None

How was this code tested?

assisted-test-infra environment
dev-scripts environment
Reviewer's test appreciated
Waiting for CI to do a full test run
Manual (Elaborate on how it was tested)
No tests needed

Checklist

Title and description added to both, commit and PR.
Relevant issues have been associated (see CONTRIBUTING guide)
This change does not require a documentation update (docstring, docs, README, etc)
Does this change include unit-tests (note that code changes require unit-tests)

Reviewers Checklist

Are the title and description (in both PR and commit) meaningful and clear?
Is there a bug required (and linked) for this change?
Should this PR be backported?

openshift-ci-robot · 2024-12-13T20:18:02Z

@jhernand: This pull request references MGMT-19120 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the epic to target the "4.19.0" version, but no target version was set.

openshift-ci · 2024-12-13T20:18:50Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jhernand

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [jhernand]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

gamli75 · 2024-12-16T18:51:45Z

@eranco74 can you review this PR?

codecov · 2024-12-18T16:33:03Z

Codecov Report

Attention: Patch coverage is 82.35294% with 21 lines in your changes missing coverage. Please review.

Project coverage is 67.63%. Comparing base (8607a87) to head (4d0cd06).
Report is 2 commits behind head on master.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| internal/spoke_k8s_client/factory.go | 82.24% | 15 Missing and 4 partials ⚠️ |
| ...nal/controller/controllers/bmh_agent_controller.go | 80.00% | 1 Missing ⚠️ |
| internal/spoke_k8s_client/spoke_k8s_client.go | 0.00% | 1 Missing ⚠️ |

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #7090      +/-   ##
==========================================
+ Coverage   67.52%   67.63%   +0.10%     
==========================================
  Files         296      296              
  Lines       40088    40158      +70     
==========================================
+ Hits        27071    27160      +89     
+ Misses      10574    10548      -26     
- Partials     2443     2450       +7

| Files with missing lines | Coverage Δ |
|---|---|
| ...nternal/controller/controllers/agent_controller.go | `76.43% <100.00%> (ø)` |
| ...oller/controllers/clusterdeployments_controller.go | `72.79% <100.00%> (ø)` |
| ...rollers/hypershiftagentserviceconfig_controller.go | `76.58% <100.00%> (ø)` |
| ...ernal/controller/controllers/spoke_client_cache.go | `85.18% <100.00%> (ø)` |
| ...nal/controller/controllers/bmh_agent_controller.go | `77.11% <80.00%> (ø)` |
| internal/spoke_k8s_client/spoke_k8s_client.go | `35.29% <0.00%> (+35.29%)` ⬆️ |
| internal/spoke_k8s_client/factory.go | `80.00% <82.24%> (+35.55%)` ⬆️ |

... and 2 files with indirect coverage changes

jhernand · 2024-12-18T20:09:19Z

/retest-required

tsorya · 2024-12-19T11:05:00Z

internal/spoke_k8s_client/factory.go

+}
+
+// SetHubClient sets the client that will be used to call the API of the hub cluster. This is mandatory.
+func (b *SpokeK8sClientFactoryBuilder) SetHubClient(value ctrlclient.Client) *SpokeK8sClientFactoryBuilder {


Why can't we provide the client while creating the object?
We have it here https://github.com/openshift/assisted-service/pull/7090/files#diff-891ba2cfffd82a8ae4131c88beb092d2a88149f579c931e3a1f7ca77fbfc82a5L166, no?

Sorry, my fault, I missed
https://github.com/openshift/assisted-service/pull/7090/files#diff-c444f711e9191b53952edb65bfd8c644419fc7695c62611dc0fb304b4fb197d6R625

Though it seems this is a required parameter and we will get an error in Build if it is not set, so why not provide it as a parameter to New? The same goes for the logger.

This is just a way to make the code cleaner, avoiding long lists of parameters. We could pass the logger, the client (and the transport wrapper, currently only used for tests) as parameters to the "New..." function, but over time that results in long lists of parameters like this:

api = NewManager(common.GetTestLog(), db, testing.GetDummyNotificationStream(ctrl), mockEventApi, nil, nil, nil, nil, &config, &leader.DummyElector{}, nil, nil, true, nil, nil, false)

It is already useful to avoid setting the transport wrapper parameter to nil.

But these are required parameters, and in this case you left them as optional, so I don't actually understand why it is good.

I believe it is good for several reasons:

It is consistent: all the parameters (required or optional) are provided in the same way.

It makes it clearer what each parameter means. Not in this case, but if you had two parameters that are strings it is not the same to see this:

whatever, err := NewWhatever("foo", "bar")

Than this:

whatever, err := NewWhatever().SetUserName("foo").SetPassword("bar").Build()

In the first case you have to dig deeper to find out the meaning of the parameters, and in the second it is explicit.

It gives room for documenting each parameter separately: the documentation goes in the "Set..." method of the builder.

It simplifies building the object in multiple steps, if needed, for example:

builder := NewWhatever()
builder.SetUserName("foo")
if shouldUsePassword {
	builder.SetPassword("bar")
}
whatever, err := builder.Build()

It simplifies adding multiple values for the same parameter:

whatever, err := NewWhatever().SetUserName("foo").SetUserName("foo-alias").Build()

It allows adding new optional parameters without having to change the call sites.

I don't want to bore you with my opinions about this. If you find this unacceptable I will change it.

I don't know, maybe it's just me, but I believe that if a parameter is required it should be provided as part of the function call. Otherwise, if someone writes
whatever, err := NewWhatever().Build()
it will pass compilation but fail at run time, and I think it is better to find such an error at compile time.
Though that is my personal opinion.

Just to be sure: I like your proposal, I just don't think it should be that way with required parameters.

I understand your point of view, and still think that the benefits outweigh the drawbacks. As that is not the key point of this pull request I am changing it to a plain list of parameters. We can have this discussion another time.
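
To make the trade-off in this thread concrete, here is a small sketch of both styles side by side. All names (`Client`, `NewClient`, `ClientBuilder`) are hypothetical and unrelated to the types in this patch; the point is only that a missing required value is rejected by the compiler in the first style and by `Build` at run time in the second.

```go
package main

import (
	"errors"
	"fmt"
)

// Client is a hypothetical object with one required and one optional setting.
type Client struct {
	userName string
	password string
}

// NewClient takes the required value as an argument, so a call that omits it
// does not compile.
func NewClient(userName string) *Client {
	return &Client{userName: userName}
}

// ClientBuilder collects settings through setters and validates them in Build.
type ClientBuilder struct {
	userName string
	password string
}

// NewClientBuilder creates an empty builder.
func NewClientBuilder() *ClientBuilder {
	return &ClientBuilder{}
}

// SetUserName sets the user name. This is mandatory.
func (b *ClientBuilder) SetUserName(value string) *ClientBuilder {
	b.userName = value
	return b
}

// SetPassword sets the password. This is optional.
func (b *ClientBuilder) SetPassword(value string) *ClientBuilder {
	b.password = value
	return b
}

// Build checks the mandatory settings at run time and creates the client.
func (b *ClientBuilder) Build() (*Client, error) {
	if b.userName == "" {
		return nil, errors.New("user name is mandatory")
	}
	return &Client{userName: b.userName, password: b.password}, nil
}

func main() {
	// Compiles, but fails at run time because the user name is missing.
	_, err := NewClientBuilder().Build()
	fmt.Println("builder without user name:", err)

	// Explicit about what each value means.
	client, err := NewClientBuilder().SetUserName("foo").SetPassword("bar").Build()
	fmt.Println("builder with user name:", client.userName, err)

	// Required value enforced by the compiler.
	fmt.Println("constructor:", NewClient("foo").userName)
}
```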

tsorya · 2024-12-19T11:08:49Z

internal/spoke_k8s_client/factory.go

+	// object reference. So to find the cluster deployment we can get all the instances inside the namespace of the
+	// secret and then select the first one that references it.
+	clusterDeploymentList := &hivev1.ClusterDeploymentList{}
+	err = f.hubClient.List(ctx, clusterDeploymentList, ctrlclient.InNamespace(kubeconfigSecret.Namespace))


Can't we list with a filter?

Not sure what you mean, can you elaborate? Note that the search criterion here is spec.clusterMetadata.adminKubeconfigSecretRef.Name == ..., and I think searching by that field isn't supported by the API.

I believe there can be only one cluster deployment per namespace, actually. Don't we have an owner reference in the secret for the cluster deployment?

I'd say we don't need to rely on that here.
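
As a rough illustration of the client-side selection discussed here (list everything in the namespace, then pick the first item that references the secret), the sketch below uses a trimmed-down stand-in type instead of the real Hive `ClusterDeployment`. The type, its fields, and the function name are hypothetical; the input slice plays the role of the result of the namespaced `List` call.

```go
package main

import "fmt"

// clusterDeployment is a hypothetical, simplified stand-in for the Hive type.
// AdminKubeconfigSecret stands in for
// spec.clusterMetadata.adminKubeconfigSecretRef.Name.
type clusterDeployment struct {
	Name                  string
	AdminKubeconfigSecret string
}

// findBySecret returns the first item that references the given secret name,
// or nil if there is none. The items are assumed to have been listed from the
// namespace of the secret already.
func findBySecret(items []clusterDeployment, secretName string) *clusterDeployment {
	for i := range items {
		if items[i].AdminKubeconfigSecret == secretName {
			return &items[i]
		}
	}
	return nil
}

func main() {
	items := []clusterDeployment{
		{Name: "first", AdminKubeconfigSecret: "other-kubeconfig"},
		{Name: "second", AdminKubeconfigSecret: "admin-kubeconfig"},
	}
	if found := findBySecret(items, "admin-kubeconfig"); found != nil {
		fmt.Println("found:", found.Name)
	}
	fmt.Println("missing:", findBySecret(items, "does-not-exist") == nil)
}
```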

tsorya · 2024-12-19T11:15:13Z

internal/spoke_k8s_client/factory.go

+	return
+}
+
+func (f *spokeK8sClientFactory) CreateFromSecret(ctx context.Context, secret *corev1.Secret) (result SpokeK8sClient, err error) {


I wonder why it is better to return the result this way:
(result SpokeK8sClient, err error)?
I believe 99% of the code doesn't do it this way, and I just wonder why it is better.

It is probably a matter of taste. I like to have the names of the return parameters: it helps to understand what to expect. It is not very important in this case, as the meaning is very clear. I can change it if you want.

I agree that it is a matter of taste :) but most of the code uses the other style, so why have different styles?

Fair enough, I will change it.

tsorya · 2024-12-19T11:16:44Z

internal/spoke_k8s_client/factory.go

-	if err != nil {
-		cf.log.WithError(err).Warnf("Getting kuberenetes config for cluster")
-		return nil, nil, err
+func (f *spokeK8sClientFactory) kubeConfigFromSecret(secret *corev1.Secret) (result []byte, err error) {


Can we make it a common function? We have at least 2 more places that do the same.

Well, at least we have one less place now: I removed similar logic from the spoke client cache in a previous patch. I will try to find where we are doing this.

I will do this in a different patch.

tsorya · 2024-12-19T11:18:25Z

internal/spoke_k8s_client/factory.go

+	// Try to find the cluster deployment. If we can't, for whatever the reason, explain it in the log and assume
+	// it isn't a hosted cluster.
+	clusterDeployment, err := f.findClusterDeploymentForKubeconfigSecret(ctx, kubeconfigSecret)
+	if err != nil || clusterDeployment == nil {


An error and not having a clusterDeployment seem to be different issues, maybe we should at least split the logging?

OK, will do.

carbonin · 2024-12-19T13:18:03Z

internal/spoke_k8s_client/factory.go

-		log:         cf.log,
+// findClusterDeploymentForKubeconfigSecret finds the cluster deployment that corresponds to the given kubeconfig
+// secret. It returns nil if there is no such cluster deployment.
+func (f *spokeK8sClientFactory) findClusterDeploymentForKubeconfigSecret(ctx context.Context,


Are we ever in a situation where the caller of this factory doesn't already have a reference to the cluster deployment?

Since (based on the naming) we're talking about "spoke" clusters it seems likely that this could be simplified by either the caller supplying the cluster deployment or by this logic living outside this factory (then we would have an option like "useHubServiceNetwork" or something when creating the client).

Yes, here we don't know what the cluster deployment is:

assisted-service/internal/controller/controllers/hypershiftagentserviceconfig_controller.go

Line 342 in 9a1b9ec

spokeClient, err := hr.SpokeClients.Get(kubeconfigSecret)

Okay, thanks.

Side note though ... can we delete the HASC CRD and controller yet?
@gamli75 that effort isn't happening now, right?

I'm not familiar with that effort, maybe @CrystalChun?

I'm not familiar with it either, maybe @danielerez?

There are several situations where assisted service needs to connect to the API server of a spoke cluster. To do so it uses the kubeconfig generated during the installation, and that usually contains the external URL of the API server, and that means that the cluster where assisted service runs needs to be configured with a proxy that allows that. But for HyperShift clusters this can be avoided: assisted service can instead connect via the service network, using the `kube-apiserver.my-cluster.svc` host name, as the API server runs as a pod in the same cluster. Doing that reduces the number of round trips and the potential proxy configuration issues. In order to achieve that this patch changes the spoke client factory so that it checks if the cluster is a HyperShift cluster, and then it replaces the API server URL with `https://kube-apiserver.my-cluster.svc:6443`.

Related: https://issues.redhat.com/browse/MGMT-19120

Signed-off-by: Juan Hernandez <juan.hernandez@redhat.com>

openshift-ci · 2024-12-19T23:24:01Z

@jhernand: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
|---|---|---|---|---|
| ci/prow/edge-e2e-ai-operator-disconnected-capi | `4d0cd06` | link | false | `/test edge-e2e-ai-operator-disconnected-capi` |

jhernand changed the title ~~Use service network to talk to hosted api server~~ MGMT-19120: Use service net to connect to hosted API server Dec 13, 2024

openshift-ci-robot added the jira/valid-reference label Dec 13, 2024

jhernand marked this pull request as draft December 13, 2024 20:18

openshift-ci bot added size/XL do-not-merge/work-in-progress labels Dec 13, 2024

openshift-ci bot requested review from CrystalChun and linoyaslan December 13, 2024 20:18

openshift-ci bot added the approved label Dec 13, 2024

jhernand force-pushed the use_service_network_to_talk_to_hosted_api_server branch from 63c3673 to 390c90d Compare December 16, 2024 16:06

openshift-ci bot added size/XXL and removed size/XL labels Dec 16, 2024

jhernand force-pushed the use_service_network_to_talk_to_hosted_api_server branch from 390c90d to 3e7178a Compare December 16, 2024 19:39

jhernand mentioned this pull request Dec 17, 2024

NO-ISSUE: Allow use of 'envtest' #7100

Merged

20 tasks

jhernand force-pushed the use_service_network_to_talk_to_hosted_api_server branch 2 times, most recently from 4cd5768 to 92b8bac Compare December 18, 2024 11:58

openshift-ci bot added size/XL and removed size/XXL labels Dec 18, 2024

jhernand marked this pull request as ready for review December 18, 2024 15:54

openshift-ci bot removed the do-not-merge/work-in-progress label Dec 18, 2024

openshift-ci bot requested review from eliorerz and giladravid16 December 18, 2024 15:59

jhernand force-pushed the use_service_network_to_talk_to_hosted_api_server branch from 92b8bac to 7b9e4dc Compare December 19, 2024 08:27

tsorya reviewed Dec 19, 2024

View reviewed changes

jhernand force-pushed the use_service_network_to_talk_to_hosted_api_server branch from 7b9e4dc to cb06b53 Compare December 19, 2024 12:55

carbonin reviewed Dec 19, 2024

View reviewed changes

jhernand force-pushed the use_service_network_to_talk_to_hosted_api_server branch from cb06b53 to 4d0cd06 Compare December 19, 2024 19:38
