Predict the GlanceAPI volumes order #509

Merged

Conversation


@fmount fmount commented Apr 15, 2024

When TLS is enabled at Pod level (which is the new default introduced by the openstack-operator) and a StatefulSet is created, a new revision is rolled out because of the overrides passed by the OpenStack operator to the service CR. In Glance this introduced an additional issue: in case of multiple APIs, an iteration is performed through the Spec instances, and the TLS override is checked for each endpoint. Nothing ensures that the StatefulSet keeps the same order of the provided Volumes, and this can generate multiple (random) rollouts until the StatefulSet converges with two subsequent revisions that keep the same mount order. To avoid multiple restarts, this patch sorts the iteration through the endpoints, so we can always predict the mount order and avoid unnecessary restarts.

Related: OSPRH-6331
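
As a purely illustrative aside (a minimal, self-contained Go sketch, not the actual glance-operator code; the endpoint and secret names are invented): iterating a Go map yields a different order on every run, so Volumes appended while ranging over such a map can swap positions between two otherwise identical StatefulSet renders, while sorting the keys first makes the resulting order deterministic.

    // Illustrative only, not the glance-operator implementation.
    package main

    import (
        "fmt"
        "sort"
    )

    func main() {
        // hypothetical endpoint -> TLS secret mapping, similar in spirit to
        // what the operator derives from the service overrides
        endpoints := map[string]string{
            "public":   "cert-glance-public-svc",
            "internal": "cert-glance-internal-svc",
        }

        // collect and sort the keys so the iteration (and therefore the
        // volume order) is the same on every run
        names := make([]string, 0, len(endpoints))
        for name := range endpoints {
            names = append(names, name)
        }
        sort.Strings(names)

        for _, name := range names {
            fmt.Printf("appending volume for endpoint %s (secret %s)\n", name, endpoints[name])
        }
    }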

@fmount fmount requested review from stuggi and konan-abhi April 15, 2024 22:49
@openshift-ci openshift-ci bot requested review from lewisdenny and viroel and removed request for stuggi and konan-abhi April 15, 2024 22:49

fmount commented Apr 15, 2024

With this change I can make sure to only have two revisions: the first one without TLS, and the second one with the patched CR applied:

[stack@osp-storage-04 glance_edge]$ oc rollout history statefulset glance-default-single -o yaml | grep -i kind
kind: StatefulSet
    kind: GlanceAPI
    kind: PersistentVolumeClaim
kind: StatefulSet
    kind: GlanceAPI
    kind: PersistentVolumeClaim

The previous situation required three or more StatefulSet revisions to converge [1].

[1] https://paste.opendev.org/show/bxuglcmkwl0usMbhRXkg/


stuggi commented Apr 16, 2024

I am not sure I understand the problem correctly. We have an array of volumes and volumeMounts, and the way volumes/mounts get added has a specific order. Parts of the array change because of conditions (which is expected), but the ordering should not change since we always append. As said, it is expected that the deployment restarts when the certificate gets created and passed in via the overrides, but once the list is complete it should not change, and right now I do not get why ordering the volumes would change the behavior.

In general it won't hurt to sort, but then we might want to add the funcs to lib-common as we may want to do it across all operators?


fmount commented Apr 16, 2024

> I am not sure I understand the problem correctly. We have an array of volumes and volumeMounts, and the way volumes/mounts get added has a specific order. Parts of the array change because of conditions (which is expected), but the ordering should not change since we always append. As said, it is expected that the deployment restarts when the certificate gets created and passed in via the overrides, but once the list is complete it should not change, and right now I do not get why ordering the volumes would change the behavior.
>
> In general it won't hurt to sort, but then we might want to add the funcs to lib-common as we may want to do it across all operators?

It shouldn't hurt, indeed, and we can consider having this kind of utility as a lib-common function so we can hide this implementation detail. I didn't analyze the other operators so far, but this might be a problem related to a couple of things:

  1. we do not use kolla to copy the certificates, but we reference and mount them to the right location, while in other operators kolla takes care of this part
  2. we iterate over the API and we use the GetEndpoints function, whose logic does not ensure the endpoints are returned in the same order (because we have split, single, edge, and a combination of internal/external endpoints): this means that the internal.crt and public.crt mountpoints are sometimes switched across different StatefulSet rollouts, and the same applies to the keys

Sorting volumes and volumeMounts can definitely solve this problem: it keeps the implementation details flexible behind the scenes (according to the needs), while ensuring we don't get additional updateRevisions in the StatefulSet due to an ordering problem (see the sketch below).
To be clear, I understand that there is at least one restart (I'm not sure we can avoid it somehow, by providing the right overrides before we start rolling out services at the openstack-operator level, and waiting for the input data before triggering the reconciliation of the underlying services), but in my tests I've seen an insane number of restarts (3 or 4) before reaching the Running state. I expect 2 revisions and 1 restart as the "expected" behavior.
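
A rough sketch of the volume/volumeMount sorting mentioned above, assuming the usual corev1 types; the helper name is hypothetical and not part of the current codebase:

    package glance

    import (
        "sort"

        corev1 "k8s.io/api/core/v1"
    )

    // sortVolumes is a hypothetical helper: it orders Volumes and VolumeMounts
    // by name so the rendered pod template is identical across reconciles,
    // no matter in which order the entries were appended.
    func sortVolumes(vols []corev1.Volume, mounts []corev1.VolumeMount) {
        sort.Slice(vols, func(i, j int) bool {
            return vols[i].Name < vols[j].Name
        })
        sort.Slice(mounts, func(i, j int) bool {
            return mounts[i].Name < mounts[j].Name
        })
    }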

@fmount fmount requested a review from stuggi April 16, 2024 06:54

fmount commented Apr 16, 2024

@stuggi this patch in combination with #510 should at least align the glance-operator w/ the expected behavior, and we can follow up to improve the codebase.


stuggi commented Apr 16, 2024

> I am not sure I understand the problem correctly. We have an array of volumes and volumeMounts, and the way volumes/mounts get added has a specific order. Parts of the array change because of conditions (which is expected), but the ordering should not change since we always append. As said, it is expected that the deployment restarts when the certificate gets created and passed in via the overrides, but once the list is complete it should not change, and right now I do not get why ordering the volumes would change the behavior.
> In general it won't hurt to sort, but then we might want to add the funcs to lib-common as we may want to do it across all operators?
>
> It shouldn't hurt, indeed, and we can consider having this kind of utility as a lib-common function so we can hide this implementation detail. I didn't analyze the other operators so far, but this might be a problem related to a couple of things:
>
> 1. we do not use kolla to copy the certificates, but we reference and mount them to the right location, while in other operators kolla takes care of this part

I don't think it's an issue which way is used; there is also a mount for when kolla is used, and that one could flip just as well.

> 2. we iterate over the API and we use the GetEndpoints function, whose logic does not ensure the endpoints are returned in the same order (because we have split, single, edge, and a combination of internal/external endpoints): this means that the internal.crt and public.crt mountpoints are sometimes switched across different StatefulSet rollouts, and the same applies to the keys

You are referring to GetGlanceEndpoints here https://github.com/openstack-k8s-operators/glance-operator/blob/main/pkg/glanceapi/statefulset.go#L159 , right? If I get it right, this is the real issue: GetGlanceEndpoints returns a map, which for sure is not sorted. An alternative (probably faster) fix would be to not loop over GetGlanceEndpoints() at https://github.com/openstack-k8s-operators/glance-operator/blob/main/pkg/glanceapi/statefulset.go#L159C21-L159C39 and instead sort the endpoints, like:

	endpts := maps.Keys(GetGlanceEndpoints(instance.Spec.APIType))
	sort.Slice(endpts, func(i, j int) bool {
		return string(endpts[i]) < string(endpts[j])
	})

	for _, endpt := range endpts {

Right now I don't think we have this issue in other operators, since they don't have the concept of multiple APIs as glance does.

> Sorting volumes and volumeMounts can definitely solve this problem: it keeps the implementation details flexible behind the scenes (according to the needs), while ensuring we don't get additional updateRevisions in the StatefulSet due to an ordering problem.
>
> To be clear, I understand that there is at least one restart (I'm not sure we can avoid it somehow, by providing the right overrides before we start rolling out services at the openstack-operator level, and waiting for the input data before triggering the reconciliation of the underlying services), but in my tests I've seen an insane number of restarts (3 or 4) before reaching the Running state. I expect 2 revisions and 1 restart as the "expected" behavior.

We could only solve this if we agree that the openstack-operator has knowledge of how the service operators create their services, so that we can pre-create the certificates before the services exist. This is what I had initially, but there were concerns that the openstack-operator has/needs internal knowledge of how the service operators create their k8s services.


fmount commented Apr 16, 2024

> I am not sure I understand the problem correctly. We have an array of volumes and volumeMounts, and the way volumes/mounts get added has a specific order. Parts of the array change because of conditions (which is expected), but the ordering should not change since we always append. As said, it is expected that the deployment restarts when the certificate gets created and passed in via the overrides, but once the list is complete it should not change, and right now I do not get why ordering the volumes would change the behavior.
> In general it won't hurt to sort, but then we might want to add the funcs to lib-common as we may want to do it across all operators?
>
> It shouldn't hurt, indeed, and we can consider having this kind of utility as a lib-common function so we can hide this implementation detail. I didn't analyze the other operators so far, but this might be a problem related to a couple of things:
>
> 1. we do not use kolla to copy the certificates, but we reference and mount them to the right location, while in other operators kolla takes care of this part
>
> I don't think it's an issue which way is used; there is also a mount for when kolla is used, and that one could flip just as well.
>
> 2. we iterate over the API and we use the GetEndpoints function, whose logic does not ensure the endpoints are returned in the same order (because we have split, single, edge, and a combination of internal/external endpoints): this means that the internal.crt and public.crt mountpoints are sometimes switched across different StatefulSet rollouts, and the same applies to the keys
>
> You are referring to GetGlanceEndpoints here https://github.com/openstack-k8s-operators/glance-operator/blob/main/pkg/glanceapi/statefulset.go#L159 , right? If I get it right, this is the real issue: GetGlanceEndpoints returns a map, which for sure is not sorted. An alternative (probably faster) fix would be to not loop over GetGlanceEndpoints() at https://github.com/openstack-k8s-operators/glance-operator/blob/main/pkg/glanceapi/statefulset.go#L159C21-L159C39 and instead sort the endpoints, like:
>
>     endpts := maps.Keys(GetGlanceEndpoints(instance.Spec.APIType))
>     sort.Slice(endpts, func(i, j int) bool {
>         return string(endpts[i]) < string(endpts[j])
>     })
>
>     for _, endpt := range endpts {
>
> Right now I don't think we have this issue in other operators, since they don't have the concept of multiple APIs as glance does.
>
> Sorting volumes and volumeMounts can definitely solve this problem: it keeps the implementation details flexible behind the scenes (according to the needs), while ensuring we don't get additional updateRevisions in the StatefulSet due to an ordering problem.
>
> To be clear, I understand that there is at least one restart (I'm not sure we can avoid it somehow, by providing the right overrides before we start rolling out services at the openstack-operator level, and waiting for the input data before triggering the reconciliation of the underlying services), but in my tests I've seen an insane number of restarts (3 or 4) before reaching the Running state. I expect 2 revisions and 1 restart as the "expected" behavior.
>
> We could only solve this if we agree that the openstack-operator has knowledge of how the service operators create their services, so that we can pre-create the certificates before the services exist. This is what I had initially, but there were concerns that the openstack-operator has/needs internal knowledge of how the service operators create their k8s services.

We can discuss this in a different context, but it might be predictable which services the operators are going to create. I understand the concern, so when TLS is enabled it might be the service operators' own logic that creates the service and "properly waits" until the TLS input is ready. If we're able to catch this information at bootstrap time, we could probably avoid the rollout, because we can requeue the reconciliation loop until we have the data we need.
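
A rough sketch of what such a "wait for the TLS input" step could look like in a controller-runtime reconciler; the helper name and the 10-second requeue interval are illustrative, not the actual glance-operator logic:

    package glanceapi

    import (
        "context"
        "time"

        corev1 "k8s.io/api/core/v1"
        k8s_errors "k8s.io/apimachinery/pkg/api/errors"
        "k8s.io/apimachinery/pkg/types"
        ctrl "sigs.k8s.io/controller-runtime"
        "sigs.k8s.io/controller-runtime/pkg/client"
    )

    // ensureTLSInput is a hypothetical helper: it requeues the reconcile loop
    // until the expected TLS secret exists, so the StatefulSet is rendered only
    // once the certificate input is available and no extra revision is needed
    // just to add the TLS volumes afterwards.
    func ensureTLSInput(ctx context.Context, c client.Client, namespace, secretName string) (ctrl.Result, error) {
        secret := &corev1.Secret{}
        err := c.Get(ctx, types.NamespacedName{Name: secretName, Namespace: namespace}, secret)
        if k8s_errors.IsNotFound(err) {
            // TLS input not ready yet: wait instead of rolling out a revision
            // that would be replaced as soon as the certificate shows up.
            return ctrl.Result{RequeueAfter: 10 * time.Second}, nil
        }
        return ctrl.Result{}, err
    }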


fmount commented Apr 16, 2024

> I am not sure I understand the problem correctly. We have an array of volumes and volumeMounts, and the way volumes/mounts get added has a specific order. Parts of the array change because of conditions (which is expected), but the ordering should not change since we always append. As said, it is expected that the deployment restarts when the certificate gets created and passed in via the overrides, but once the list is complete it should not change, and right now I do not get why ordering the volumes would change the behavior.
> In general it won't hurt to sort, but then we might want to add the funcs to lib-common as we may want to do it across all operators?
>
> It shouldn't hurt, indeed, and we can consider having this kind of utility as a lib-common function so we can hide this implementation detail. I didn't analyze the other operators so far, but this might be a problem related to a couple of things:
>
> 1. we do not use kolla to copy the certificates, but we reference and mount them to the right location, while in other operators kolla takes care of this part
>
> I don't think it's an issue which way is used; there is also a mount for when kolla is used, and that one could flip just as well.
>
> 2. we iterate over the API and we use the GetEndpoints function, whose logic does not ensure the endpoints are returned in the same order (because we have split, single, edge, and a combination of internal/external endpoints): this means that the internal.crt and public.crt mountpoints are sometimes switched across different StatefulSet rollouts, and the same applies to the keys
>
> You are referring to GetGlanceEndpoints here https://github.com/openstack-k8s-operators/glance-operator/blob/main/pkg/glanceapi/statefulset.go#L159 , right? If I get it right, this is the real issue: GetGlanceEndpoints returns a map, which for sure is not sorted. An alternative (probably faster) fix would be to not loop over GetGlanceEndpoints() at https://github.com/openstack-k8s-operators/glance-operator/blob/main/pkg/glanceapi/statefulset.go#L159C21-L159C39 and instead sort the endpoints, like:
>
>     endpts := maps.Keys(GetGlanceEndpoints(instance.Spec.APIType))
>     sort.Slice(endpts, func(i, j int) bool {
>         return string(endpts[i]) < string(endpts[j])
>     })
>
>     for _, endpt := range endpts {

The above logic is very simple, and from a pure coding point of view it's the fastest approach.
However, my concern comes from golang/go#61538, and I'd like not to introduce a golang.org/x/exp dependency; but if you think it's better to be consistent with that approach, I can do that. Another idea is to simply go through the internal endpoints first and the public ones later, and refactor that function, but in my original approach I just wanted to minimize the effort/impact on that piece of code.

> Right now I don't think we have this issue in other operators, since they don't have the concept of multiple APIs as glance does.

> Sorting volumes and volumeMounts can definitely solve this problem: it keeps the implementation details flexible behind the scenes (according to the needs), while ensuring we don't get additional updateRevisions in the StatefulSet due to an ordering problem.

If you think that sorting the endpoints is better, I can write something that collects the keys and sorts them, then iterate over the sorted keys and get each endpoint, for example:
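
(Sketch only, assuming GetGlanceEndpoints returns a map keyed by the lib-common service.Endpoint string type, as in the snippet quoted above; not necessarily the final patch.)

    endpoints := GetGlanceEndpoints(instance.Spec.APIType)
    keys := make([]string, 0, len(endpoints))
    for endpt := range endpoints {
        keys = append(keys, string(endpt))
    }
    // plain standard-library sort, no golang.org/x/exp needed
    sort.Strings(keys)

    for _, k := range keys {
        endpt := service.Endpoint(k)
        // ... build the TLS volumes/volumeMounts for endpt in a stable order
    }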

> To be clear, I understand that there is at least one restart (I'm not sure we can avoid it somehow, by providing the right overrides before we start rolling out services at the openstack-operator level, and waiting for the input data before triggering the reconciliation of the underlying services), but in my tests I've seen an insane number of restarts (3 or 4) before reaching the Running state. I expect 2 revisions and 1 restart as the "expected" behavior.

> We could only solve this if we agree that the openstack-operator has knowledge of how the service operators create their services, so that we can pre-create the certificates before the services exist. This is what I had initially, but there were concerns that the openstack-operator has/needs internal knowledge of how the service operators create their k8s services.


fmount commented Apr 16, 2024

@stuggi FYI I switched to the approach you proposed. I tested it locally (w/ podLevel=true) and I think I'm ok to go with it.
I might need to update kuttl, but we can merge after we get a green CI.

When TLS is enabled at Pod level (which is the new default introduced by
the openstack-operator), and a statefulset is created, a new revision is
rolled out because of the overrides passed by the OpenStack operator to
the service CR. In glance this introduced an additional issue: in case
of multiple APIs, an iteration is performed through the Spec instances,
and the TLS override is checked out for each endpoint. No one ensures
that the StatefulSet has the same order of the provided mountpoints, and
this might generate multiple (random) rollouts until it converges with
two subsequent revisions that keep the same order. To avoid multiple
restarts, this patch sorts the iteration on the resulting endpoints
associated with the GlanceAPI StatefulSet. By doing this we can always
predict the mount order of Volumes and VolumeMounts and avoid unnecessary
restarts.

Signed-off-by: Francesco Pantano <fpantano@redhat.com>

fmount commented Apr 16, 2024

@stuggi seems good to go


@stuggi stuggi left a comment


/lgtm thanks!


openshift-ci bot commented Apr 16, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: fmount, stuggi

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-bot openshift-merge-bot bot merged commit 676f29f into openstack-k8s-operators:main Apr 16, 2024
7 checks passed
@lewisdenny lewisdenny removed their request for review April 17, 2024 00:17