Predict the GlanceAPI volumes order #509

Merged

Conversation


@fmount fmount commented Apr 15, 2024

When TLS is enabled at Pod level (which is the new default introduced by the openstack-operator) and a StatefulSet is created, a new revision is rolled out because of the overrides passed by the OpenStack operator to the service CR. In Glance this introduced an additional issue: in case of multiple APIs, an iteration is performed through the Spec instances, and the TLS override is checked for each endpoint. Nothing ensures that the StatefulSet keeps the same order of the provided Volumes, and this can generate multiple (random) rollouts until the StatefulSet converges with two subsequent revisions that keep the same mount order. To avoid multiple restarts, this patch sorts the iteration through the endpoints, so we can always predict the mount order and avoid unnecessary restarts.

Related: OSPRH-6331
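
As a purely illustrative aside (a minimal, self-contained Go sketch, not the actual glance-operator code; the endpoint and secret names are invented): iterating a Go map yields a different order on every run, so Volumes appended while ranging over such a map can swap positions between two otherwise identical StatefulSet renders, while sorting the keys first makes the resulting order deterministic.

    // Illustrative only, not the glance-operator implementation.
    package main

    import (
        "fmt"
        "sort"
    )

    func main() {
        // hypothetical endpoint -> TLS secret mapping, similar in spirit to
        // what the operator derives from the service overrides
        endpoints := map[string]string{
            "public":   "cert-glance-public-svc",
            "internal": "cert-glance-internal-svc",
        }

        // collect and sort the keys so the iteration (and therefore the
        // volume order) is the same on every run
        names := make([]string, 0, len(endpoints))
        for name := range endpoints {
            names = append(names, name)
        }
        sort.Strings(names)

        for _, name := range names {
            fmt.Printf("appending volume for endpoint %s (secret %s)\n", name, endpoints[name])
        }
    }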

@fmount fmount requested review from stuggi and konan-abhi April 15, 2024 22:49
@openshift-ci openshift-ci bot requested review from lewisdenny and viroel and removed request for stuggi and konan-abhi April 15, 2024 22:49

fmount commented Apr 15, 2024

With this change I can make sure to only have two revisions: the first one without TLS, and the second one with the patched CR applied:

[stack@osp-storage-04 glance_edge]$ oc rollout history statefulset glance-default-single -o yaml | grep -i kind
kind: StatefulSet
    kind: GlanceAPI
    kind: PersistentVolumeClaim
kind: StatefulSet
    kind: GlanceAPI
    kind: PersistentVolumeClaim

The previous situation required three or more StatefulSet revisions to converge [1].

[1] https://paste.opendev.org/show/bxuglcmkwl0usMbhRXkg/


stuggi commented Apr 16, 2024

I am not sure I understand the problem correctly. We have an array of volumes and volumeMounts, and the way volumes/mounts get added has a specific order. Parts of the array change because of conditions (which is expected), but the ordering should not change since we always append. As said, it is expected that the deployment restarts when the certificate gets created and passed in via the overrides, but once the list is complete it should not change, and right now I do not get why ordering the volumes would change the behavior.

In general it won't hurt to sort, but then we might want to add the funcs to lib-common as we may want to do it across all operators?


fmount commented Apr 16, 2024

> I am not sure I understand the problem correctly. We have an array of volumes and volumeMounts, and the way volumes/mounts get added has a specific order. Parts of the array change because of conditions (which is expected), but the ordering should not change since we always append. As said, it is expected that the deployment restarts when the certificate gets created and passed in via the overrides, but once the list is complete it should not change, and right now I do not get why ordering the volumes would change the behavior.
>
> In general it won't hurt to sort, but then we might want to add the funcs to lib-common as we may want to do it across all operators?

It shouldn't hurt, indeed, and we can consider having this kind of utility as a lib-common function so we can hide this implementation detail. I didn't analyze the other operators so far, but this might be a problem related to a couple of things:

  1. we do not use kolla to copy the certificates, but we reference and mount them to the right location, while in other operators kolla takes care of this part
  2. we iterate over the API and we use the GetEndpoints function, whose logic does not ensure the endpoints are returned in the same order (because we have split, single, edge, and a combination of internal/external endpoints): this means that the internal.crt and public.crt mountpoints are sometimes switched across different StatefulSet rollouts, and the same applies to the keys

Sorting volumes and volumeMounts can definitely solve this problem: it keeps the implementation details flexible behind the scenes (according to the needs), while ensuring we don't get additional updateRevisions in the StatefulSet due to an ordering problem (see the sketch below).
To be clear, I understand that there is at least one restart (I'm not sure we can avoid it somehow, by providing the right overrides before we start rolling out services at the openstack-operator level, and waiting for the input data before triggering the reconciliation of the underlying services), but in my tests I've seen an insane number of restarts (3 or 4) before reaching the Running state. I expect 2 revisions and 1 restart as the "expected" behavior.
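
A rough sketch of the volume/volumeMount sorting mentioned above, assuming the usual corev1 types; the helper name is hypothetical and not part of the current codebase:

    package glance

    import (
        "sort"

        corev1 "k8s.io/api/core/v1"
    )

    // sortVolumes is a hypothetical helper: it orders Volumes and VolumeMounts
    // by name so the rendered pod template is identical across reconciles,
    // no matter in which order the entries were appended.
    func sortVolumes(vols []corev1.Volume, mounts []corev1.VolumeMount) {
        sort.Slice(vols, func(i, j int) bool {
            return vols[i].Name < vols[j].Name
        })
        sort.Slice(mounts, func(i, j int) bool {
            return mounts[i].Name < mounts[j].Name
        })
    }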

@fmount fmount requested a review from stuggi April 16, 2024 06:54

fmount commented Apr 16, 2024

@stuggi this patch in combination with #510 should at least align the glance-operator w/ the expected behavior, and we can follow up to improve the codebase.


stuggi commented Apr 16, 2024

> I am not sure I understand the problem correctly. We have an array of volumes and volumeMounts, and the way volumes/mounts get added has a specific order. Parts of the array change because of conditions (which is expected), but the ordering should not change since we always append. As said, it is expected that the deployment restarts when the certificate gets created and passed in via the overrides, but once the list is complete it should not change, and right now I do not get why ordering the volumes would change the behavior.
> In general it won't hurt to sort, but then we might want to add the funcs to lib-common as we may want to do it across all operators?
>
> It shouldn't hurt, indeed, and we can consider having this kind of utility as a lib-common function so we can hide this implementation detail. I didn't analyze the other operators so far, but this might be a problem related to a couple of things:
>
> 1. we do not use kolla to copy the certificates, but we reference and mount them to the right location, while in other operators kolla takes care of this part

I don't think it's an issue which way is used; there is also a mount for when kolla is used, and that one could flip just as well.

> 2. we iterate over the API and we use the GetEndpoints function, whose logic does not ensure the endpoints are returned in the same order (because we have split, single, edge, and a combination of internal/external endpoints): this means that the internal.crt and public.crt mountpoints are sometimes switched across different StatefulSet rollouts, and the same applies to the keys

You are referring to GetGlanceEndpoints here https://github.com/openstack-k8s-operators/glance-operator/blob/main/pkg/glanceapi/statefulset.go#L159 , right? If I get it right, this is the real issue: GetGlanceEndpoints returns a map, which for sure is not sorted. An alternative (probably faster) fix would be to not loop over GetGlanceEndpoints() at https://github.com/openstack-k8s-operators/glance-operator/blob/main/pkg/glanceapi/statefulset.go#L159C21-L159C39 and instead sort the endpoints, like:

	endpts := maps.Keys(GetGlanceEndpoints(instance.Spec.APIType))
	sort.Slice(endpts, func(i, j int) bool {
		return string(endpts[i]) < string(endpts[j])
	})

	for _, endpt := range endpts {

Right now I don't think we have this issue in other operators, since they don't have the concept of multiple APIs as glance does.

> Sorting volumes and volumeMounts can definitely solve this problem: it keeps the implementation details flexible behind the scenes (according to the needs), while ensuring we don't get additional updateRevisions in the StatefulSet due to an ordering problem.
>
> To be clear, I understand that there is at least one restart (I'm not sure we can avoid it somehow, by providing the right overrides before we start rolling out services at the openstack-operator level, and waiting for the input data before triggering the reconciliation of the underlying services), but in my tests I've seen an insane number of restarts (3 or 4) before reaching the Running state. I expect 2 revisions and 1 restart as the "expected" behavior.

We could only solve this if we agree that the openstack-operator has knowledge of how the service operators create their services, so that we can pre-create the certificates before the services exist. This is what I had initially, but there were concerns that the openstack-operator has/needs internal knowledge of how the service operators create their k8s services.


fmount commented Apr 16, 2024

> I am not sure I understand the problem correctly. We have an array of volumes and volumeMounts, and the way volumes/mounts get added has a specific order. Parts of the array change because of conditions (which is expected), but the ordering should not change since we always append. As said, it is expected that the deployment restarts when the certificate gets created and passed in via the overrides, but once the list is complete it should not change, and right now I do not get why ordering the volumes would change the behavior.
> In general it won't hurt to sort, but then we might want to add the funcs to lib-common as we may want to do it across all operators?
>
> It shouldn't hurt, indeed, and we can consider having this kind of utility as a lib-common function so we can hide this implementation detail. I didn't analyze the other operators so far, but this might be a problem related to a couple of things:
>
> 1. we do not use kolla to copy the certificates, but we reference and mount them to the right location, while in other operators kolla takes care of this part
>
> I don't think it's an issue which way is used; there is also a mount for when kolla is used, and that one could flip just as well.
>
> 2. we iterate over the API and we use the GetEndpoints function, whose logic does not ensure the endpoints are returned in the same order (because we have split, single, edge, and a combination of internal/external endpoints): this means that the internal.crt and public.crt mountpoints are sometimes switched across different StatefulSet rollouts, and the same applies to the keys
>
> You are referring to GetGlanceEndpoints here https://github.com/openstack-k8s-operators/glance-operator/blob/main/pkg/glanceapi/statefulset.go#L159 , right? If I get it right, this is the real issue: GetGlanceEndpoints returns a map, which for sure is not sorted. An alternative (probably faster) fix would be to not loop over GetGlanceEndpoints() at https://github.com/openstack-k8s-operators/glance-operator/blob/main/pkg/glanceapi/statefulset.go#L159C21-L159C39 and instead sort the endpoints, like:
>
>     endpts := maps.Keys(GetGlanceEndpoints(instance.Spec.APIType))
>     sort.Slice(endpts, func(i, j int) bool {
>         return string(endpts[i]) < string(endpts[j])
>     })
>
>     for _, endpt := range endpts {
>
> Right now I don't think we have this issue in other operators, since they don't have the concept of multiple APIs as glance does.
>
> Sorting volumes and volumeMounts can definitely solve this problem: it keeps the implementation details flexible behind the scenes (according to the needs), while ensuring we don't get additional updateRevisions in the StatefulSet due to an ordering problem.
>
> To be clear, I understand that there is at least one restart (I'm not sure we can avoid it somehow, by providing the right overrides before we start rolling out services at the openstack-operator level, and waiting for the input data before triggering the reconciliation of the underlying services), but in my tests I've seen an insane number of restarts (3 or 4) before reaching the Running state. I expect 2 revisions and 1 restart as the "expected" behavior.
>
> We could only solve this if we agree that the openstack-operator has knowledge of how the service operators create their services, so that we can pre-create the certificates before the services exist. This is what I had initially, but there were concerns that the openstack-operator has/needs internal knowledge of how the service operators create their k8s services.

We can discuss this in a different context, but it might be predictable which services the operators are going to create. I understand the concern, so when TLS is enabled it might be the service operators' own logic that creates the service and "properly waits" until the TLS input is ready. If we're able to catch this information at bootstrap time, we could probably avoid the rollout, because we can requeue the reconciliation loop until we have the data we need.
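
A rough sketch of what such a "wait for the TLS input" step could look like in a controller-runtime reconciler; the helper name and the 10-second requeue interval are illustrative, not the actual glance-operator logic:

    package glanceapi

    import (
        "context"
        "time"

        corev1 "k8s.io/api/core/v1"
        k8s_errors "k8s.io/apimachinery/pkg/api/errors"
        "k8s.io/apimachinery/pkg/types"
        ctrl "sigs.k8s.io/controller-runtime"
        "sigs.k8s.io/controller-runtime/pkg/client"
    )

    // ensureTLSInput is a hypothetical helper: it requeues the reconcile loop
    // until the expected TLS secret exists, so the StatefulSet is rendered only
    // once the certificate input is available and no extra revision is needed
    // just to add the TLS volumes afterwards.
    func ensureTLSInput(ctx context.Context, c client.Client, namespace, secretName string) (ctrl.Result, error) {
        secret := &corev1.Secret{}
        err := c.Get(ctx, types.NamespacedName{Name: secretName, Namespace: namespace}, secret)
        if k8s_errors.IsNotFound(err) {
            // TLS input not ready yet: wait instead of rolling out a revision
            // that would be replaced as soon as the certificate shows up.
            return ctrl.Result{RequeueAfter: 10 * time.Second}, nil
        }
        return ctrl.Result{}, err
    }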


fmount commented Apr 16, 2024

> I am not sure I understand the problem correctly. We have an array of volumes and volumeMounts, and the way volumes/mounts get added has a specific order. Parts of the array change because of conditions (which is expected), but the ordering should not change since we always append. As said, it is expected that the deployment restarts when the certificate gets created and passed in via the overrides, but once the list is complete it should not change, and right now I do not get why ordering the volumes would change the behavior.
> In general it won't hurt to sort, but then we might want to add the funcs to lib-common as we may want to do it across all operators?
>
> It shouldn't hurt, indeed, and we can consider having this kind of utility as a lib-common function so we can hide this implementation detail. I didn't analyze the other operators so far, but this might be a problem related to a couple of things:
>
> 1. we do not use kolla to copy the certificates, but we reference and mount them to the right location, while in other operators kolla takes care of this part
>
> I don't think it's an issue which way is used; there is also a mount for when kolla is used, and that one could flip just as well.
>
> 2. we iterate over the API and we use the GetEndpoints function, whose logic does not ensure the endpoints are returned in the same order (because we have split, single, edge, and a combination of internal/external endpoints): this means that the internal.crt and public.crt mountpoints are sometimes switched across different StatefulSet rollouts, and the same applies to the keys
>
> You are referring to GetGlanceEndpoints here https://github.com/openstack-k8s-operators/glance-operator/blob/main/pkg/glanceapi/statefulset.go#L159 , right? If I get it right, this is the real issue: GetGlanceEndpoints returns a map, which for sure is not sorted. An alternative (probably faster) fix would be to not loop over GetGlanceEndpoints() at https://github.com/openstack-k8s-operators/glance-operator/blob/main/pkg/glanceapi/statefulset.go#L159C21-L159C39 and instead sort the endpoints, like:
>
>     endpts := maps.Keys(GetGlanceEndpoints(instance.Spec.APIType))
>     sort.Slice(endpts, func(i, j int) bool {
>         return string(endpts[i]) < string(endpts[j])
>     })
>
>     for _, endpt := range endpts {

The above logic is very simple, and from a pure coding point of view it's the fastest approach.
However, my concern comes from golang/go#61538, and I'd like not to introduce a golang.org/x/exp dependency; but if you think it's better to be consistent with that approach, I can do that. Another idea is to simply go through the internal endpoints first and the public ones later, and refactor that function, but in my original approach I just wanted to minimize the effort/impact on that piece of code.

> Right now I don't think we have this issue in other operators, since they don't have the concept of multiple APIs as glance does.

> Sorting volumes and volumeMounts can definitely solve this problem: it keeps the implementation details flexible behind the scenes (according to the needs), while ensuring we don't get additional updateRevisions in the StatefulSet due to an ordering problem.

If you think that sorting the endpoints is better, I can write something that collects the keys and sorts them, then iterate over the sorted keys and get each endpoint, for example:
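
(Sketch only, assuming GetGlanceEndpoints returns a map keyed by the lib-common service.Endpoint string type, as in the snippet quoted above; not necessarily the final patch.)

    endpoints := GetGlanceEndpoints(instance.Spec.APIType)
    keys := make([]string, 0, len(endpoints))
    for endpt := range endpoints {
        keys = append(keys, string(endpt))
    }
    // plain standard-library sort, no golang.org/x/exp needed
    sort.Strings(keys)

    for _, k := range keys {
        endpt := service.Endpoint(k)
        // ... build the TLS volumes/volumeMounts for endpt in a stable order
    }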

> To be clear, I understand that there is at least one restart (I'm not sure we can avoid it somehow, by providing the right overrides before we start rolling out services at the openstack-operator level, and waiting for the input data before triggering the reconciliation of the underlying services), but in my tests I've seen an insane number of restarts (3 or 4) before reaching the Running state. I expect 2 revisions and 1 restart as the "expected" behavior.

> We could only solve this if we agree that the openstack-operator has knowledge of how the service operators create their services, so that we can pre-create the certificates before the services exist. This is what I had initially, but there were concerns that the openstack-operator has/needs internal knowledge of how the service operators create their k8s services.


fmount commented Apr 16, 2024

@stuggi FYI I switched to the approach you proposed. I tested it locally (w/ podLevel=true) and I think I'm ok to go with it.
I might need to update kuttl, but we can merge after we get a green CI.

When TLS is enabled at Pod level (which is the new default introduced by
the openstack-operator), and a statefulset is created, a new revision is
rolled out because of the overrides passed by the OpenStack operator to
the service CR. In glance this introduced an additional issue: in case
of multiple APIs, an iteration is performed through the Spec instances,
and the TLS override is checked out for each endpoint. No one ensures
that the StatefulSet has the same order of the provided mountpoints, and
this might generate multiple (random) rollouts until it converges with
two subsequent revisions that keep the same order. To avoid multiple
restarts, this patch sorts the iteration on the resulting endpoints
associated with the GlanceAPI StatefulSet. By doing this we can always
predict the mount order of Volumes and VolumeMounts and avoid unnecessary
restarts.

Signed-off-by: Francesco Pantano <fpantano@redhat.com>

fmount commented Apr 16, 2024

@stuggi seems good to go


@stuggi stuggi left a comment


/lgtm thanks!


openshift-ci bot commented Apr 16, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: fmount, stuggi

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-bot openshift-merge-bot bot merged commit 676f29f into openstack-k8s-operators:main Apr 16, 2024
7 checks passed
@lewisdenny lewisdenny removed their request for review April 17, 2024 00:17