operator: Add Pod annotations with node topology labels to support zone aware scheduling #9503
First pass, looking really good! Just some minor comments
After testing this on an OpenShift cluster with worker nodes in three different AWS zones within the same AWS region, I noticed that our strategy of patching scheduled pods does not always work as expected. It does not happen often, but it is likely to occur: I see pods missing the annotations that the
For the sake of completeness, here is what my worker nodes look like after installing the cluster on AWS:
PTAL (cc @xperimental)
Signed-off-by: Shweta Padubidri <spadubid@redhat.com>
I have prepared a branch containing all the changes I have commented on. Have a look.
The main changes are:
- Filter Pod reconciliations by label
- Move Pod changes to a single place (unfortunately needed to modify the existing code a bit, pulling out `podTemplate`)
- Move constants for annotations and labels into the API package
Unfortunately, it took a while to prepare this. I thought there was a timing issue when setting the annotation on the Pod, but I was no longer able to reproduce it today.
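To illustrate the "filter Pod reconciliations by label" point, here is a minimal standard-library sketch of the check such a filter could perform. The label key `labelZoneAwarePod` below is a hypothetical placeholder (the real value of `lokiv1.LabelZoneAwarePod` is not shown in this conversation), and `isZoneAwarePod` is an illustrative helper, not code from the branch. Only the value `"enabled"` comes from the diff in this PR.

```go
package main

import "fmt"

// labelZoneAwarePod is a HYPOTHETICAL label key used for illustration only;
// the actual value of lokiv1.LabelZoneAwarePod is not shown in this thread.
const labelZoneAwarePod = "example.com/zone-aware"

// isZoneAwarePod reports whether a Pod's labels opt it in to zone labeling.
// Reading from a nil map is safe in Go and yields the zero value.
func isZoneAwarePod(podLabels map[string]string) bool {
	return podLabels[labelZoneAwarePod] == "enabled"
}

func main() {
	fmt.Println(isZoneAwarePod(map[string]string{labelZoneAwarePod: "enabled"}))
	fmt.Println(isZoneAwarePod(map[string]string{"app": "loki"}))
	fmt.Println(isZoneAwarePod(nil))
}
```

In a controller, a check like this would typically be wired in as an event filter so that Pods without the label never trigger a reconcile; the exact wiring depends on the controller framework and is left out here.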
LGTM, just some unit tests missing. I still have to re-run it on my side.
Here is a branch to capture a couple of improvements for the comments below.
- Using `mergo.Merge` for `configureReplication`
- Add test for rendering the Loki Config with `instance_availability_zone`
- Add Changelog entry
for i := range podTemplate.Spec.Containers {
	podTemplate.Spec.Containers[i].Env = append(podTemplate.Spec.Containers[i].Env, availabilityZoneEnvVar)
}

podTemplate.Labels = labels.Merge(podTemplate.Labels, map[string]string{
	lokiv1.LabelZoneAwarePod: "enabled",
})
podTemplate.Annotations[lokiv1.AnnotationAvailabilityZoneLabels] = topologyKey
I suggest sticking with `mergo.Merge` here, as we do with the other `configure` patching functions.
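One general reason to prefer a merge over writing directly into `podTemplate.Annotations` is that assigning to an entry of a nil map panics in Go. The sketch below shows the idea with the standard library only. It is a stand-in, not `mergo.Merge` itself and not the code from the branch: `mergeStringMaps` is a hypothetical helper, the annotation key is a placeholder, and unlike a default merge this helper lets values from `src` win on conflicts.

```go
package main

import "fmt"

// mergeStringMaps returns a NEW map holding the entries of dst overlaid by src.
// It is a hypothetical stand-in for a merge helper: both inputs may be nil,
// and neither input is modified.
func mergeStringMaps(dst, src map[string]string) map[string]string {
	out := make(map[string]string, len(dst)+len(src))
	for k, v := range dst {
		out[k] = v
	}
	for k, v := range src {
		out[k] = v // src wins on conflicting keys (an assumption of this sketch)
	}
	return out
}

func main() {
	// A template without annotations has a nil map.
	var annotations map[string]string

	// annotations["k"] = "v" would panic here: assignment to entry in nil map.
	annotations = mergeStringMaps(annotations, map[string]string{
		// Placeholder key; the real annotation constant lives in the API package.
		"example.com/availability-zone-labels": "topology.kubernetes.io/zone",
	})
	fmt.Println(len(annotations))
}
```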
/retest
What this PR does / why we need it:
This PR adds support for zone-aware scheduling by annotating the Loki component pods with the zone-specific topology labels of the nodes they run on.
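As a rough illustration of the idea, the sketch below derives a single zone value for a Pod from the labels of the node it was scheduled on. It is written under assumptions and is not the operator's implementation: `zoneValue` is a hypothetical helper, and joining multiple values with `_` is an arbitrary choice made for this example. The two label keys are the well-known Kubernetes topology labels.

```go
package main

import (
	"fmt"
	"strings"
)

// zoneValue looks up every topology key in the node's labels and joins the
// values into one string. It returns an error if the node lacks a key, so a
// caller can retry later instead of writing an incomplete annotation.
// The "_" separator is an assumption of this sketch.
func zoneValue(topologyKeys []string, nodeLabels map[string]string) (string, error) {
	values := make([]string, 0, len(topologyKeys))
	for _, key := range topologyKeys {
		v, ok := nodeLabels[key]
		if !ok {
			return "", fmt.Errorf("node is missing topology label %q", key)
		}
		values = append(values, v)
	}
	return strings.Join(values, "_"), nil
}

func main() {
	nodeLabels := map[string]string{
		"topology.kubernetes.io/region": "us-east-1",
		"topology.kubernetes.io/zone":   "us-east-1a",
	}
	keys := []string{"topology.kubernetes.io/region", "topology.kubernetes.io/zone"}

	zone, err := zoneValue(keys, nodeLabels)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(zone) // prints "us-east-1_us-east-1a"
}
```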
Which issue(s) this PR fixes:
Fixes #LOG-3834
Special notes for your reviewer:
Checklist
- `CONTRIBUTING.md` guide (required)
- `CHANGELOG.md` updated
- `docs/sources/upgrading/_index.md`