Topology spread constraints on zones and anti-affinity for receivers and dispatchers #2092

pierDipi
Member

Add topology spread constraints on zones, plus pod anti-affinity, to spread receivers and dispatchers across zones for HA.

Signed-off-by: Pierangelo Di Pilato <pierdipi@redhat.com>

Part of #1537 and Broker HA

…tchers

Signed-off-by: Pierangelo Di Pilato <pierdipi@redhat.com>
@knative-prow knative-prow bot added area/data-plane size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Apr 13, 2022
@knative-prow knative-prow bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 13, 2022
@codecov

codecov bot commented Apr 13, 2022

Codecov Report

Merging #2092 (ce1732c) into main (5e5ade7) will decrease coverage by 16.09%.
The diff coverage is n/a.

@@              Coverage Diff              @@
##               main    #2092       +/-   ##
=============================================
- Coverage     82.69%   66.60%   -16.10%     
  Complexity      676      676               
=============================================
  Files            72      141       +69     
  Lines          2289     9033     +6744     
  Branches        195      195               
=============================================
+ Hits           1893     6016     +4123     
- Misses          291     2623     +2332     
- Partials        105      394      +289     
Flag Coverage Δ
java-unittests 82.65% <ø> (-0.05%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
...a/broker/dispatcher/impl/RecordDispatcherImpl.java 90.44% <0.00%> (-0.57%) ⬇️
control-plane/pkg/reconciler/sink/kafka_sink.go 73.58% <0.00%> (ø)
...ntrol-plane/pkg/apis/eventing/v1alpha1/register.go 100.00% <0.00%> (ø)
...ol-plane/pkg/security/secrets_provider_net_spec.go 85.41% <0.00%> (ø)
control-plane/pkg/contract/serde.go 100.00% <0.00%> (ø)
...ls/kafka/eventing/v1alpha1/consumer_group_types.go 27.27% <0.00%> (ø)
control-plane/pkg/security/config.go 73.91% <0.00%> (ø)
control-plane/pkg/security/scram.go 70.58% <0.00%> (ø)
control-plane/pkg/security/secret.go 97.67% <0.00%> (ø)
...lane/pkg/reconciler/base/receiver_condition_set.go 0.00% <0.00%> (ø)
... and 60 more

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 5e5ade7...ce1732c. Read the comment docs.

@pierDipi pierDipi mentioned this pull request Apr 13, 2022
Contributor

@aavarghese aavarghese left a comment

Thanks!!

One more file needs similar changes: the sourcev2 dispatcher.

Signed-off-by: Pierangelo Di Pilato <pierdipi@redhat.com>
@pierDipi
Member Author

Done

Comment on lines +36 to +52
      # To avoid node becoming SPOF, spread our replicas to different nodes and zones.
      topologySpreadConstraints:
        - maxSkew: 2
          topologyKey: topology.kubernetes.io/zone
          whenUnsatisfiable: ScheduleAnyway
          labelSelector:
            matchLabels:
              app: kafka-broker-dispatcher
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - podAffinityTerm:
                labelSelector:
                  matchLabels:
                    app: kafka-broker-dispatcher
                topologyKey: kubernetes.io/hostname
              weight: 100
Contributor

@aavarghese aavarghese Apr 13, 2022

Do you know if this combination is functionally better than having two pod topology spread constraints - one for zone and one for node? (Was trying to read about this...)

Contributor

Since we don't have any specific anti-affinity rules for nodes, we could instead have multiple pod topology spread constraints?

      # To avoid node becoming SPOF, spread our replicas to different nodes and zones.
      topologySpreadConstraints:
        - maxSkew: 2
          topologyKey: topology.kubernetes.io/zone
          whenUnsatisfiable: ScheduleAnyway
          labelSelector:
            matchLabels:
              app: kafka-broker-dispatcher
        - maxSkew: 2
          topologyKey: kubernetes.io/hostname
          whenUnsatisfiable: ScheduleAnyway
          labelSelector:
            matchLabels:
              app: kafka-broker-dispatcher

This should also satisfy the existing anti-affinity rule for preferredDuringSchedulingIgnoredDuringExecution.

On the other hand, our existing dispatchers already had podAntiAffinity for nodes, so if we want to keep the changes minimal, that is understandable.
cc: @pierDipi, was this your thinking too?

Member Author

Yes. AFAIK, topologySpreadConstraints with maxSkew = 1 and ScheduleAnyway should be equivalent to our existing antiAffinity rule, so I'm OK with migrating the antiAffinity rules to that form.
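
For reference, a minimal sketch of what such a migration could look like in the dispatcher pod spec. It assumes the `app: kafka-broker-dispatcher` label and the zone constraint from this PR, and replaces the `podAntiAffinity` block with a hostname constraint using `maxSkew: 1` and `ScheduleAnyway`. This only illustrates the idea discussed in this thread; it is not the manifest that was merged.

```yaml
# Sketch only: pod-spec fragment illustrating the proposed migration,
# not the change merged in this PR.
topologySpreadConstraints:
  # Zone spreading, as added in this PR.
  - maxSkew: 2
    topologyKey: topology.kubernetes.io/zone
    whenUnsatisfiable: ScheduleAnyway
    labelSelector:
      matchLabels:
        app: kafka-broker-dispatcher
  # Node spreading; assumed stand-in for the preferred podAntiAffinity rule.
  - maxSkew: 1
    topologyKey: kubernetes.io/hostname
    whenUnsatisfiable: ScheduleAnyway
    labelSelector:
      matchLabels:
        app: kafka-broker-dispatcher
```

With `whenUnsatisfiable: ScheduleAnyway`, both constraints are soft: the scheduler prefers placements that reduce skew but still schedules the pod when the constraint cannot be met.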

Contributor

OK, I think making this equivalent change separately, in another PR, is fine too.

Contributor

@aavarghese aavarghese left a comment

/lgtm

@knative-prow knative-prow bot added the lgtm Indicates that a PR is ready to be merged. label Apr 14, 2022
@knative-prow

knative-prow bot commented Apr 14, 2022

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: aavarghese, pierDipi

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@knative-prow knative-prow bot merged commit 25b975d into knative-extensions:main Apr 14, 2022
aavarghese pushed a commit to aavarghese/eventing-kafka-broker that referenced this pull request Apr 26, 2022
…and dispatchers (knative-extensions#2092)

* Topology spread constraints and anti-affinity for receivers and dispatchers

Signed-off-by: Pierangelo Di Pilato <pierdipi@redhat.com>

* Include sourcev2 dispatcher

Signed-off-by: Pierangelo Di Pilato <pierdipi@redhat.com>
steven0711dong pushed a commit to steven0711dong/eventing-kafka-broker that referenced this pull request Apr 29, 2022
…and dispatchers (knative-extensions#2092)

* Topology spread constraints and anti-affinity for receivers and dispatchers

Signed-off-by: Pierangelo Di Pilato <pierdipi@redhat.com>

* Include sourcev2 dispatcher

Signed-off-by: Pierangelo Di Pilato <pierdipi@redhat.com>
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/data-plane lgtm Indicates that a PR is ready to be merged. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
2 participants