You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Is your feature request related to a problem? Please describe.
Pulsar brokers use Zookeeper for publishing the membership to other brokers.
The Pulsar load manager will detect a new broker and assign new namespace bundles to the new broker.
One of the gaps with Pulsar and k8s is that the broker registers itself to Zookeeper and the broker pod
cannot be accessed before the broker pod's readiness probe check has passed and k8s marks the pod ready
and makes it available in DNS. There's also additional delay from DNS caches. This delay will cause a short service disruption for all topics belonging to the namespace bundles that have been assigned to the new broker.
For the brokers, there's also other reasons: the headless service has "publishNotReadyAddresses: true" and other service (which would be used for lookups) doesn't, so that lookups go only to healthy brokers.
"If you are using Alpine version 3.17 or earlier as your base image, DNS may not work properly due to a design issue with Alpine. Until musl version 1.24 didn't include TCP fallback to the DNS stub resolver meaning any DNS call above 512 bytes would fail. Please upgrade your images to Alpine version 3.18 or above."
Since headless services return all hosts in the DNS response, it could exceed 512 bytes.
Describe the solution you'd like
The way to mitigate this is to have a separate ClusterIP service to be used for lookups and have a headless service with
"publishNotReadyAddresses: true". This is a common way to address the problem. One example of this is the
Bitnami Zookeeper Helm chart, https://github.com/bitnami/charts/tree/main/bitnami/zookeeper/templates
there's svc.yaml and svc-headless.yaml
The DNS names of the 2 services will be different. Since clients don't need to know the direct DNS names of the brokers, it's better to change the current headless service to be the ClusterIP service and add a new headless service for handling the stable names of the brokers. Admin scripts that connect directly to individual brokers might break, but that's could be made clear in the release notes when this feature is released.
The text was updated successfully, but these errors were encountered:
It's possible that making a change to use podManagementPolicy: OrderedReady for the broker statefulset could be a partial mitigation to this issue until this is resolved. #474 adds a solution where podManagementPolicy: OrderedReady is used when the function worker is enabled in the broker. it's not currently possible to configure to use this explicitly in the Helm chart without modifications.
Is your feature request related to a problem? Please describe.
Pulsar brokers use Zookeeper for publishing the membership to other brokers.
The Pulsar load manager will detect a new broker and assign new namespace bundles to the new broker.
One of the gaps with Pulsar and k8s is that the broker registers itself to Zookeeper and the broker pod
cannot be accessed before the broker pod's readiness probe check has passed and k8s marks the pod ready
and makes it available in DNS. There's also additional delay from DNS caches. This delay will cause a short service disruption for all topics belonging to the namespace bundles that have been assigned to the new broker.
For the brokers, there's also other reasons: the headless service has "publishNotReadyAddresses: true" and other service (which would be used for lookups) doesn't, so that lookups go only to healthy brokers.
Headless broker services with a lot of brokers could cause issues with Pulsar clients that are running on Alpine <3.18. k8s docs have something about this too: https://kubernetes.io/docs/tasks/administer-cluster/dns-debugging-resolution/#known-issues .
"If you are using Alpine version 3.17 or earlier as your base image, DNS may not work properly due to a design issue with Alpine. Until musl version 1.24 didn't include TCP fallback to the DNS stub resolver meaning any DNS call above 512 bytes would fail. Please upgrade your images to Alpine version 3.18 or above."
Since headless services return all hosts in the DNS response, it could exceed 512 bytes.
Describe the solution you'd like
The way to mitigate this is to have a separate ClusterIP service to be used for lookups and have a headless service with
"publishNotReadyAddresses: true". This is a common way to address the problem. One example of this is the
Bitnami Zookeeper Helm chart, https://github.com/bitnami/charts/tree/main/bitnami/zookeeper/templates
there's svc.yaml and svc-headless.yaml
"StatefulSets currently require a Headless Service to be responsible for the network identity of the Pods."
https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/#limitations
Since brokers use a statefulset, the headless service cannot be removed.
The DNS names of the 2 services will be different. Since clients don't need to know the direct DNS names of the brokers, it's better to change the current headless service to be the ClusterIP service and add a new headless service for handling the stable names of the brokers. Admin scripts that connect directly to individual brokers might break, but that's could be made clear in the release notes when this feature is released.
The text was updated successfully, but these errors were encountered: