Create 2 services for brokers: a headless service with publishNotReadyAddresses and an ordinary clusterip service for lookups #437

lhotari · 2024-01-19T09:52:56Z

Is your feature request related to a problem? Please describe.

Pulsar brokers use Zookeeper for publishing the membership to other brokers.
The Pulsar load manager will detect a new broker and assign new namespace bundles to the new broker.
One of the gaps with Pulsar and k8s is that the broker registers itself to Zookeeper and the broker pod
cannot be accessed before the broker pod's readiness probe check has passed and k8s marks the pod ready
and makes it available in DNS. There's also additional delay from DNS caches. This delay will cause a short service disruption for all topics belonging to the namespace bundles that have been assigned to the new broker.

For the brokers, there's also other reasons: the headless service has "publishNotReadyAddresses: true" and other service (which would be used for lookups) doesn't, so that lookups go only to healthy brokers.

Headless broker services with a lot of brokers could cause issues with Pulsar clients that are running on Alpine <3.18. k8s docs have something about this too: https://kubernetes.io/docs/tasks/administer-cluster/dns-debugging-resolution/#known-issues .

"If you are using Alpine version 3.17 or earlier as your base image, DNS may not work properly due to a design issue with Alpine. Until musl version 1.24 didn't include TCP fallback to the DNS stub resolver meaning any DNS call above 512 bytes would fail. Please upgrade your images to Alpine version 3.18 or above."

Since headless services return all hosts in the DNS response, it could exceed 512 bytes.

Describe the solution you'd like

The way to mitigate this is to have a separate ClusterIP service to be used for lookups and have a headless service with
"publishNotReadyAddresses: true". This is a common way to address the problem. One example of this is the
Bitnami Zookeeper Helm chart, https://github.com/bitnami/charts/tree/main/bitnami/zookeeper/templates
there's svc.yaml and svc-headless.yaml

"StatefulSets currently require a Headless Service to be responsible for the network identity of the Pods."
https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/#limitations
Since brokers use a statefulset, the headless service cannot be removed.

The DNS names of the 2 services will be different. Since clients don't need to know the direct DNS names of the brokers, it's better to change the current headless service to be the ClusterIP service and add a new headless service for handling the stable names of the brokers. Admin scripts that connect directly to individual brokers might break, but that's could be made clear in the release notes when this feature is released.

lhotari · 2024-09-03T06:26:02Z

It's possible that making a change to use podManagementPolicy: OrderedReady for the broker statefulset could be a partial mitigation to this issue until this is resolved. #474 adds a solution where podManagementPolicy: OrderedReady is used when the function worker is enabled in the broker. it's not currently possible to configure to use this explicitly in the Helm chart without modifications.

lhotari · 2024-09-03T10:22:34Z

the podManagementPolicy workaround is implemented in #525

lhotari mentioned this issue Jan 19, 2024

how to direct connect to broker without proxy #423

Open

lhotari mentioned this issue Mar 21, 2024

[Bug] Broker startup could get into a long crash loop when Functions are enabled #473

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Create 2 services for brokers: a headless service with publishNotReadyAddresses and an ordinary clusterip service for lookups #437

Create 2 services for brokers: a headless service with publishNotReadyAddresses and an ordinary clusterip service for lookups #437

lhotari commented Jan 19, 2024 •

edited

Loading

lhotari commented Sep 3, 2024

lhotari commented Sep 3, 2024

Create 2 services for brokers: a headless service with publishNotReadyAddresses and an ordinary clusterip service for lookups #437

Create 2 services for brokers: a headless service with publishNotReadyAddresses and an ordinary clusterip service for lookups #437

Comments

lhotari commented Jan 19, 2024 • edited Loading

lhotari commented Sep 3, 2024

lhotari commented Sep 3, 2024

lhotari commented Jan 19, 2024 •

edited

Loading