This repository was archived by the owner on Jan 24, 2024. It is now read-only.

[FEATURE] KOP Proxy Design document #717

Open
eolivelli opened this issue Sep 8, 2021 · 11 comments
Labels
type/feature Indicates new functionality

Comments

@eolivelli
Contributor

Proxy design guiding principles

  • The Proxy is stateless
  • The Proxy simulates a “big” single broker cluster
  • The Proxy is aware of Tenants/Namespaces/Topic Bundles and can execute Pulsar Lookups
  • The Proxy does the Authentication with the Client
  • The Proxy forwards the Identity of the Client (Principal) to the Broker as currently the Pulsar proxy does

Authentication and connections to the Brokers
The Kafka Client connects only to the Proxy (possibly to several Proxies, depending on k8s/DNS...).
Authentication is performed only against the Proxy, which keeps track of the User (Principal) and forwards that Identity to the Pulsar (KOP) Broker when it opens the connection to the Broker. We are going to open one TCP connection per Kafka client connection per Broker (see #711).

Encryption - TLS Support
The KOP Proxy allows configuring SSL and SASL_SSL listeners the same as for the KOP Broker.
The configuration for TLS is the same as for the Pulsar Proxy, so you can use the same certificates.
We can implement support for using the same configuration entries as for the KOP Protocol Handler.

Topic Owner Lookup
When the Kafka client refers to a Topic, the Request is forwarded to the Broker that actually owns the Topic, opening a new Connection if needed. The owner can be found using the Pulsar Lookup API.
For metadata requests, the response will be restricted to the list of topics that the Authenticated user can reach; this should already be handled in the KOP Protocol Handler.
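
The lookup-then-forward behaviour described above, combined with the "one TCP connection per Kafka client connection per Broker" rule, can be sketched roughly as follows. This is only an illustration in Python, not the proxy's actual code: `ClientSession`, `lookup` and `connect` are hypothetical names, and the lookup callback stands in for the Pulsar Lookup API.

```python
from typing import Callable, Dict


class ClientSession:
    """In-memory state for one Kafka client connection (hypothetical)."""

    def __init__(self, principal: str,
                 lookup: Callable[[str], str],
                 connect: Callable[[str, str], object]):
        self.principal = principal
        self._lookup = lookup      # topic -> address of the owner broker
        self._connect = connect    # (broker, principal) -> connection
        self._connections: Dict[str, object] = {}

    def connection_for(self, topic: str):
        """Return the connection to the broker that owns the topic."""
        broker = self._lookup(topic)
        if broker not in self._connections:
            # Open at most one connection per broker for this client and
            # forward the authenticated principal to the broker.
            self._connections[broker] = self._connect(broker, self.principal)
        return self._connections[broker]
```

The map lives only as long as the client connection, so the Proxy keeps no persistent state. A real implementation would also have to drop a connection when topic ownership moves to another broker.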

Message forwarding
The Kafka client is supposed to look up the Leader broker for a partition and issue Produce and Fetch requests directly to that Leader broker. The proxy can easily forward each request to the KOP broker that owns the Topic/Partition.
See the protocol here:
https://kafka.apache.org/protocol.html

Please note that the “Produce Request” does not carry the Broker ID/Address: the Kafka client is supposed to connect to the Broker that is the leader and then send the Produce PDU.
With the Proxy, the Metadata Response always reports the Proxy address as the Leader Broker, so the client is always forced to connect to the Proxy (any instance of the Proxy that answers at the address reported in the Metadata Response for the partition).
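
A minimal sketch of that rewriting step, in Python for brevity. `PartitionMetadata` and `advertise_proxy` are hypothetical names, and the structure is simplified: the real Kafka Metadata response lists brokers by node id and each partition refers to its leader by id.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class PartitionMetadata:
    topic: str
    partition: int
    leader_host: str
    leader_port: int


def advertise_proxy(partitions: List[PartitionMetadata],
                    proxy_host: str, proxy_port: int) -> List[PartitionMetadata]:
    """Report the proxy as the leader of every partition."""
    return [replace(p, leader_host=proxy_host, leader_port=proxy_port)
            for p in partitions]
```

For example, `advertise_proxy(parts, "kop-proxy.example.com", 9092)` (a made-up address) returns the same topics and partitions with every leader pointing at the proxy, which is why the client never learns the real broker addresses.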

In case of a "Produce Request" to multiple topics the Proxy must process each Topic (possibly grouping them by Leader broker) and then compose a final result.
For the case of one single topic it is enough to pass the Request to the leader broker.
See the next paragraph for a more detailed explanation.

Splitting the Requests on the Proxy

For the two main APIs, Produce and Fetch (used to consume messages), the flow is basically the same; the example below uses Produce.
The client wants to send a few “records” to several partitions.
The ProduceRequest API allows the client to batch them into a single request:
ProduceRequest:

  • Topic1-Partition1 - Records1
  • Topic1-Partition2 - Records1
  • Topic2-Partition3 - Records1

The constraint from the Client's point of view is that the Broker must be the leader of every partition in the ProduceRequest.
In the case of the KOP Proxy, the Client thinks that the KOP Proxy is the leader of every topic and every partition in the world, so it batches the writes as much as possible.

The Proxy has two cases:

  • All the partitions are for the same Pulsar Broker:
    -- We can forward the PDU to the Broker and proxy the request, transparently
  • The partitions are spread over multiple Brokers:
    -- We have to split the PDU
    -- Send the smaller PDUs to each Broker, in parallel
    -- Wait for all the brokers
    -- Recompose the Response
    -- Send the Response
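
The two cases can be sketched as follows. This is a simplified Python illustration under stated assumptions, not the proxy's implementation: `owner_of` stands in for the topic owner lookup, `send` stands in for forwarding a PDU to one broker and returning its per-partition results, and both names are hypothetical.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Tuple

TopicPartition = Tuple[str, int]


def split_by_owner(records: Dict[TopicPartition, bytes],
                   owner_of: Callable[[TopicPartition], str]
                   ) -> Dict[str, Dict[TopicPartition, bytes]]:
    """Group the partitions of one request by the broker that owns them."""
    groups: Dict[str, Dict[TopicPartition, bytes]] = defaultdict(dict)
    for tp, data in records.items():
        groups[owner_of(tp)][tp] = data
    return dict(groups)


def proxy_produce(records: Dict[TopicPartition, bytes],
                  owner_of: Callable[[TopicPartition], str],
                  send: Callable[[str, Dict[TopicPartition, bytes]], dict]
                  ) -> dict:
    groups = split_by_owner(records, owner_of)
    if not groups:
        return {}
    if len(groups) == 1:
        # Single owner: forward the request as it is.
        (broker, sub_request), = groups.items()
        return send(broker, sub_request)
    # Several owners: send the smaller requests in parallel, wait for all
    # of them, then recompose a single response.
    merged: dict = {}
    with ThreadPoolExecutor(max_workers=len(groups)) as pool:
        futures = [pool.submit(send, broker, sub_request)
                   for broker, sub_request in groups.items()]
        for future in futures:
            merged.update(future.result())
    return merged
```

The sketch uses threads only to keep it short; a real proxy built on an asynchronous network stack would combine futures instead of blocking, and it would need to decide how a failure from one broker is reported for the affected partitions only.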

Requests for Coordinators
We have many request types that should be forwarded to a "Coordinator" (Group/Transaction).
In this case the proxy looks up the current coordinator for the given "group" and forwards the request to the KOP Broker accordingly.
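
As an illustration of how a group can be mapped to a broker, the sketch below assumes the Kafka-style rule: hash the group id with Java's `String.hashCode`, take it modulo the number of partitions of the internal offsets topic, and look up the owner of that partition. That the proxy uses exactly this rule is an assumption, and the function names are hypothetical.

```python
def java_string_hash(s: str) -> int:
    """Reproduce Java's String.hashCode() as a signed 32-bit integer."""
    h = 0
    data = s.encode("utf-16-be")  # Java hashes UTF-16 code units
    for i in range(0, len(data), 2):
        unit = (data[i] << 8) | data[i + 1]
        h = (31 * h + unit) & 0xFFFFFFFF
    return h - (1 << 32) if h >= (1 << 31) else h


def coordinator_partition(group_id: str, num_partitions: int) -> int:
    """Partition of the offsets topic that coordinates the group (assumed rule)."""
    h = java_string_hash(group_id)
    # Integer.MIN_VALUE has no positive counterpart in 32 bits, so map it to 0.
    positive = 0 if h == -(1 << 31) else abs(h)
    return positive % num_partitions
```

Once the partition is known, finding the coordinator is an ordinary topic owner lookup for that partition, so no separate coordinator registry is needed.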

@eolivelli eolivelli added the type/feature Indicates new functionality label Sep 8, 2021
@BewareMyPower
Collaborator

Overall LGTM but a question. How does the proxy look up the current coordinator?

@BewareMyPower
Collaborator

@sijie Could you also take a look?

@sijie
Member

sijie commented Sep 8, 2021

My general reaction to the proposal is: We shouldn't try to create another proxy solution for Kafka protocol for a couple of reasons:

  1. It is very hard to make it correct and efficient because the Kafka protocol isn’t designed to route to the right broker with additional metadata. The Pulsar proxy is able to do that because we introduced an additional metadata field as part of its protocol. Newer versions of the Kafka protocol might add this capability. Lacking such a capability, the “Kafka proxy” has to do a topic lookup for every produce and fetch request. Although you can argue that a proxy can cache the ownership, it still means every proxy has to cache almost all the ownerships for all topics. That is not efficient if you have a lot of topics.

  2. The “Kafka proxy” has to return a “load balancer” DNS name to the Kafka clients. No matter what the topics are, it always returns the same DNS name. If I understand the Kafka client implementation correctly, it maintains one TCP connection per “broker”. If that’s the case, the Kafka client will only have one connection to the “Kafka proxy”, and all the requests will be routed to the same proxy instance. That means a single Kafka proxy can potentially become a bottleneck.

  3. There are already a lot of mature solutions for exposing Kafka services outside of a Kubernetes cluster. Those solutions are used by a lot of Kafka users and are proven to be production-ready. We should adopt those solutions instead of introducing another proxy solution that a) is not battle-tested and b) is not widely used and adopted by the Kafka community. You can check out solutions like the Strimzi Kafka operator or the Banzai Cloud Kafka operator. They have developed a solid solution using well-adopted proxy software like Envoy.

@sijie
Member

sijie commented Sep 8, 2021

KoP is designed to provide a Kafka-compatible protocol for Kafka clients. We shouldn't reinvent a proxy solution ourselves. We should adopt a "proxy" solution that has been proven and used in the Kafka community.

@eolivelli
Contributor Author

> How does the proxy look up the current coordinator?

It follows the same rules as the "findCoordinator" implementation in the KOP PH. It looks up the owner of the topic that is mapped to the group (this is why I sent my API refactor PR, in order to be able to use the same logic).

See here:
https://github.com/datastax/kop/blob/9f69026e507577de7e625b77885ac641e883440c/proxy/src/main/java/io/streamnative/pulsar/handlers/kop/KafkaProxyRequestHandler.java#L1061

@eolivelli
Contributor Author

> KoP is designed to provide a Kafka-compatible protocol for Kafka clients. We shouldn't reinvent a proxy solution ourselves. We should adopt a "proxy" solution that has been proven and used in the Kafka community.

The fact that no one has done it before doesn't mean that we cannot introduce new software.
It is hard work, but most of the logic is still in the KOP PH, so we are not adding too much complexity or code.

If the user is fine with the trade-offs, then I believe it is good to give users a good built-in Proxy for Kafka over Pulsar.

With a proxy like this you can use Kafka clients with Pulsar out of the box, with no need to add other third-party components/services: only Pulsar + the KOP .nar files and a few configuration entries in broker.conf and proxy.conf.

Currently my testing shows that using KOP behind this kind of proxy works really well, and from the user's (sys admin, sys architect...) perspective it is amazing because it fits well into the Pulsar picture.

Probably not all users will want to use this approach, and that's fine with me.
But I have valid use cases, and user requests, to work in this direction.
Envoy proxy is not a good solution for some of these use cases.
Inside the Envoy project there are efforts to implement something like this proposal, but using only the Kafka protocol (kafka_mesh).

@sijie
Member

sijie commented Sep 8, 2021

> The fact that no one has done it before doesn't mean that we cannot introduce new software.

What the KOP proxy is doing can already be achieved with existing proxy software. Why do you want to implement this again? Especially since you are implementing it in Java, which has its own deficiencies compared to other proxy software like Envoy.

> With a proxy like this you can use Kafka clients with Pulsar out of the box, with no need to add other third-party components/services: only Pulsar + the KOP .nar files and a few configuration entries in broker.conf and proxy.conf.

The problem mostly exists in the Kubernetes world. Envoy and Istio are the de facto standard for proxies and request routing. In the Kubernetes world, people use Helm charts and operators to deploy Pulsar (and KoP). We have included this in our Helm chart. People can easily install a Pulsar cluster with KOP enabled in Kubernetes using that Helm chart.

Why do you think a KoP proxy can simplify this process?

If you are deploying KoP in an on-prem cluster, you don't need a KoP proxy at all. So I don't see what actual value the KoP proxy brings compared to the existing solutions. Instead, I see it introducing additional complexity that we need to maintain.

> Envoy proxy is not a good solution for some of these use cases.

Can you show why Envoy is not able to address what you are doing? Why is the KOP proxy better than Envoy?

@BewareMyPower
Collaborator

@eolivelli @sijie

IMO, if the proxy module is not intrusive to the kafka-impl module, we can accept this proposal and let community users compare it with the existing solutions.

@eolivelli
Contributor Author

> Why is the KOP proxy better than Envoy?

  1. From an architectural point of view, the KOP Proxy is like the Pulsar Proxy: you do not add anything more to the overall picture or to the deployment of the cluster
  2. You do not need an additional software component that has to be deployed/configured/managed using other tools/scripts, different from what is provided by Pulsar
  3. The KOP proxy can integrate natively with Pulsar authentication, authorisation and service discovery
  4. The KOP proxy can deal automatically with broker scaling, with no need to coordinate with other third-party components, like adding/removing Envoy listeners/configurations/services

As said, in some other use cases the user may be fine with using something else, but those 4 points cannot be addressed by something that is not "part of Pulsar" (except probably point 4, to some extent).

@eolivelli
Contributor Author

eolivelli commented Sep 8, 2021

> IMO, if the proxy module is not intrusive to the kafka-impl module, we can accept this proposal and let community users compare it with the existing solutions.

Yes, my idea is to add a new "proxy" Maven module and add new tests that reuse the existing tests ('extends') but launch a proxy and route all the Kafka traffic through the proxy instead of connecting directly to the KOP broker (not all tests are applicable).

@wangjialing218
Contributor

If the purpose of the proxy is to help clients outside the Kubernetes cluster connect to brokers inside it, I think we could use advertised listeners: let the client connect to a broker directly and get a correct lookup result that the client can reconnect to.
Using a Java-based proxy costs additional CPU, memory and network resources. Also, there is no load balancing for the proxy, so the proxy may become unstable and turn into a bottleneck when traffic grows large.

I'm working on #669; could that be a solution for you?
